Fwd: Troubles getting Gadget2 running on a cluster
Greetings everyone,
I'm having trouble getting Gadget2 to run stably on our cluster here
at Swinburne. It runs for a few time steps and then crashes (at
seemingly random times) with the following cryptic error message:
p12_12162: (7355.398438) net_recv failed for fd = 80
p12_12162: p4_error: net_recv read, errno = : 110
rm_l_12_12722: (7355.398438) net_send: could not write to fd=5, errno = 32
Our system consists of dual quad-core AMD machines with a Gigabit
interconnect, running CentOS 5.
I got word from a friend that it may be a stack problem and I tried
calling the following routine after MPI_Init:
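#include <stdio.h>
#include <sys/resource.h>

/* Set the stack limit to unlimited and print the limits before and after. */
void setstacklim__(void)
{
  struct rlimit old_Limit;
  struct rlimit new_Limit;
  int old_Limit_grval;
  int new_Limit_srval;
  int new_Limit_grval;

  /* Save the current limits, request unlimited, then read the result back. */
  old_Limit_grval = getrlimit(RLIMIT_STACK, &old_Limit);
  new_Limit.rlim_cur = RLIM_INFINITY;
  new_Limit.rlim_max = RLIM_INFINITY;
  new_Limit_srval = setrlimit(RLIMIT_STACK, &new_Limit);
  new_Limit_grval = getrlimit(RLIMIT_STACK, &new_Limit);

  /* rlim_t is not an int, so cast to long long to match the format. */
  printf("\n rvals=(%d,%d,%d) Limits were=(%lld,%lld) and now are "
         "(%lld,%lld); RLIMIT_INFINITY=%lld\n",
         old_Limit_grval, new_Limit_srval, new_Limit_grval,
         (long long)old_Limit.rlim_cur, (long long)old_Limit.rlim_max,
         (long long)new_Limit.rlim_cur, (long long)new_Limit.rlim_max,
         (long long)RLIM_INFINITY);
}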
The output from this routine is:
rvals=(0,0,0) Limits were=(10485760,-1) and now are (-1,-1); RLIMIT_INFINITY=-1
This did not fix the problem.
Has anyone encountered problems like this on a system such as ours?
Any suggestions as to what the solution might be?
Thanks for your time and attention,
...Greg Poole
Received on 2008-02-18 08:01:48