Fwd: Troubles getting Gadget2 running on a cluster

From: Gregory Poole <gpoole_at_astro.swin.edu.au>
Date: Mon, 18 Feb 2008 18:01:21 +1100

Greetings everyone,

I'm having trouble getting Gadget2 to run stably on our cluster here
at Swinburne. It runs for a few time steps and then crashes (at
seemingly random times) with the following cryptic error message:

p12_12162: (7355.398438) net_recv failed for fd = 80
p12_12162: p4_error: net_recv read, errno = : 110
rm_l_12_12722: (7355.398438) net_send: could not write to fd=5, errno = 32
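
The errno values in these messages can be translated into names and descriptions. The following is only a minimal sketch: errno_name and print_errno are hypothetical helper names (not part of Gadget2 or the MPI library), and it assumes the Linux numbering, where 110 is ETIMEDOUT ("Connection timed out") and 32 is EPIPE ("Broken pipe").

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: symbolic name for the two errno values in the log.
   Assumes Linux numbering (ETIMEDOUT == 110, EPIPE == 32). */
const char *errno_name(int err)
{
    switch (err) {
    case ETIMEDOUT: return "ETIMEDOUT";
    case EPIPE:     return "EPIPE";
    default:        return "(other)";
    }
}

/* Hypothetical helper: print the number, its name, and the system's description. */
void print_errno(int err)
{
    printf("errno %d = %s: %s\n", err, errno_name(err), strerror(err));
}
```

Calling print_errno(110) and print_errno(32) on one of the compute nodes would show how that system describes the two failures.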

Our system consists of dual quad-core AMD machines with a Gigabit
interconnect, running CentOS 5.

I got word from a friend that it may be a stack problem and I tried
calling the following routine after MPI_Init:

#include <stdio.h>
#include <sys/resource.h>

/* Raise the stack limit to unlimited and report the limits before and after. */
void setstacklim__(void)
{
    struct rlimit old_Limit;
    struct rlimit new_Limit;
    int old_Limit_grval;
    int new_Limit_srval;
    int new_Limit_grval;

    old_Limit_grval = getrlimit(RLIMIT_STACK, &old_Limit);
    new_Limit.rlim_cur = RLIM_INFINITY;
    new_Limit.rlim_max = RLIM_INFINITY;
    new_Limit_srval = setrlimit(RLIMIT_STACK, &new_Limit);
    new_Limit_grval = getrlimit(RLIMIT_STACK, &new_Limit);
    /* rlim_t is not an int, so cast to long long to match %lld */
    printf("\n rvals=(%d,%d,%d) Limits were=(%lld,%lld) and now are (%lld,%lld); "
           "RLIMIT_INFINITY=%lld\n",
           old_Limit_grval, new_Limit_srval, new_Limit_grval,
           (long long)old_Limit.rlim_cur, (long long)old_Limit.rlim_max,
           (long long)new_Limit.rlim_cur, (long long)new_Limit.rlim_max,
           (long long)RLIM_INFINITY);
}

The output from this routine is:

  rvals=(0,0,0) Limits were=(10485760,-1) and now are (-1,-1); RLIMIT_INFINITY=-1

This did not fix the problem.

Has anyone encountered problems like this on a system such as ours?
Any suggestions as to what the solution might be?

Thanks for your time and attention,

...Greg Poole
Received on 2008-02-18 08:01:48
