Re: Fwd: Troubles getting Gadget2 running on a cluster

From: Matthew Francis <mfrancis_at_physics.usyd.edu.au>
Date: Tue, 19 Feb 2008 13:53:26 +1100

Hi Greg,

I've been running Gadget2 on the Swinburne cluster for several months
with no trouble. I haven't done anything fancy or anything along the
lines suggested by Volker; I pretty much ported the setup I had on the
USyd cluster to Swinburne and it's been fine. I'd be happy to show you
the Makefile I've been using if it would help.

Cheers

Matt Francis

On Mon, 2008-02-18 at 18:01 +1100, Gregory Poole wrote:
>
> Greetings everyone,
>
> I'm having troubles getting Gadget2 to run stably on our cluster here
> at Swinburne. It runs for a few time steps and then crashes (at
> seemingly random times) with the following cryptic error message:
>
> p12_12162: (7355.398438) net_recv failed for fd = 80
> p12_12162: p4_error: net_recv read, errno = : 110
> rm_l_12_12722: (7355.398438) net_send: could not write to fd=5, errno
> = 32
>
> Our system consists of dual quad-core AMD machines with Gigabit
> interconnect, running CentOS 5.
>
> I got word from a friend that it may be a stack problem and I tried
> calling the following routine after MPI_Init:
>
> void setstacklim__(void)
> {
>   struct rlimit old_Limit;
>   struct rlimit new_Limit;
>   int old_Limit_grval;
>   int new_Limit_srval;
>   int new_Limit_grval;
>
>   old_Limit_grval = getrlimit(RLIMIT_STACK, &old_Limit);
>   new_Limit.rlim_cur = RLIM_INFINITY;
>   new_Limit.rlim_max = RLIM_INFINITY;
>   new_Limit_srval = setrlimit(RLIMIT_STACK, &new_Limit);
>   new_Limit_grval = getrlimit(RLIMIT_STACK, &new_Limit);
>   printf("\n rvals=(%d,%d,%d) Limits were=(%d,%d) and now are (%d,%d); RLIMIT_INFINITY=%d\n",
>          old_Limit_grval, new_Limit_srval, new_Limit_grval,
>          old_Limit.rlim_cur, old_Limit.rlim_max,
>          new_Limit.rlim_cur, new_Limit.rlim_max,
>          RLIM_INFINITY);
> }
>
> The output from this routine is:
>
> rvals=(0,0,0) Limits were=(10485760,-1) and now are (-1,-1);
> RLIMIT_INFINITY=-1
>
> This did not fix the problem.
>
> Has anyone encountered problems like this on a system such as ours?
> Any suggestions as to what the solution might be?
>
> Thanks for your time and attention,
>
> ..Greg Poole
>
>
>
>
> -----------------------------------------------------------
>
> If you wish to unsubscribe from this mailing, send mail to
> minimalist_at_MPA-Garching.MPG.de with a subject of: unsubscribe gadget-list
> A web-archive of this mailing list is available here:
> http://www.mpa-garching.mpg.de/gadget/gadget-list
Received on 2008-02-19 03:53:32

This archive was generated by hypermail 2.3.0 : 2023-01-10 10:01:30 CET