Re: Gadget2 fatal error

From: Volker Springel <volker_at_MPA-Garching.MPG.DE>
Date: Wed, 17 Oct 2007 13:28:44 +0200

Hi,

This looks like a memory allocation problem.

In domain.c, there is only one call to MPI_Allgatherv(), at line 953.
A couple of lines above it, the buffer toplist is allocated with

  toplist = malloc(ntop * sizeof(struct topnode_exchange));

but, sloppily, no check is performed on whether this memory allocation
actually succeeded. If it failed, toplist is a NULL pointer, which would
explain your crash and the error messages (note the rbuf=(nil) in the
error stack).

You could check for this issue by changing the allocation statement to

  if(!(toplist = malloc(ntop * sizeof(struct topnode_exchange))))
    endrun(11231);

In improved versions of gadget, all calls of malloc() are properly
checked for memory allocation problems. One simple way of doing this is
to encapsulate malloc() in a wrapper function mymalloc() that is always
called instead of calling malloc() directly. You might consider such a
change in order to track down this and similar memory limit issues more
reliably.
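
A minimal sketch of what such a wrapper could look like is given below.
It is only an illustration, not the code used in the improved versions:
the names mymalloc_at() and endrun_standin() are made up here, and
endrun_standin() merely stands in for Gadget's endrun() so that the
example compiles on its own (in the real code you would call endrun()
itself so that MPI is shut down as well).

```c
#include <stdio.h>
#include <stdlib.h>

/* Stand-in for Gadget's endrun(), defined only so that this sketch is
 * self-contained. It prints the error level and terminates. */
void endrun_standin(int ierr)
{
  fprintf(stderr, "endrun called with an error level of %d\n", ierr);
  exit(EXIT_FAILURE);
}

/* Hypothetical checked allocator: behaves like malloc(), but aborts with
 * a message naming the call site if a non-empty request cannot be met. */
void *mymalloc_at(size_t n, const char *file, int line)
{
  void *p = malloc(n);

  /* malloc(0) may legitimately return NULL, so only n > 0 is an error */
  if(p == NULL && n > 0)
    {
      fprintf(stderr, "failed to allocate %zu bytes (%s, line %d)\n",
              n, file, line);
      endrun_standin(11231);
    }

  return p;
}

/* Records the file and line of every allocation automatically. */
#define mymalloc(n) mymalloc_at((n), __FILE__, __LINE__)
```

With this in place, the allocation in domain.c would simply read
toplist = mymalloc(ntop * sizeof(struct topnode_exchange)); and a failed
request would report its size and location instead of passing a NULL
buffer on to MPI.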

Volker




Cameron McBride wrote:
> Greetings,
>
> We ran into the following error while trying to benchmark and test
> Gadget2 for some DM only simulations on a Cray XT3 platform.
>
> --
> domain decomposition...
> aborting job:
> Fatal error in MPI_Allgatherv: Invalid buffer pointer, error stack:
> MPI_Allgatherv(1022): MPI_Allgatherv(sbuf=0xf8ab6b0, scount=1931904,
> MPI_BYTE, rbuf=(nil), rcounts=0xfac6750, displs=0xfac7760, MPI_BYTE,
> MPI_COMM_WORLD) failed
> MPI_Allgatherv(966): Null buffer pointer
> --
>
> We are using PMGRID=2048 on 1024 PE, and the simulation consists of
> 1250^3 particles. With the same Gadget2 binary on the same number of
> PE, we were able to run smaller simulations (640^3 and 960^3).
>
> Initially, we hit a very similar error on any attempt to use 1024 PE.
> We were able to get around this by changing the TOPNODEFACTOR to '2.0'
> from the original value of '20.0' in the domain.c file and we got the
> 960^3 particle simulation to run. Changing this back to '20.0' did not
> fix the issue for the 1250^3 run.
>
> The version of Gadget2 we're using is pretty much a stock 2.0.3 version.
>
> Any ideas? Thanks!
>
>
> Cameron