Re: endrun(888) problem

From: Volker Springel <volker_at_MPA-Garching.MPG.DE>
Date: Thu, 8 May 2014 23:01:11 +0200

Your initial conditions are probably incorrect. Check that your coordinates are really distributed throughout [0,100].
Volker
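
A minimal sketch of such a coordinate check follows. It assumes the ICs are in Gadget-2's default binary format (SnapFormat 1), little-endian, with single-precision positions stored in the first block after the 256-byte header. The helper names read_positions and check_box are hypothetical and not part of Gadget. For each file it prints the BoxSize stored in the header, the per-axis coordinate range, and how many coordinates lie outside [0, BoxSize].

```python
import struct
import sys
from array import array

HEADER_SIZE = 256  # bytes in a Gadget-2 format-1 header block


def read_block(f):
    """Read one Fortran-style record: <int32 size> payload <int32 size>."""
    (size,) = struct.unpack("<i", f.read(4))
    payload = f.read(size)
    (size_end,) = struct.unpack("<i", f.read(4))
    if size != size_end or len(payload) != size:
        raise ValueError("corrupt block markers (wrong format or endianness?)")
    return payload


def read_positions(path):
    """Return (box_size, flat array x0,y0,z0,x1,...) for one IC file."""
    with open(path, "rb") as f:
        header = read_block(f)
        if len(header) != HEADER_SIZE:
            raise ValueError("unexpected header size %d" % len(header))
        npart = struct.unpack_from("<6i", header, 0)       # particles per type
        (box_size,) = struct.unpack_from("<d", header, 128)  # BoxSize field
        pos = array("f")
        pos.frombytes(read_block(f))  # assumed: float32 positions
    if sys.byteorder == "big":
        pos.byteswap()  # file is assumed little-endian
    if len(pos) != 3 * sum(npart):
        raise ValueError("position block does not match particle count")
    return box_size, pos


def check_box(pos, box_size):
    """Per-axis (min, max) and the count of coordinates outside [0, box_size]."""
    ranges = [(min(pos[axis::3]), max(pos[axis::3])) for axis in range(3)]
    outside = sum(1 for v in pos if not 0.0 <= v <= box_size)
    return ranges, outside


if __name__ == "__main__":
    for path in sys.argv[1:]:
        box, pos = read_positions(path)
        ranges, outside = check_box(pos, box)
        print(path, "BoxSize =", box, "ranges =", ranges, "outside =", outside)
```

Run it over all IC files (for example ics.0 to ics.15) and compare the ranges with the BoxSize in the parameter file. Ranges that cover only a small corner of the box, or that extend past BoxSize, would suggest a unit or box-size mismatch in the ICs.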

On May 8, 2014, at 2:46 PM, arlindo.trindade_at_gmail.com wrote:

> Dear Volker Springel,
>
> I've increased the number of particles to 2*256^3 and the box to 100 Mpc/h. I've also increased PartAllocFactor substantially (from as low as 1.6 to as high as 50!). The code now starts to perform the domain decomposition, but after a while I still get the same error at line 371 in the forcetree.c file.
>
>
> For example, in the run whose output I show below, I used the following memory parameters:
>
> PartAllocFactor 15
> TreeAllocFactor 0.8
> BufferSize 30
>
>
> However the error persists if I use
>
> PartAllocFactor 20
> TreeAllocFactor 1.2
> BufferSize 30
>
> or even higher values.
>
>
> I also changed the values of both TOPNODEFACTOR and MAXTOPNODES, but I can't get the problem solved.
>
> Arlindo
>
> *****************************************************************************************
>
>
> This is Gadget, version `2.0'.
>
> Running on 6 processors.
>
> found 75 times in output-list.
>
> Allocated 30 MByte communication buffer per processor.
>
> Communication buffer has room for 714938 particles in gravity computation
> Communication buffer has room for 245760 particles in density computation
> Communication buffer has room for 196608 particles in hydro computation
> Communication buffer has room for 182890 particles in domain decomposition
>
>
> Hubble (internal units) = 100
> G (internal units) = 43.0071
> UnitMass_in_g = 1.989e+43
> UnitTime_in_s = 3.08568e+19
> UnitVelocity_in_cm_per_s = 100000
> UnitDensity_in_cgs = 6.76991e-31
> UnitEnergy_in_cgs = 1.989e+53
>
> Task=0 FFT-Slabs=86
> Task=1 FFT-Slabs=86
> Task=2 FFT-Slabs=86
> Task=3 FFT-Slabs=86
> Task=4 FFT-Slabs=86
> Task=5 FFT-Slabs=82
>
> Allocated 6400 MByte for particle storage. 80
>
> Allocated 3360 MByte for storage of SPH data. 84
>
>
> reading file `/mnt/lustre/atrindade/2LPT/IC08_mpich2/ics.10' on task=0 (contains 2097152 particles.)
> distributing this file to tasks 0-0
> Type 0 (gas): 1048576 (tot= 0016777216) masstab=0.0661731
> Type 1 (halo): 1048576 (tot= 0016777216) masstab=0.430125
> Type 2 (disk): 0 (tot= 0000000000) masstab=0
> Type 3 (bulge): 0 (tot= 0000000000) masstab=0
> Type 4 (stars): 0 (tot= 0000000000) masstab=0
> Type 5 (bndry): 0 (tot= 0000000000) masstab=0
>
>
> reading file `/mnt/lustre/atrindade/2LPT/IC08_mpich2/ics.11' on task=1 (contains 2097152 particles.)
> distributing this file to tasks 1-1
> Type 0 (gas): 1048576 (tot= 0016777216) masstab=0.0661731
> Type 1 (halo): 1048576 (tot= 0016777216) masstab=0.430125
> Type 2 (disk): 0 (tot= 0000000000) masstab=0
> Type 3 (bulge): 0 (tot= 0000000000) masstab=0
> Type 4 (stars): 0 (tot= 0000000000) masstab=0
> Type 5 (bndry): 0 (tot= 0000000000) masstab=0
>
>
> reading file `/mnt/lustre/atrindade/2LPT/IC08_mpich2/ics.12' on task=2 (contains 2097152 particles.)
> distributing this file to tasks 2-2
> Type 0 (gas): 1048576 (tot= 0016777216) masstab=0.0661731
> Type 1 (halo): 1048576 (tot= 0016777216) masstab=0.430125
> Type 2 (disk): 0 (tot= 0000000000) masstab=0
> Type 3 (bulge): 0 (tot= 0000000000) masstab=0
> Type 4 (stars): 0 (tot= 0000000000) masstab=0
> Type 5 (bndry): 0 (tot= 0000000000) masstab=0
>
>
> reading file `/mnt/lustre/atrindade/2LPT/IC08_mpich2/ics.13' on task=3 (contains 2097152 particles.)
> distributing this file to tasks 3-3
> Type 0 (gas): 1048576 (tot= 0016777216) masstab=0.0661731
> Type 1 (halo): 1048576 (tot= 0016777216) masstab=0.430125
> Type 2 (disk): 0 (tot= 0000000000) masstab=0
> Type 3 (bulge): 0 (tot= 0000000000) masstab=0
> Type 4 (stars): 0 (tot= 0000000000) masstab=0
> Type 5 (bndry): 0 (tot= 0000000000) masstab=0
>
>
> reading file `/mnt/lustre/atrindade/2LPT/IC08_mpich2/ics.14' on task=4 (contains 2097152 particles.)
> distributing this file to tasks 4-4
> Type 0 (gas): 1048576 (tot= 0016777216) masstab=0.0661731
> Type 1 (halo): 1048576 (tot= 0016777216) masstab=0.430125
> Type 2 (disk): 0 (tot= 0000000000) masstab=0
> Type 3 (bulge): 0 (tot= 0000000000) masstab=0
> Type 4 (stars): 0 (tot= 0000000000) masstab=0
> Type 5 (bndry): 0 (tot= 0000000000) masstab=0
>
>
> reading file `/mnt/lustre/atrindade/2LPT/IC08_mpich2/ics.15' on task=5 (contains 2097152 particles.)
> distributing this file to tasks 5-5
> Type 0 (gas): 1048576 (tot= 0016777216) masstab=0.0661731
> Type 1 (halo): 1048576 (tot= 0016777216) masstab=0.430125
> Type 2 (disk): 0 (tot= 0000000000) masstab=0
> Type 3 (bulge): 0 (tot= 0000000000) masstab=0
> Type 4 (stars): 0 (tot= 0000000000) masstab=0
> Type 5 (bndry): 0 (tot= 0000000000) masstab=0
>
>
> reading file `/mnt/lustre/atrindade/2LPT/IC08_mpich2/ics.4' on task=0 (contains 2097152 particles.)
> distributing this file to tasks 0-0
> Type 0 (gas): 1048576 (tot= 0016777216) masstab=0.0661731
> Type 1 (halo): 1048576 (tot= 0016777216) masstab=0.430125
> Type 2 (disk): 0 (tot= 0000000000) masstab=0
> Type 3 (bulge): 0 (tot= 0000000000) masstab=0
> Type 4 (stars): 0 (tot= 0000000000) masstab=0
> Type 5 (bndry): 0 (tot= 0000000000) masstab=0
>
>
> reading file `/mnt/lustre/atrindade/2LPT/IC08_mpich2/ics.5' on task=1 (contains 2097152 particles.)
> distributing this file to tasks 1-1
> Type 0 (gas): 1048576 (tot= 0016777216) masstab=0.0661731
> Type 1 (halo): 1048576 (tot= 0016777216) masstab=0.430125
> Type 2 (disk): 0 (tot= 0000000000) masstab=0
> Type 3 (bulge): 0 (tot= 0000000000) masstab=0
> Type 4 (stars): 0 (tot= 0000000000) masstab=0
> Type 5 (bndry): 0 (tot= 0000000000) masstab=0
>
>
> reading file `/mnt/lustre/atrindade/2LPT/IC08_mpich2/ics.6' on task=2 (contains 2097152 particles.)
> distributing this file to tasks 2-2
> Type 0 (gas): 1048576 (tot= 0016777216) masstab=0.0661731
> Type 1 (halo): 1048576 (tot= 0016777216) masstab=0.430125
> Type 2 (disk): 0 (tot= 0000000000) masstab=0
> Type 3 (bulge): 0 (tot= 0000000000) masstab=0
> Type 4 (stars): 0 (tot= 0000000000) masstab=0
> Type 5 (bndry): 0 (tot= 0000000000) masstab=0
>
>
> reading file `/mnt/lustre/atrindade/2LPT/IC08_mpich2/ics.7' on task=3 (contains 2097152 particles.)
> distributing this file to tasks 3-3
> Type 0 (gas): 1048576 (tot= 0016777216) masstab=0.0661731
> Type 1 (halo): 1048576 (tot= 0016777216) masstab=0.430125
> Type 2 (disk): 0 (tot= 0000000000) masstab=0
> Type 3 (bulge): 0 (tot= 0000000000) masstab=0
> Type 4 (stars): 0 (tot= 0000000000) masstab=0
> Type 5 (bndry): 0 (tot= 0000000000) masstab=0
>
>
> reading file `/mnt/lustre/atrindade/2LPT/IC08_mpich2/ics.8' on task=4 (contains 2097152 particles.)
> distributing this file to tasks 4-4
> Type 0 (gas): 1048576 (tot= 0016777216) masstab=0.0661731
> Type 1 (halo): 1048576 (tot= 0016777216) masstab=0.430125
> Type 2 (disk): 0 (tot= 0000000000) masstab=0
> Type 3 (bulge): 0 (tot= 0000000000) masstab=0
> Type 4 (stars): 0 (tot= 0000000000) masstab=0
> Type 5 (bndry): 0 (tot= 0000000000) masstab=0
>
>
> reading file `/mnt/lustre/atrindade/2LPT/IC08_mpich2/ics.9' on task=5 (contains 2097152 particles.)
> distributing this file to tasks 5-5
> Type 0 (gas): 1048576 (tot= 0016777216) masstab=0.0661731
> Type 1 (halo): 1048576 (tot= 0016777216) masstab=0.430125
> Type 2 (disk): 0 (tot= 0000000000) masstab=0
> Type 3 (bulge): 0 (tot= 0000000000) masstab=0
> Type 4 (stars): 0 (tot= 0000000000) masstab=0
> Type 5 (bndry): 0 (tot= 0000000000) masstab=0
>
>
> reading file `/mnt/lustre/atrindade/2LPT/IC08_mpich2/ics.0' on task=0 (contains 2097152 particles.)
> distributing this file to tasks 0-0
> Type 0 (gas): 1048576 (tot= 0016777216) masstab=0.0661731
> Type 1 (halo): 1048576 (tot= 0016777216) masstab=0.430125
> Type 2 (disk): 0 (tot= 0000000000) masstab=0
> Type 3 (bulge): 0 (tot= 0000000000) masstab=0
> Type 4 (stars): 0 (tot= 0000000000) masstab=0
> Type 5 (bndry): 0 (tot= 0000000000) masstab=0
>
>
> reading file `/mnt/lustre/atrindade/2LPT/IC08_mpich2/ics.1' on task=1 (contains 2097152 particles.)
> distributing this file to tasks 1-2
> Type 0 (gas): 1048576 (tot= 0016777216) masstab=0.0661731
> Type 1 (halo): 1048576 (tot= 0016777216) masstab=0.430125
> Type 2 (disk): 0 (tot= 0000000000) masstab=0
> Type 3 (bulge): 0 (tot= 0000000000) masstab=0
> Type 4 (stars): 0 (tot= 0000000000) masstab=0
> Type 5 (bndry): 0 (tot= 0000000000) masstab=0
>
>
> reading file `/mnt/lustre/atrindade/2LPT/IC08_mpich2/ics.2' on task=3 (contains 2097152 particles.)
> distributing this file to tasks 3-3
> Type 0 (gas): 1048576 (tot= 0016777216) masstab=0.0661731
> Type 1 (halo): 1048576 (tot= 0016777216) masstab=0.430125
> Type 2 (disk): 0 (tot= 0000000000) masstab=0
> Type 3 (bulge): 0 (tot= 0000000000) masstab=0
> Type 4 (stars): 0 (tot= 0000000000) masstab=0
> Type 5 (bndry): 0 (tot= 0000000000) masstab=0
>
>
> reading file `/mnt/lustre/atrindade/2LPT/IC08_mpich2/ics.3' on task=4 (contains 2097152 particles.)
> distributing this file to tasks 4-5
> Type 0 (gas): 1048576 (tot= 0016777216) masstab=0.0661731
> Type 1 (halo): 1048576 (tot= 0016777216) masstab=0.430125
> Type 2 (disk): 0 (tot= 0000000000) masstab=0
> Type 3 (bulge): 0 (tot= 0000000000) masstab=0
> Type 4 (stars): 0 (tot= 0000000000) masstab=0
> Type 5 (bndry): 0 (tot= 0000000000) masstab=0
>
> reading done.
> Total number of particles : 0033554432
>
> allocated 0.0762939 Mbyte for ngb search.
>
> Allocated 4812.29 MByte for BH-tree. 64
>
> domain decomposition...
> NTopleaves= 127
> work-load balance=6 memory-balance=6
> exchange of 0027262976 particles
> exchange of 0025434076 particles
> exchange of 0023605176 particles
> exchange of 0021776276 particles
> exchange of 0019947376 particles
> exchange of 0018118476 particles
> exchange of 0016289576 particles
> exchange of 0014460676 particles
> exchange of 0012631776 particles
> exchange of 0010802876 particles
> exchange of 0008973976 particles
> exchange of 0007145076 particles
> exchange of 0005316176 particles
> exchange of 0003487276 particles
> exchange of 0001658376 particles
> task 2: endrun called with an error level of 8882
>
>
> task 1: endrun called with an error level of 8882
>
>
> application called MPI_Abort(MPI_COMM_WORLD, 8882) - process 2
> application called MPI_Abort(MPI_COMM_WORLD, 8882) - process 1
>
>
>
>
>
> 2014-05-07 20:56 GMT+01:00 Volker Springel <volker_at_mpa-garching.mpg.de>:
>
>
> I think Ali's hint is spot on. The simulation is likely so small that the ratio between the number of tree nodes required by the code and the particle number is much larger than normally needed for well-loaded processors. It should be possible to cure this with a (drastic) increase of the parameter 'TreeAllocFactor' in the parameter file.
>
> Volker
>
> On May 7, 2014, at 5:46 PM, Ali Snedden wrote:
>
> > You might read the user manual and check that you are using reasonable values in your parameter file (specifically All.PartAllocFactor). Perhaps All.MaxPart is too small or you're not using a large enough value for MaxNodes. Good luck.
> >
> >
> > ~ali
> >
> >
> > On Wed, May 7, 2014 at 11:31 AM, arlindo.trindade_at_gmail.com <arlindo.trindade_at_gmail.com> wrote:
> > Hi Ali,
> >
> > Thanks for your email.
> >
> > I've changed the values that are passed to each endrun as you suggested, and the code breaks at line 371 in the forcetree.c file. But I still can't find out what causes this error.
> >
> > Do you have any ideas?
> >
> > Best regards,
> > Arlindo
> >
> >
> >
> > 2014-05-07 16:03 GMT+01:00 Ali Snedden <asnedden_at_nd.edu>:
> >
> > Hello Arlindo,
> >
> > It would be nice to know which line it actually breaks at. You could use a debugger like Gdb or just change the values passed to each line that has endrun(888).
> >
> > It is worth your time to learn how to use a debugger. From my personal experience, using a debugger has probably cut my development time by 30-50% compared with just using print statements.
> >
> > To use Gdb for a parallel program, first compile Gadget with the '-g' compiler option. Then enter the following on your command line.
> >
> > mpirun -np 2 xterm -geometry 100x62 -sb -sl 10000 -e gdb ./Gadget2
> >
> > I added a bunch of extra options to customize the x11 window. Then in each of your two x11 windows enter the command line arguments (i.e. pass your parameter file).
> >
> > start lcdm.param
> >
> > Then you can set breakpoints and do other neat things by following these helpful websites.
> >
> > http://betterexplained.com/articles/debugging-with-gdb/
> > http://www.unknownroad.com/rtfm/gdbtut/gdbtoc.html
> >
> >
> > Best Regards
> > ~Ali
> >
> >
> > On Wed, May 7, 2014 at 10:42 AM, Arlindo Trindade _at_ gmail <arlindo.trindade_at_gmail.com> wrote:
> >
> > Hi all,
> >
> > I'm trying to run some N-body simulation tests with Gadget 2 on a cluster. The simulation is very small: the number of particles is N=2*16^3 (dark matter + gas) and the box size is L=3.125 Mpc. However, I get an endrun 888 error. The same thing happens if I increase the size of the simulation (both L and N). I've identified the files where the function endrun is called with the code 888 (timestep.c lines 482 and 530, forcetree.c line 371), but I still can't understand what the problem is, so I can't solve it.
> >
> > Does anyone have a suggestion?
> >
> > Cheers,
> > Arlindo
> >
> >
> >
> >
> > -----------------------------------------------------------
> > If you wish to unsubscribe from this mailing, send mail to
> > minimalist_at_MPA-Garching.MPG.de with a subject of: unsubscribe gadget-list
> > A web-archive of this mailing list is available here:
> > http://www.mpa-garching.mpg.de/gadget/gadget-list
> >
> >
> >
> >
> >
> >
> >
> > --
> > Arlindo Trindade
> >
> >
> >
> >
> >
>
>
>
>
>
>
>
> --
> Arlindo Trindade
>
Received on 2014-05-08 23:01:15