Re: On Gadget-4 Foreign Nodes

From: Volker Springel <vspringel_at_MPA-Garching.MPG.DE>
Date: Sat, 19 Dec 2020 17:47:39 +0100

Hi Tiago,

The code has run into a memory problem while building the locally essential tree: more space is needed for the imported tree nodes (these are quite bulky because you use 5th-order multipoles and double precision throughout).

The best solution would be to increase MaxMemSize in your parameter file. Because you are currently using only slightly less than half of the memory available on your compute nodes, there is in principle plenty of room for this.
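(Judging from the numbers in your log: each shared-memory island hosts 37 MPI ranks and each node has about 257 GB of physical memory, so MaxMemSize=3200 corresponds to roughly 37 x 3200 MB = 118400 MB, i.e. about 46% of the memory per node.)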

However, from your output for "avail /dev/shm:" I can see that you are suffering from a familiar misconfiguration of your compute nodes: your shared memory is unnecessarily restricted to 50% of the physical memory.

This 50% restriction is an unfortunate default for the maximum shared memory size adopted in many Linux distributions. With it, GADGET-4 can only use half of the available memory...

But there is really no deeper reason for this limit, and one can easily change the maximum size of /dev/shm to, for example, 95% of the physical memory.

Look, for example, at the paper https://dl.acm.org/doi/10.1145/3176364.3176367, which describes the issue; see particularly the end of Section 3.2, where a plea is made to system administrators to correct this setting. In fact, the stability of the compute nodes is completely unaffected and just fine if the limit is raised to, e.g., 95% of the available memory. This can be done with a simple

mount -o remount,size=95% /dev/shm

command on the fly, but this is of course only possible for administrators. In any case, the setting also needs to be made permanent so that it stays in place after the next reboot.
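On most Linux systems, /dev/shm is a tmpfs mount, so a typical way to make the setting permanent is an entry in /etc/fstab along these lines (the exact entry may differ on your distribution):

tmpfs   /dev/shm   tmpfs   defaults,size=95%   0   0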

I note that this change has been implemented on all the supercomputers in Garching (both at the LRZ and MPCDF), causing no problems at all. I would therefore recommend that you ask your system administrator to do the same on your machine, too.

Then you can increase MaxMemSize and the problem should be solvable that way.
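Purely as an illustration (the concrete value is your choice and must match your node configuration): with 37 ranks per node and roughly 250 GB of usable memory, a setting like

MaxMemSize   6000

would reserve about 37 x 6000 MB = 222 GB per node and still leave some headroom.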

Otherwise, you can in principle try to increase the factor 0.33 in the following line

   int nspace = (0.33 * Mem.FreeBytes) / (sizeof(gravnode) + 8 * sizeof(foreign_gravpoint_data));

in fmm.cc, for example to 0.5, or even a bit larger. This might work in your particular case, but only if you're lucky.
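Concretely, with 0.5 the line would become

   int nspace = (0.5 * Mem.FreeBytes) / (sizeof(gravnode) + 8 * sizeof(foreign_gravpoint_data));

which simply hands a larger share of the remaining free memory to the buffer for imported tree nodes, leaving less for everything else.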

Best,
Volker



> On 16. Dec 2020, at 18:46, Tiago Castro <tiagobscastro_at_gmail.com> wrote:
>
> Dear list,
>
> I am trying to run a cosmological simulation (500.0 Mpc/h box, 1024^3 particles), and the code returns the error below. Looking at the source code, I could not understand exactly how MaxForeignNodes is decided, or whether there is something I can try to change in the parameter file. I am already using the entire local cluster.
>
> Many thanks!
> ---------------------------------
> Shared memory islands host a minimum of 37 and a maximum of 37 MPI ranks.
> We shall use 6 MPI ranks in total for assisting one-sided communication (1 per shared memory node).
>
> [GADGET-4 ASCII-art banner]
>
> This is Gadget, version 4.0.
> Git commit unknown, unknown
>
> Code was compiled with the following compiler and flags:
> mpicxx -std=c++11 -ggdb -O3 -march=native -Wall -Wno-format-security -I/beegfs/tcastro/gadget4/include/ -I/beegfs/tcastro/gadget4/include/gsl -I/beegfs/tcastro/gadget4/include/ -Ibuild -Isrc
>
>
> Code was compiled with the following settings:
> ASMTH=3.0
> CREATE_GRID
> DOUBLEPRECISION=1
> DOUBLEPRECISION_FFTW
> ENLARGE_DYNAMIC_RANGE_IN_TIME
> FMM
> FOF
> FOF_GROUP_MIN_LEN=100
> FOF_LINKLENGTH=0.2
> FOF_PRIMARY_LINK_TYPES=2
> HIERARCHICAL_GRAVITY
> IMPOSE_PINNING
> LEAN
> MERGERTREE
> MULTIPOLE_ORDER=5
> NGENIC=1024
> NGENIC_2LPT
> NSOFTCLASSES=1
> NTAB=256
> NTYPES=6
> OUTPUT_TIMESTEP
> PERIODIC
> PMGRID=1024
> POWERSPEC_ON_OUTPUT
> PRESERVE_SHMEM_BINARY_INVARIANCE
> RANDOMIZE_DOMAINCENTER
> RCUT=6.0
> SELFGRAVITY
> SUBFIND
> SUBFIND_HBT
> TREE_NUM_BEFORE_NODESPLIT=4
>
>
> Running on 216 MPI tasks.
>
>
> BEGRUN: Size of particle structure 128 [bytes]
> BEGRUN: Size of sph particle structure 216 [bytes]
> BEGRUN: Size of gravity tree node 352 [bytes]
> BEGRUN: Size of neighbour tree node 192 [bytes]
> BEGRUN: Size of subfind auxiliary data 64 [bytes]
>
> PINNING: We have 4 sockets, 40 physical cores and 40 logical cores on the first MPI-task's node.
> PINNING: Looks like 10 logical cores are available.
> PINNING: Looks like already before start of the code, a tight binding was imposed.
> PINNING: We refrain from any pinning attempt ourselves. (This can be changed by setting the compile flag IMPOSE_PINNING_OVERRIDE_MODE.)
>
> -------------------------------------------------------------------------------------------------------------------------
> AvailMem: Largest = 251624.55 Mb (on task= 144), Smallest = 251293.10 Mb (on task= 72), Average = 251463.74 Mb
> Total Mem: Largest = 257655.01 Mb (on task= 0), Smallest = 257655.01 Mb (on task= 0), Average = 257655.01 Mb
> Committed_AS: Largest = 6361.91 Mb (on task= 72), Smallest = 6030.45 Mb (on task= 144), Average = 6191.26 Mb
> SwapTotal: Largest = 4000.00 Mb (on task= 0), Smallest = 4000.00 Mb (on task= 0), Average = 4000.00 Mb
> SwapFree: Largest = 4000.00 Mb (on task= 0), Smallest = 3966.40 Mb (on task= 180), Average = 3992.73 Mb
> AllocMem: Largest = 6361.91 Mb (on task= 72), Smallest = 6030.45 Mb (on task= 144), Average = 6191.26 Mb
> avail /dev/shm: Largest = 128788.88 Mb (on task= 144), Smallest = 128785.64 Mb (on task= 0), Average = 128787.51 Mb
> -------------------------------------------------------------------------------------------------------------------------
> Task=0 has the maximum commited memory and is host: gen09-10
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> Obtaining parameters from file 'param.1024p3.txt':
>
> InitCondFile ./ics
> OutputDir ./1024p3
> SnapshotFileBase snap
> OutputListFilename ./outputs.txt
> ICFormat 2
> SnapFormat 3
> TimeLimitCPU 172800
> CpuTimeBetRestartFile 7200
> MaxMemSize 3200
> TimeBegin 0.01
> TimeMax 1
> ComovingIntegrationOn 1
> Omega0 0.30711
> OmegaLambda 0.69289
> OmegaBaryon 0.04825
> HubbleParam 0.6777
> Hubble 0.1
> BoxSize 500000
> OutputListOn 1
> TimeBetSnapshot 0
> TimeOfFirstSnapshot 0
> TimeBetStatistics 0.01
> NumFilesPerSnapshot 16
> MaxFilesWithConcurrentIO 8
> ErrTolIntAccuracy 0.05
> CourantFac 0.15
> MaxSizeTimestep 0.05
> MinSizeTimestep 0
> TypeOfOpeningCriterion 1
> ErrTolTheta 0.4
> ErrTolThetaMax 1
> ErrTolForceAcc 0.005
> TopNodeFactor 3
> ActivePartFracForNewDomainDecomp 0.01
> DesNumNgb 64
> MaxNumNgbDeviation 1
> UnitLength_in_cm 3.08568e+21
> UnitMass_in_g 1.989e+43
> UnitVelocity_in_cm_per_s 100000
> GravityConstantInternal 0
> SofteningComovingClass0 12
> SofteningMaxPhysClass0 12
> SofteningClassOfPartType0 0
> SofteningClassOfPartType1 0
> SofteningClassOfPartType2 0
> SofteningClassOfPartType3 0
> SofteningClassOfPartType4 0
> SofteningClassOfPartType5 0
> DesLinkNgb 20
> ArtBulkViscConst 1
> MinEgySpec 0
> InitGasTemp 0
> NSample 1024
> GridSize 1024
> Seed 181170
> SphereMode 1
> PowerSpectrumType 2
> ReNormalizeInputSpectrum 1
> PrimordialIndex 1
> ShapeGamma 0.21
> Sigma8 0.8288
> PowerSpectrumFile powerspec
> InputSpectrum_UnitLength_in_cm 3.08568e+24
>
> MALLOC: Allocation of shared memory took 0.00582997 sec
>
> found 5 times in output-list.
> BEGRUN: Hubble (internal units) = 0.1
> BEGRUN: h = 0.6777
> BEGRUN: G (internal units) = 43018.7
> BEGRUN: UnitMass_in_g = 1.989e+43
> BEGRUN: UnitLenth_in_cm = 3.08568e+21
> BEGRUN: UnitTime_in_s = 3.08568e+16
> BEGRUN: UnitVelocity_in_cm_per_s = 100000
> BEGRUN: UnitDensity_in_cgs = 6.76991e-22
> BEGRUN: UnitEnergy_in_cgs = 1.989e+53
>
> NGENIC: generated grid of size 1024
> NGENIC: computing displacement fields...
> NGENIC: vel_prefac1= 5.54175 hubble_a=55.4176 fom1=0.999999
> NGENIC: vel_prefac2= 11.0835 hubble_a=55.4176 fom2=2
> found 579000 rows in input spectrum table
>
> Normalization of spectrum in file: Sigma8 = 0.819434
> Normalization adjusted to Sigma8=0.8288 (Normfac=1.02299)
>
> NGENIC: Dplus=78.3218
> NGENIC_2LPT: Computing secondary source term, derivatices 0 0
> NGENIC: setting up modes in kspace...
> NGENIC_2LPT: Computing secondary source term, derivatices 1 1
> NGENIC: setting up modes in kspace...
> NGENIC_2LPT: Computing secondary source term, derivatices 2 2
> NGENIC: setting up modes in kspace...
> NGENIC_2LPT: Computing secondary source term, derivatices 0 1
> NGENIC: setting up modes in kspace...
> NGENIC_2LPT: Computing secondary source term, derivatices 0 2
> NGENIC: setting up modes in kspace...
> NGENIC_2LPT: Computing secondary source term, derivatices 1 2
> NGENIC: setting up modes in kspace...
> NGENIC_2LPT: Secondary source term computed in real space
> NGENIC_2LPT: Done transforming it to k-space
> NGENIC_2LPT: Obtaining second order displacements for axes=0
> NGENIC_2LPT: Obtaining second order displacements for axes=1
> NGENIC_2LPT: Obtaining second order displacements for axes=2
> NGENIC_2LPT: Obtaining Zeldovich displacements for axes=0
> NGENIC: setting up modes in kspace...
> NGENIC_2LPT: Obtaining Zeldovich displacements for axes=1
> NGENIC: setting up modes in kspace...
> NGENIC_2LPT: Obtaining Zeldovich displacements for axes=2
> NGENIC: setting up modes in kspace...
>
> NGENIC: Maximum displacement: 375.266, in units of the part-spacing= 0.768545
>
>
> NGENIC: Maximum velocity component: 2076.67
>
> INIT: Testing ID uniqueness...
> INIT: success. took=1.45795 sec
>
> DOMAIN: Begin domain decomposition (sync-point 0).
> DOMAIN: New shift vector determined (-165190 47404.2 171461)
> DOMAIN: Sum=2 TotalCost=2 NumTimeBinsToBeBalanced=1 MultipleDomains=2
> DOMAIN: Increasing TopNodeAllocFactor=0.08 new value=0.104
> DOMAIN: Increasing TopNodeAllocFactor=0.104 new value=0.1352
> DOMAIN: Increasing TopNodeAllocFactor=0.1352 new value=0.17576
> DOMAIN: Increasing TopNodeAllocFactor=0.17576 new value=0.228488
> DOMAIN: Increasing TopNodeAllocFactor=0.228488 new value=0.297034
> DOMAIN: Increasing TopNodeAllocFactor=0.297034 new value=0.386145
> DOMAIN: Increasing TopNodeAllocFactor=0.386145 new value=0.501988
> DOMAIN: Increasing TopNodeAllocFactor=0.501988 new value=0.652585
> DOMAIN: Increasing TopNodeAllocFactor=0.652585 new value=0.84836
> DOMAIN: Increasing TopNodeAllocFactor=0.84836 new value=1.10287
> DOMAIN: Increasing TopNodeAllocFactor=1.10287 new value=1.43373
> DOMAIN: Increasing TopNodeAllocFactor=1.43373 new value=1.86385
> DOMAIN: Increasing TopNodeAllocFactor=1.86385 new value=2.423
> DOMAIN: Increasing TopNodeAllocFactor=2.423 new value=3.1499
> DOMAIN: Increasing TopNodeAllocFactor=3.1499 new value=4.09487
> DOMAIN: NTopleaves=4096, determination of top-level tree involved 4 iterations and took 50.5168 sec
> DOMAIN: we are going to try at most 474 different settings for combining the domains on tasks=216, nnodes=6
> DOMAIN: total_cost=2 total_load=1
> DOMAIN: best solution found after 1 iterations by task=75 for nextra=16, reaching maximum imbalance of 1.06271|1.06288
> DOMAIN: combining multiple-domains took 0.588464 sec
> DOMAIN: exchange of 1073741824 particles
> DOMAIN: particle exchange done. (took 14.3663 sec)
> DOMAIN: domain decomposition done. (took in total 67.5344 sec)
> PEANO: Begin Peano-Hilbert order...
> PEANO: done, took 5.81062 sec.
>
> SNAPSHOT: Setting next time for snapshot file to Time_next= 0.01 (DumpFlag=1)
>
>
>
> Sync-Point 0, Time: 0.01, Redshift: 99, Systemstep: 0, Dloga: 0, Nsync-grv: 1073741824, Nsync-hyd: 0
> DOMAIN: Begin domain decomposition (sync-point 0).
> DOMAIN: New shift vector determined (-141172 229279 -198623)
> DOMAIN: Sum=2 TotalCost=2 NumTimeBinsToBeBalanced=1 MultipleDomains=2
> DOMAIN: NTopleaves=4096, determination of top-level tree involved 4 iterations and took 6.65769 sec
> DOMAIN: we are going to try at most 474 different settings for combining the domains on tasks=216, nnodes=6
> DOMAIN: total_cost=2 total_load=1
> DOMAIN: best solution found after 1 iterations by task=72 for nextra=20, reaching maximum imbalance of 1.06096|1.06104
> DOMAIN: combining multiple-domains took 0.492839 sec
> DOMAIN: exchange of 1073741824 particles
> DOMAIN: particle exchange done. (took 12.0037 sec)
> DOMAIN: domain decomposition done. (took in total 21.0146 sec)
> PEANO: Begin Peano-Hilbert order...
> PEANO: done, took 5.64457 sec.
> ACCEL: Start tree gravity force computation... (1073741824 particles)
> PM-PERIODIC: Starting periodic PM calculation. (Rcut=8789.06) presently allocated=1106.3 MB
> PM-PERIODIC: done. (took 11.8741 seconds)
> TIMESTEPS: displacement time constraint: 0.0926602 (0.05)
> TREE: Full tree construction for all particles. (presently allocated=1637.45 MB)
> GRAVTREE: Tree construction done. took 9.68924 sec <numnodes>=703477 NTopnodes=4681 NTopleaves=4096 tree-build-scalability=0.993377
> FMM: Begin tree force. timebin=0 (presently allocated=0.4 MB)
> Code termination on task=208, function tree_fetch_foreign_nodes(), file src/tree/tree.cc, line 1101: We are out of storage for foreign nodes: NumForeignNodes=587074 MaxForeignNodes=587074 j=1 n_parts=0
> --------------------------------------------------------------------------
> MPI_ABORT was invoked on rank 208 in communicator MPI_COMM_WORLD
> with errorcode 1.
>
> NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
> You may or may not see output from other processes, depending on
> exactly when Open MPI kills them.
> --------------------------------------------------------------------------
>
> Tiago Castro Post Doc, Department of Physics / UNITS / OATS
> Phone: (+39 040 3199 120)
> Mobile: (+39 388 794 1562)
> Email: tiagobscastro_at_gmail.com
> Website: tiagobscastro.com
> Skype: tiagobscastro
> Address: Osservatorio Astronomico di Trieste / Villa Bazzoni
> Via Bazzoni, 2, 34143 Trieste TS
>
>
>