Dear list,
I am trying to run a cosmological simulation (500.0 Mpc/h, 1024^3
particles), and the code terminates with the error below. Looking at the
source code, I could not work out exactly how MaxForeignNodes is determined,
or whether there is anything I can change in the parameter file to avoid
the problem. I am already using the entire local cluster.
Many thanks!
---------------------------------
Shared memory islands host a minimum of 37 and a maximum of 37 MPI ranks.
We shall use 6 MPI ranks in total for assisting one-sided communication (1 per shared memory node).
___ __ ____ ___ ____ ____ __
/ __) /__\ ( _ \ / __)( ___)(_ _)___ /. |
( (_-. /(__)\ )(_) )( (_-. )__) )( (___)(_ _)
\___/(__)(__)(____/ \___/(____) (__) (_)
This is Gadget, version 4.0.
Git commit unknown, unknown
Code was compiled with the following compiler and flags:
mpicxx -std=c++11 -ggdb -O3 -march=native -Wall -Wno-format-security -I/beegfs/tcastro/gadget4/include/ -I/beegfs/tcastro/gadget4/include/gsl -I/beegfs/tcastro/gadget4/include/ -Ibuild -Isrc
Code was compiled with the following settings:
ASMTH=3.0
CREATE_GRID
DOUBLEPRECISION=1
DOUBLEPRECISION_FFTW
ENLARGE_DYNAMIC_RANGE_IN_TIME
FMM
FOF
FOF_GROUP_MIN_LEN=100
FOF_LINKLENGTH=0.2
FOF_PRIMARY_LINK_TYPES=2
HIERARCHICAL_GRAVITY
IMPOSE_PINNING
LEAN
MERGERTREE
MULTIPOLE_ORDER=5
NGENIC=1024
NGENIC_2LPT
NSOFTCLASSES=1
NTAB=256
NTYPES=6
OUTPUT_TIMESTEP
PERIODIC
PMGRID=1024
POWERSPEC_ON_OUTPUT
PRESERVE_SHMEM_BINARY_INVARIANCE
RANDOMIZE_DOMAINCENTER
RCUT=6.0
SELFGRAVITY
SUBFIND
SUBFIND_HBT
TREE_NUM_BEFORE_NODESPLIT=4
Running on 216 MPI tasks.
BEGRUN: Size of particle structure 128 [bytes]
BEGRUN: Size of sph particle structure 216 [bytes]
BEGRUN: Size of gravity tree node 352 [bytes]
BEGRUN: Size of neighbour tree node 192 [bytes]
BEGRUN: Size of subfind auxiliary data 64 [bytes]
PINNING: We have 4 sockets, 40 physical cores and 40 logical cores on the first MPI-task's node.
PINNING: Looks like 10 logical cores are available.
PINNING: Looks like already before start of the code, a tight binding was imposed.
PINNING: We refrain from any pinning attempt ourselves. (This can be changed by setting the compile flag IMPOSE_PINNING_OVERRIDE_MODE.)
-------------------------------------------------------------------------------------------------------------------------
AvailMem: Largest = 251624.55 Mb (on task= 144), Smallest = 251293.10 Mb (on task= 72), Average = 251463.74 Mb
Total Mem: Largest = 257655.01 Mb (on task= 0), Smallest = 257655.01 Mb (on task= 0), Average = 257655.01 Mb
Committed_AS: Largest = 6361.91 Mb (on task= 72), Smallest = 6030.45 Mb (on task= 144), Average = 6191.26 Mb
SwapTotal: Largest = 4000.00 Mb (on task= 0), Smallest = 4000.00 Mb (on task= 0), Average = 4000.00 Mb
SwapFree: Largest = 4000.00 Mb (on task= 0), Smallest = 3966.40 Mb (on task= 180), Average = 3992.73 Mb
AllocMem: Largest = 6361.91 Mb (on task= 72), Smallest = 6030.45 Mb (on task= 144), Average = 6191.26 Mb
avail /dev/shm: Largest = 128788.88 Mb (on task= 144), Smallest = 128785.64 Mb (on task= 0), Average = 128787.51 Mb
-------------------------------------------------------------------------------------------------------------------------
Task=0 has the maximum commited memory and is host: gen09-10
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Obtaining parameters from file 'param.1024p3.txt':
InitCondFile ./ics
OutputDir ./1024p3
SnapshotFileBase snap
OutputListFilename ./outputs.txt
ICFormat 2
SnapFormat 3
TimeLimitCPU 172800
CpuTimeBetRestartFile 7200
MaxMemSize 3200
TimeBegin 0.01
TimeMax 1
ComovingIntegrationOn 1
Omega0 0.30711
OmegaLambda 0.69289
OmegaBaryon 0.04825
HubbleParam 0.6777
Hubble 0.1
BoxSize 500000
OutputListOn 1
TimeBetSnapshot 0
TimeOfFirstSnapshot 0
TimeBetStatistics 0.01
NumFilesPerSnapshot 16
MaxFilesWithConcurrentIO 8
ErrTolIntAccuracy 0.05
CourantFac 0.15
MaxSizeTimestep 0.05
MinSizeTimestep 0
TypeOfOpeningCriterion 1
ErrTolTheta 0.4
ErrTolThetaMax 1
ErrTolForceAcc 0.005
TopNodeFactor 3
ActivePartFracForNewDomainDecomp 0.01
DesNumNgb 64
MaxNumNgbDeviation 1
UnitLength_in_cm 3.08568e+21
UnitMass_in_g 1.989e+43
UnitVelocity_in_cm_per_s 100000
GravityConstantInternal 0
SofteningComovingClass0 12
SofteningMaxPhysClass0 12
SofteningClassOfPartType0 0
SofteningClassOfPartType1 0
SofteningClassOfPartType2 0
SofteningClassOfPartType3 0
SofteningClassOfPartType4 0
SofteningClassOfPartType5 0
DesLinkNgb 20
ArtBulkViscConst 1
MinEgySpec 0
InitGasTemp 0
NSample 1024
GridSize 1024
Seed 181170
SphereMode 1
PowerSpectrumType 2
ReNormalizeInputSpectrum 1
PrimordialIndex 1
ShapeGamma 0.21
Sigma8 0.8288
PowerSpectrumFile powerspec
InputSpectrum_UnitLength_in_cm 3.08568e+24
MALLOC: Allocation of shared memory took 0.00582997 sec
found 5 times in output-list.
BEGRUN: Hubble (internal units) = 0.1
BEGRUN: h = 0.6777
BEGRUN: G (internal units) = 43018.7
BEGRUN: UnitMass_in_g = 1.989e+43
BEGRUN: UnitLenth_in_cm = 3.08568e+21
BEGRUN: UnitTime_in_s = 3.08568e+16
BEGRUN: UnitVelocity_in_cm_per_s = 100000
BEGRUN: UnitDensity_in_cgs = 6.76991e-22
BEGRUN: UnitEnergy_in_cgs = 1.989e+53
NGENIC: generated grid of size 1024
NGENIC: computing displacement fields...
NGENIC: vel_prefac1= 5.54175 hubble_a=55.4176 fom1=0.999999
NGENIC: vel_prefac2= 11.0835 hubble_a=55.4176 fom2=2
found 579000 rows in input spectrum table
Normalization of spectrum in file: Sigma8 = 0.819434
Normalization adjusted to Sigma8=0.8288 (Normfac=1.02299)
NGENIC: Dplus=78.3218
NGENIC_2LPT: Computing secondary source term, derivatices 0 0
NGENIC: setting up modes in kspace...
NGENIC_2LPT: Computing secondary source term, derivatices 1 1
NGENIC: setting up modes in kspace...
NGENIC_2LPT: Computing secondary source term, derivatices 2 2
NGENIC: setting up modes in kspace...
NGENIC_2LPT: Computing secondary source term, derivatices 0 1
NGENIC: setting up modes in kspace...
NGENIC_2LPT: Computing secondary source term, derivatices 0 2
NGENIC: setting up modes in kspace...
NGENIC_2LPT: Computing secondary source term, derivatices 1 2
NGENIC: setting up modes in kspace...
NGENIC_2LPT: Secondary source term computed in real space
NGENIC_2LPT: Done transforming it to k-space
NGENIC_2LPT: Obtaining second order displacements for axes=0
NGENIC_2LPT: Obtaining second order displacements for axes=1
NGENIC_2LPT: Obtaining second order displacements for axes=2
NGENIC_2LPT: Obtaining Zeldovich displacements for axes=0
NGENIC: setting up modes in kspace...
NGENIC_2LPT: Obtaining Zeldovich displacements for axes=1
NGENIC: setting up modes in kspace...
NGENIC_2LPT: Obtaining Zeldovich displacements for axes=2
NGENIC: setting up modes in kspace...
NGENIC: Maximum displacement: 375.266, in units of the part-spacing= 0.768545
NGENIC: Maximum velocity component: 2076.67
INIT: Testing ID uniqueness...
INIT: success. took=1.45795 sec
DOMAIN: Begin domain decomposition (sync-point 0).
DOMAIN: New shift vector determined (-165190 47404.2 171461)
DOMAIN: Sum=2 TotalCost=2 NumTimeBinsToBeBalanced=1 MultipleDomains=2
DOMAIN: Increasing TopNodeAllocFactor=0.08 new value=0.104
DOMAIN: Increasing TopNodeAllocFactor=0.104 new value=0.1352
DOMAIN: Increasing TopNodeAllocFactor=0.1352 new value=0.17576
DOMAIN: Increasing TopNodeAllocFactor=0.17576 new value=0.228488
DOMAIN: Increasing TopNodeAllocFactor=0.228488 new value=0.297034
DOMAIN: Increasing TopNodeAllocFactor=0.297034 new value=0.386145
DOMAIN: Increasing TopNodeAllocFactor=0.386145 new value=0.501988
DOMAIN: Increasing TopNodeAllocFactor=0.501988 new value=0.652585
DOMAIN: Increasing TopNodeAllocFactor=0.652585 new value=0.84836
DOMAIN: Increasing TopNodeAllocFactor=0.84836 new value=1.10287
DOMAIN: Increasing TopNodeAllocFactor=1.10287 new value=1.43373
DOMAIN: Increasing TopNodeAllocFactor=1.43373 new value=1.86385
DOMAIN: Increasing TopNodeAllocFactor=1.86385 new value=2.423
DOMAIN: Increasing TopNodeAllocFactor=2.423 new value=3.1499
DOMAIN: Increasing TopNodeAllocFactor=3.1499 new value=4.09487
DOMAIN: NTopleaves=4096, determination of top-level tree involved 4 iterations and took 50.5168 sec
DOMAIN: we are going to try at most 474 different settings for combining the domains on tasks=216, nnodes=6
DOMAIN: total_cost=2 total_load=1
DOMAIN: best solution found after 1 iterations by task=75 for nextra=16, reaching maximum imbalance of 1.06271|1.06288
DOMAIN: combining multiple-domains took 0.588464 sec
DOMAIN: exchange of 1073741824 particles
DOMAIN: particle exchange done. (took 14.3663 sec)
DOMAIN: domain decomposition done. (took in total 67.5344 sec)
PEANO: Begin Peano-Hilbert order...
PEANO: done, took 5.81062 sec.
SNAPSHOT: Setting next time for snapshot file to Time_next= 0.01 (DumpFlag=1)
Sync-Point 0, Time: 0.01, Redshift: 99, Systemstep: 0, Dloga: 0, Nsync-grv: 1073741824, Nsync-hyd: 0
DOMAIN: Begin domain decomposition (sync-point 0).
DOMAIN: New shift vector determined (-141172 229279 -198623)
DOMAIN: Sum=2 TotalCost=2 NumTimeBinsToBeBalanced=1 MultipleDomains=2
DOMAIN: NTopleaves=4096, determination of top-level tree involved 4 iterations and took 6.65769 sec
DOMAIN: we are going to try at most 474 different settings for combining the domains on tasks=216, nnodes=6
DOMAIN: total_cost=2 total_load=1
DOMAIN: best solution found after 1 iterations by task=72 for nextra=20, reaching maximum imbalance of 1.06096|1.06104
DOMAIN: combining multiple-domains took 0.492839 sec
DOMAIN: exchange of 1073741824 particles
DOMAIN: particle exchange done. (took 12.0037 sec)
DOMAIN: domain decomposition done. (took in total 21.0146 sec)
PEANO: Begin Peano-Hilbert order...
PEANO: done, took 5.64457 sec.
ACCEL: Start tree gravity force computation... (1073741824 particles)
PM-PERIODIC: Starting periodic PM calculation. (Rcut=8789.06) presently allocated=1106.3 MB
PM-PERIODIC: done. (took 11.8741 seconds)
TIMESTEPS: displacement time constraint: 0.0926602 (0.05)
TREE: Full tree construction for all particles. (presently allocated=1637.45 MB)
GRAVTREE: Tree construction done. took 9.68924 sec <numnodes>=703477 NTopnodes=4681 NTopleaves=4096 tree-build-scalability=0.993377
FMM: Begin tree force. timebin=0 (presently allocated=0.4 MB)
Code termination on task=208, function tree_fetch_foreign_nodes(), file src/tree/tree.cc, line 1101: We are out of storage for foreign nodes: NumForeignNodes=587074 MaxForeignNodes=587074 j=1 n_parts=0
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 208 in communicator MPI_COMM_WORLD
with errorcode 1.
NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
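As a rough sanity check on the numbers in the log above, the following sketch estimates how much memory per task the particles, the local tree nodes, and the foreign-node storage at its limit would take, compared with MaxMemSize. It is only back-of-the-envelope arithmetic, not the way Gadget-4 itself determines MaxForeignNodes. It assumes that particles are spread evenly over the 216 tasks, that <numnodes> is a per-task node count, that a foreign node occupies the same 352 bytes as a gravity tree node, and that MaxMemSize is given in MiB; none of these assumptions is confirmed by the log.

```python
# Back-of-the-envelope per-task memory estimate from values printed in the log.
# NOTE: this does not reproduce Gadget-4's internal sizing of MaxForeignNodes.

MIB = 1024 ** 2  # assumption: "MB" in the log and MaxMemSize mean MiB


def to_mib(nbytes):
    """Convert a byte count to MiB."""
    return nbytes / MIB


# Values copied from the log
n_particles = 1024 ** 3        # "exchange of 1073741824 particles"
n_tasks = 216                  # "Running on 216 MPI tasks."
particle_bytes = 128           # "Size of particle structure 128 [bytes]"
node_bytes = 352               # "Size of gravity tree node 352 [bytes]"
local_nodes = 703477           # "<numnodes>=703477" (assumed to be per task)
max_foreign_nodes = 587074     # "MaxForeignNodes=587074"
max_mem_size_mib = 3200        # "MaxMemSize 3200"

# Assumption: perfectly even particle load across tasks
particles_per_task = n_particles / n_tasks

estimate = {
    "particles": to_mib(particles_per_task * particle_bytes),
    "local tree nodes": to_mib(local_nodes * node_bytes),
    # Assumption: a foreign node costs as much as a local gravity tree node
    "foreign nodes at limit": to_mib(max_foreign_nodes * node_bytes),
}
total = sum(estimate.values())

for name, size in estimate.items():
    print(f"{name:>24s}: {size:8.1f} MiB")
print(f"{'sum':>24s}: {total:8.1f} MiB of {max_mem_size_mib} MiB MaxMemSize")
```

Other allocations (PM grids, communication buffers, foreign particle data, and so on) are not included, so the sum is a lower bound on the actual per-task usage.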
Tiago Castro
Post Doc, Department of Physics / UNITS / OATS
Received on 2020-12-16 18:46:25