Re: Problems with treebuild -- setting the TREE_NUM_BEFORE_NODESPLIT

From: Weiguang Cui <cuiweiguang_at_gmail.com>
Date: Tue, 24 Aug 2021 11:29:20 +0100

Hi Volker,

This is a pure dark-matter run. The problem occurred when the simulation
had run to z~0.3.
As you can see from the config options below, this simulation used an
old IC file, and double-precision output is not enabled either.

I increased the factor from 0.1 to 0.5, which still resulted in the same
error in fmm.cc. I don't think memory is the issue here. According to
memory.txt, the largest allocation (over the whole file) is
`MEMORY: Largest Allocation = 11263.9 Mbyte | Largest Allocation Without Generic = 11263.9 Mbyte`,
and the parameter `MaxMemSize 18000 % in MByte` is consistent with the
machine's memory (cosma7). I will increase the factor further to see if
that works.
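
For reference, the arithmetic behind the two numbers discussed above can be sketched as follows. This is only a minimal illustration of the expression quoted from fmm.cc, not the code's actual implementation: the function names are made up for this example, `kMinWorkstackSize` is a placeholder because the real value of TREE_MIN_WORKSTACK_SIZE is not given in this thread, and any particle counts passed in are hypothetical.

```cpp
#include <algorithm>
#include <cassert>

// ASSUMPTION: placeholder lower bound. The real value of
// TREE_MIN_WORKSTACK_SIZE is not quoted anywhere in this thread.
constexpr int kMinWorkstackSize = 100000;

// Same shape as the expression quoted from fmm.cc: a fixed fraction of the
// local plus imported particle count, but never below the minimum size.
int max_on_fetch_stack(double factor, long long num_part, long long num_part_imported)
{
  assert(factor > 0.0);
  const double scaled = factor * static_cast<double>(num_part + num_part_imported);
  return std::max(static_cast<int>(scaled), kMinWorkstackSize);
}

// Headroom in MByte between the configured limit and the observed peak.
double memory_headroom_mbyte(double max_mem_size, double largest_allocation)
{
  return max_mem_size - largest_allocation;
}
```

With a hypothetical 9,000,000 local particles and no imports, a factor of 0.1 gives 900000 stack entries and a factor of 0.5 gives 4500000. The figures from memory.txt and the parameter file (18000 and 11263.9 MByte) leave roughly 6736 MByte between the peak allocation and MaxMemSize.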

If the single-precision positions are not the issue, could the error come
from `FoFGravTree.treebuild(num, d);` or `FoFGravTree.treebuild(num_removed,
dremoved);` in subfind_unbind, where an FoF group may have too many
particles in a very small volume for the tree to be built?
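
To make the single-precision concern concrete, here is a small sketch of how distinct double-precision coordinates can become identical once they are stored as 32-bit floats. It is a generic floating-point illustration and not taken from the GADGET-4 source; the function names are hypothetical, and the example values assume coordinates in kpc/h, so that a 200 Mpc/h box spans 0 to 200000.

```cpp
#include <cmath>
#include <limits>

// Gap between the single-precision value nearest to x and the next
// representable single-precision value above it.
double float_spacing_near(double x)
{
  const float f = static_cast<float>(x);
  const float next = std::nextafter(f, std::numeric_limits<float>::infinity());
  return static_cast<double>(next) - static_cast<double>(f);
}

// True if two different double-precision coordinates round to the same
// value when stored in single precision.
bool collide_in_single_precision(double a, double b)
{
  return a != b && static_cast<float>(a) == static_cast<float>(b);
}
```

Under the kpc/h assumption, `float_spacing_near(100000.0)` is 0.0078125, so near the box centre any two particles separated by much less than about 8 pc/h along an axis can end up with the same stored coordinate; for example, 100000.001 and 100000.002 both round to 100000.0. Whether this is enough to put more than 16 particles into one smallest node is a separate question that the sketch does not answer.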

Any suggestions are welcome. Many thanks!

==================================
    ALLOW_HDF5_COMPRESSION
    ASMTH=1.2
    DOUBLEPRECISION=1
    DOUBLEPRECISION_FFTW
    FMM
    FOF
    FOF_GROUP_MIN_LEN=32
    FOF_LINKLENGTH=0.2
    FOF_PRIMARY_LINK_TYPES=2
    FOF_SECONDARY_LINK_TYPES=1+16+32
    GADGET2_HEADER
    IDS_64BIT
    LIGHTCONE
    LIGHTCONE_IMAGE_COMP_HSML_VELDISP
    LIGHTCONE_MASSMAPS
    LIGHTCONE_PARTICLES
    LIGHTCONE_PARTICLES_GROUPS
    MERGERTREE
    MULTIPOLE_ORDER=3
    NTAB=128
    NTYPES=6
    PERIODIC
    PMGRID=4096
    RANDOMIZE_DOMAINCENTER
    RCUT=4.5
    SELFGRAVITY
    SUBFIND
    SUBFIND_HBT
    TREE_NUM_BEFORE_NODESPLIT=64
===========================================================


Best,
Weiguang

-------------------------------------------
https://weiguangcui.github.io/


On Mon, Aug 23, 2021 at 1:49 PM Volker Springel <
vspringel_at_mpa-garching.mpg.de> wrote:

>
> Hi Weiguang,
>
> The code termination you experienced in the tree construction during
> subfind is quite puzzling to me, especially since you used
> BITS_FOR_POSITIONS=64... In principle, this situation should only arise if
> you have a small group of particles (~16) in a region about 10^18 smaller
> than the boxsize. Has this situation occurred during a simulation run, or
> in postprocessing? If you have used single precision for storing positions
> in a snapshot file, or if you have dense blobs of gas with intense star
> formation, then you can get occasional coordinate collisions of two or
> several particles, but ~16 seems increasingly unlikely. So I'm not sure
> what's really going on here. Have things actually worked when setting
> TREE_NUM_BEFORE_NODESPLIT=64?
>
> The issue in FMM is a memory issue. It should be possible to resolve it
> with a higher setting of MaxMemSize, or by enlarging the factor 0.1 in line
> 1745 of fmm.cc,
> MaxOnFetchStack = std::max<int>(0.1 * (Tp->NumPart + NumPartImported),
> TREE_MIN_WORKSTACK_SIZE);
>
> Best,
> Volker
>
>
> > On 21. Aug 2021, at 10:10, Weiguang Cui <cuiweiguang_at_gmail.com> wrote:
> >
> > Dear all,
> >
> > I recently ran into another problem with the 2048^3, 200 Mpc/h run.
> >
> > treebuild in SUBFIND requires a higher value for TREE_NUM_BEFORE_NODESPLIT:
> > ==========================================================
> > SUBFIND: We now execute a parallel version of SUBFIND.
> > SUBFIND: Previous subhalo catalogue had approximately a size 2.42768e+09, and the summed squared subhalo size was 8.42698e+16
> > SUBFIND: Number of FOF halos treated with collective SubFind algorithm = 1
> > SUBFIND: Number of processors used in different partitions for the collective SubFind code = 2
> > SUBFIND: (The adopted size-limit for the collective algorithm was 9631634 particles, for threshold size factor 0.6)
> > SUBFIND: The other 10021349 FOF halos are treated in parallel with serial code
> > SUBFIND: subfind_distribute_groups() took 0.044379 sec
> > SUBFIND: particle balance=1.10537
> > SUBFIND: subfind_exchange() took 30.2562 sec
> > SUBFIND: particle balance for processing=1
> > SUBFIND: root-task=0: Collectively doing halo 0 of length 10426033 on 2 processors.
> > SUBFIND: subdomain decomposition took 8.54527 sec
> > SUBFIND: serial subfind subdomain decomposition took 6.0162 sec
> > SUBFIND: root-task=0: total number of subhalo coll_candidates=1454
> > SUBFIND: root-task=0: number of subhalo candidates small enough to be done with one cpu: 1453. (Largest size 81455)
> > Code termination on task=0, function treebuild_insert_group_of_points(), file src/tree/tree.cc, line 489: It appears we have reached the bottom of the tree because there are more than TREE_NUM_BEFORE_NODESPLIT=16 particles in the smallest tree node representable for BITS_FOR_POSITIONS=64.
> > Either eliminate the particles at (nearly) indentical coordinates, increase the setting for TREE_NUM_BEFORE_NODESPLIT, or possibly enlarge BITS_FOR_POSITIONS if you have really not enough dynamic range
> > ==============================================
> >
> > But if I increase TREE_NUM_BEFORE_NODESPLIT to 64, FMM does not seem
> > to work:
> > =============================================================
> > Sync-Point 19835, Time: 0.750591, Redshift: 0.332284, Systemstep: 5.27389e-05, Dloga: 7.02657e-05, Nsync-grv: 31415, Nsync-hyd: 0
> > ACCEL: Start tree gravity force computation... (31415 particles)
> > TREE: Full tree construction for all particles. (presently allocated=7626.51 MB)
> > GRAVTREE: Tree construction done. took 13.4471 sec <numnodes>=206492 NTopnodes=115433 NTopleaves=101004 tree-build-scalability=0.441627
> > FMM: Begin tree force. timebin=13 (presently allocated=0.5 MB)
> > Code termination on task=0, function gravity_fmm(), file src/fmm/fmm.cc, line 1879: Can't even process a single particle
> > Code termination on task=887, function gravity_fmm(), file src/fmm/fmm.cc, line 1879: Can't even process a single particle
> > Code termination on task=40, function gravity_fmm(), file src/fmm/fmm.cc, line 1879: Can't even process a single particle
> > Code termination on task=888, function gravity_fmm(), file src/fmm/fmm.cc, line 1879: Can't even process a single particle
> > Code termination on task=889, function gravity_fmm(), file src/fmm/fmm.cc, line 1879: Can't even process a single particle
> > Code termination on task=3, function gravity_fmm(), file src/fmm/fmm.cc, line 1879: Can't even process a single particle
> > Code termination on task=890, function gravity_fmm(), file src/fmm/fmm.cc, line 1879: Can't even process a single particle
> > Code termination on task=6, function gravity_fmm(), file src/fmm/fmm.cc, line 1879: Can't even process a single particle
> > Code termination on task=891, function gravity_fmm(), file src/fmm/fmm.cc, line 1879: Can't even process a single particle
> > Code termination on task=9, function gravity_fmm(), file src/fmm/fmm.cc, line 1879: Can't even process a single particle
> > Code termination on task=892, function gravity_fmm(), file src/fmm/fmm.cc, line 1879: Can't even process a single particle
> > Code termination on task=893, function gravity_fmm(), file src/fmm/fmm.cc, line 1879: Can't even process a single particle
> > Code termination on task=894, function gravity_fmm(), file src/fmm/fmm.cc, line 1879: Can't even process a single particle
> > Code termination on task=20, function gravity_fmm(), file src/fmm/fmm.cc, line 1879: Can't even process a single particle
> > ======================================
> >
> > I don't think fine-tuning the value for TREE_NUM_BEFORE_NODESPLIT is a
> > solution.
> > I can try to use BITS_FOR_POSITIONS=128 by setting POSITIONS_IN_128BIT,
> > but I am afraid that the code may not be able to run from restart files.
> > Any suggestions?
> > Many thanks.
> >
> > Best,
> > Weiguang
> >
> > -------------------------------------------
> > https://weiguangcui.github.io/
> >
> > -----------------------------------------------------------
> >
> > If you wish to unsubscribe from this mailing, send mail to
> > minimalist_at_MPA-Garching.MPG.de with a subject of: unsubscribe gadget-list
> > A web-archive of this mailing list is available here:
> > http://www.mpa-garching.mpg.de/gadget/gadget-list
>
Received on 2021-08-24 12:30:02
