Re: Problems with treebuild -- setting the TREE_NUM_BEFORE_NODESPLIT

From: Volker Springel <vspringel_at_MPA-Garching.MPG.DE>
Date: Mon, 23 Aug 2021 14:48:41 +0200

Hi Weiguang,

The code termination you experienced in the tree construction during subfind is quite puzzling to me, especially since you used BITS_FOR_POSITIONS=64. In principle, this situation should only arise if you have a small group of particles (~16) in a region about 10^18 times smaller than the boxsize. Did this occur during a simulation run, or in postprocessing? If you have used single precision for storing positions in a snapshot file, or if you have dense blobs of gas with intense star formation, then you can get occasional coordinate collisions of two or several particles, but a collision of ~16 at once seems very unlikely. So I'm not sure what's really going on here. Did things actually work when setting TREE_NUM_BEFORE_NODESPLIT=64?
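
To illustrate the single-precision point above, here is a minimal sketch (not GADGET-4 code) that counts how many double-precision coordinates become identical once they are rounded to float, as happens when a snapshot stores positions in single precision. The function name and the sample values are hypothetical. The sample assumes coordinates of order 10^5 (for example a 200 Mpc/h box expressed in kpc/h), where adjacent float values are 2^-7, about 0.0078, apart.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical helper: count the positions that share their single-precision
// value with at least one other position in the input.
std::size_t count_float_collisions(const std::vector<double>& x)
{
  std::map<float, std::size_t> seen;
  for (double v : x)
    ++seen[static_cast<float>(v)];  // round to float, as a single-precision snapshot would

  std::size_t collided = 0;
  for (const auto& kv : seen)
    if (kv.second > 1)
      collided += kv.second;  // every member of a duplicated value counts

  assert(collided <= x.size());  // sanity check: cannot exceed the input size
  return collided;
}
```

For the input {100000.001, 100000.002, 100000.003, 50.0} the first three values are distinct in double precision but all round to the same float, so the function reports three collided positions.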

The FMM failure is a memory issue. It should be possible to resolve it with a higher setting of MaxMemSize, or by enlarging the factor 0.1 in line 1745 of fmm.cc:
MaxOnFetchStack = std::max<int>(0.1 * (Tp->NumPart + NumPartImported), TREE_MIN_WORKSTACK_SIZE);
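
As a rough illustration of how that expression scales, the sketch below reproduces it as a standalone function with the factor exposed as a parameter. The function name, the parameter names, and the value used for the lower bound are assumptions made for this example; the actual TREE_MIN_WORKSTACK_SIZE is defined in the GADGET-4 sources and may differ.

```cpp
#include <algorithm>
#include <cassert>

// Assumed placeholder for TREE_MIN_WORKSTACK_SIZE (the real value may differ).
constexpr int kMinWorkstackSize = 100000;

// Hypothetical stand-in for the quoted line: a fraction of the local plus
// imported particle count, bounded from below by a minimum size.
int fetch_stack_entries(long long num_part, long long num_imported, double factor)
{
  assert(factor > 0.0);
  const int scaled = static_cast<int>(factor * static_cast<double>(num_part + num_imported));
  return std::max(scaled, kMinWorkstackSize);
}
```

With these assumed numbers, a task holding few particles always gets the lower bound, while on a heavily loaded task doubling the factor doubles the result.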

Best,
Volker


> On 21. Aug 2021, at 10:10, Weiguang Cui <cuiweiguang_at_gmail.com> wrote:
>
> Dear all,
>
> I recently ran into another problem with the 2048^3, 200 Mpc/h run.
>
> treebuild in SUBFIND requires a higher value for TREE_NUM_BEFORE_NODESPLIT:
> ==========================================================
> SUBFIND: We now execute a parallel version of SUBFIND.
> SUBFIND: Previous subhalo catalogue had approximately a size 2.42768e+09, and the summed squared subhalo size was 8.42698e+16
> SUBFIND: Number of FOF halos treated with collective SubFind algorithm = 1
> SUBFIND: Number of processors used in different partitions for the collective SubFind code = 2
> SUBFIND: (The adopted size-limit for the collective algorithm was 9631634 particles, for threshold size factor 0.6)
> SUBFIND: The other 10021349 FOF halos are treated in parallel with serial code
> SUBFIND: subfind_distribute_groups() took 0.044379 sec
> SUBFIND: particle balance=1.10537
> SUBFIND: subfind_exchange() took 30.2562 sec
> SUBFIND: particle balance for processing=1
> SUBFIND: root-task=0: Collectively doing halo 0 of length 10426033 on 2 processors.
> SUBFIND: subdomain decomposition took 8.54527 sec
> SUBFIND: serial subfind subdomain decomposition took 6.0162 sec
> SUBFIND: root-task=0: total number of subhalo coll_candidates=1454
> SUBFIND: root-task=0: number of subhalo candidates small enough to be done with one cpu: 1453. (Largest size 81455)
> Code termination on task=0, function treebuild_insert_group_of_points(), file src/tree/tree.cc, line 489: It appears we have reached the bottom of the tree because there are more than TREE_NUM_BEFORE_NODESPLIT=16 particles in the smallest tree node representable for BITS_FOR_POSITIONS=64.
> Either eliminate the particles at (nearly) indentical coordinates, increase the setting for TREE_NUM_BEFORE_NODESPLIT, or possibly enlarge BITS_FOR_POSITIONS if you have really not enough dynamic range
> ==============================================
>
> But if I increase TREE_NUM_BEFORE_NODESPLIT to 64, FMM no longer seems to work:
> =============================================================
> Sync-Point 19835, Time: 0.750591, Redshift: 0.332284, Systemstep: 5.27389e-05, Dloga: 7.02657e-05, Nsync-grv: 31415, Nsync-hyd: 0
> ACCEL: Start tree gravity force computation... (31415 particles)
> TREE: Full tree construction for all particles. (presently allocated=7626.51 MB)
> GRAVTREE: Tree construction done. took 13.4471 sec <numnodes>=206492 NTopnodes=115433 NTopleaves=101004 tree-build-scalability=0.441627
> FMM: Begin tree force. timebin=13 (presently allocated=0.5 MB)
> Code termination on task=0, function gravity_fmm(), file src/fmm/fmm.cc, line 1879: Can't even process a single particle
> Code termination on task=887, function gravity_fmm(), file src/fmm/fmm.cc, line 1879: Can't even process a single particle
> Code termination on task=40, function gravity_fmm(), file src/fmm/fmm.cc, line 1879: Can't even process a single particle
> Code termination on task=888, function gravity_fmm(), file src/fmm/fmm.cc, line 1879: Can't even process a single particle
> Code termination on task=889, function gravity_fmm(), file src/fmm/fmm.cc, line 1879: Can't even process a single particle
> Code termination on task=3, function gravity_fmm(), file src/fmm/fmm.cc, line 1879: Can't even process a single particle
> Code termination on task=890, function gravity_fmm(), file src/fmm/fmm.cc, line 1879: Can't even process a single particle
> Code termination on task=6, function gravity_fmm(), file src/fmm/fmm.cc, line 1879: Can't even process a single particle
> Code termination on task=891, function gravity_fmm(), file src/fmm/fmm.cc, line 1879: Can't even process a single particle
> Code termination on task=9, function gravity_fmm(), file src/fmm/fmm.cc, line 1879: Can't even process a single particle
> Code termination on task=892, function gravity_fmm(), file src/fmm/fmm.cc, line 1879: Can't even process a single particle
> Code termination on task=893, function gravity_fmm(), file src/fmm/fmm.cc, line 1879: Can't even process a single particle
> Code termination on task=894, function gravity_fmm(), file src/fmm/fmm.cc, line 1879: Can't even process a single particle
> Code termination on task=20, function gravity_fmm(), file src/fmm/fmm.cc, line 1879: Can't even process a single particle
> ======================================
>
> I don't think fine-tuning the value of TREE_NUM_BEFORE_NODESPLIT is a solution.
> I could try BITS_FOR_POSITIONS=128 by setting POSITIONS_IN_128BIT, but I am afraid that the code may then not be able to run from the restart files.
> Any suggestions?
> Many thanks.
>
> Best,
> Weiguang
>
> -------------------------------------------
> https://weiguangcui.github.io/
>
Received on 2021-08-23 14:48:51
