[Gadget 4] possibly a bug in pm_nonperiodic.cc after snapshot was saved

From: Weiguang Cui <cuiweiguang_at_gmail.com>
Date: Tue, 4 May 2021 15:03:01 +0100

Hi Volker and Gadget-helpers,

I am running a zoom-in test with Gadget-4. Earlier snapshots were written
without problems, but the code aborted right after saving this one. I
suspect a mismatch in the PM part of the force calculation. From a brief
check of the log after earlier snapshot writes, the force calculations that
followed them only used FMM. The detailed error report follows:

```
SNAPSHOT: writing snapshot block 9 (SubfindVelDisp)...
SNAPSHOT: done with writing snapshot. Took 8.2192 sec, total size 876.136 MB, corresponds to effective I/O rate of 106.596 MB/sec

SNAPSHOT: writing snapshot file #39 @ time 0.139043 ...
SNAPSHOT: writing snapshot file: './snapshot-prevmostboundonly_039' (file 1 of 1)
SNAPSHOT: writing snapshot rename './snapshot-prevmostboundonly_039.hdf5' to './bak-snapshot-prevmostboundonly_039.hdf5'
SNAPSHOT: writing snapshot block 0 (Coordinates)...
SNAPSHOT: writing snapshot block 1 (Velocities)...
SNAPSHOT: writing snapshot block 2 (ParticleIDs)...
SNAPSHOT: writing snapshot block 7 (SubfindDensity)...
SNAPSHOT: writing snapshot block 8 (SubfindHsml)...
SNAPSHOT: writing snapshot block 9 (SubfindVelDisp)...
SNAPSHOT: done with writing snapshot. Took 0.0642038 sec, total size 0.201492 MB, corresponds to effective I/O rate of 3.13832 MB/sec

SNAPSHOT: Setting next time for snapshot file to Time_next= 0.142332 (DumpFlag=1)

KICKS: 1st gravity for hierarchical timebin=20: 21573294 particles dt_gravkick=0.0313998 0.0313998 0.0313998
KICKS: 1st gravity for hierarchical timebin=19: 21573294 particles dt_gravkick=-0.015704 0.0156958 0.0156958
ACCEL: Start tree gravity force computation... (1111209 particles)
TREEPM: Starting PM part of force calculation. (timebin=18)
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 100 in communicator MPI_COMM_WORLD
with errorcode 1.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
Code termination on task=96, function pmforce_nonperiodic(), file src/pm/pm_nonperiodic.cc, line 1472: unexpected NSource != Sp->NumPart
Code termination on task=97, function pmforce_nonperiodic(), file src/pm/pm_nonperiodic.cc, line 1472: unexpected NSource != Sp->NumPart
```

Please let me know if you need other information about the test.


A separate, unrelated question: I think some of my test/config parameters
are not set properly. As you can see from the last time step in the cpu.txt
file, treeimbalance accounts for about half of the cumulative CPU time
(50.1%). Do you have any suggestions on how to improve this?
```
Step 1925, Time: 0.139002, CPUs: 128, HighestActiveTimeBin: 15
                          diff cumulative
total 0.01 100.0% 15689.45 100.0%
  treegrav 0.01 72.6% 12704.52 81.0%
    treebuild 0.01 52.7% 324.26 2.1%
      insert 0.00 18.8% 224.70 1.4%
      branches 0.00 0.2% 9.11 0.1%
      toplevel 0.00 31.1% 28.95 0.2%
    treeforce 0.00 0.6% 12362.98 78.8%
      treewalk 0.00 0.0% 4490.18 28.6%
      treeimbalance 0.00 0.2% 7867.96 50.1%
      treefetch 0.00 0.0% 0.05 0.0%
      treestack 0.00 0.3% 4.80 0.0%
  pm_grav 0.00 0.0% 1915.98 12.2%
  ngbtreevelupdate 0.00 0.1% 0.07 0.0%
  ngbtreehsmlupdate 0.00 0.3% 0.10 0.0%
  sph 0.00 0.0% 0.00 0.0%
    density 0.00 0.0% 0.00 0.0%
      densitywalk 0.00 0.0% 0.00 0.0%
      densityfetch 0.00 0.0% 0.00 0.0%
      densimbalance 0.00 0.0% 0.00 0.0%
    hydro 0.00 0.0% 0.00 0.0%
      hydrowalk 0.00 0.0% 0.00 0.0%
      hydrofetch 0.00 0.0% 0.00 0.0%
      hydroimbalance 0.00 0.0% 0.00 0.0%
  domain 0.00 0.0% 275.44 1.8%
  peano 0.00 0.0% 44.75 0.3%
  drift/kicks 0.00 3.2% 181.57 1.2%
  timeline 0.00 0.0% 4.13 0.0%
  treetimesteps 0.00 0.0% 0.00 0.0%
  i/o 0.00 0.0% 400.66 2.6%
  logs 0.00 20.1% 31.70 0.2%
  fof 0.00 0.0% 39.95 0.3%
    fofwalk 0.00 0.0% 2.20 0.0%
    fofimbal 0.00 0.0% 2.91 0.0%
  subfind 0.00 0.0% 20.92 0.1%
  restart 0.00 0.0% 10.89 0.1%
  misc 0.00 3.7% 58.77 0.4%
```
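
To put a number on the imbalance, here is a minimal Python sketch that parses a few rows copied from the cpu.txt excerpt above and compares the cumulative time spent waiting (treeimbalance) with the time spent walking the tree (treewalk). The helper name `parse_cpu_block` and the row regex are my own assumptions for illustration; they are not part of Gadget-4, and the regex only matches rows in the five-column layout shown here.

```python
import re

# Rows copied verbatim from the cpu.txt excerpt above.
# Columns: name, diff time, diff %, cumulative time, cumulative %
CPU_BLOCK = """\
total 0.01 100.0% 15689.45 100.0%
  treegrav 0.01 72.6% 12704.52 81.0%
    treeforce 0.00 0.6% 12362.98 78.8%
      treewalk 0.00 0.0% 4490.18 28.6%
      treeimbalance 0.00 0.2% 7867.96 50.1%
  pm_grav 0.00 0.0% 1915.98 12.2%
"""

# Hypothetical row pattern; the header line ("diff cumulative") does not match it.
ROW = re.compile(r"^\s*(\S+)\s+([\d.]+)\s+([\d.]+)%\s+([\d.]+)\s+([\d.]+)%\s*$")


def parse_cpu_block(text):
    """Return {name: (cumulative_seconds, cumulative_percent)} for each matching row."""
    rows = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            name, _diff, _diff_pct, cum, cum_pct = match.groups()
            rows[name] = (float(cum), float(cum_pct))
    return rows


rows = parse_cpu_block(CPU_BLOCK)
walk_seconds = rows["treewalk"][0]
imbalance_seconds = rows["treeimbalance"][0]
ratio = imbalance_seconds / walk_seconds
print(f"treeimbalance / treewalk = {ratio:.2f}")
```

A ratio above 1 means the tasks spend more time waiting for each other in the tree force computation than doing the tree walk itself.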

Many thanks.

Best,
Weiguang

-------------------------------------------
https://weiguangcui.github.io/
Received on 2021-05-04 16:03:50
