Segmentation fault in tree calculation for multiple node runs

From: Ken Osato <ken.osato_at_iap.fr>
Date: Sun, 24 Jan 2021 16:25:37 +0100

Dear Gadget community,

I am running dark-matter-only cosmological simulations with Gadget-4.
When I run the code with the same Config.sh and param.txt as the example
"DM-L50-N128", it runs perfectly on a single node, but on multiple nodes
it fails with a segmentation fault.
I have been using L-Gadget-2 on the same cluster and never encountered
such an error.
I analyzed the core file, and it shows that the segmentation fault occurs
during the tree calculation. I suspect that something goes wrong with the
memory allocation when the run spans multiple shared-memory nodes.
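
A minimal sketch of how a write like the one at src/tree/tree.cc:324
could be guarded while debugging, to tell a missing (null) array apart
from an out-of-range index. This is not Gadget-4 code: the struct
TopNodeSketch, the function checked_set_nextnode and the explicit length
arguments are hypothetical names introduced only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <vector>

// Hypothetical stand-in for a top-level tree node; only the field used here.
struct TopNodeSketch
{
  int sibling;
};

// Hypothetical helper (not part of Gadget-4): performs
//   nextnode[slot] = topnodes[index].sibling
// only if both arrays exist and both indices are within the given lengths.
// Returns false and reports the reason otherwise.
bool checked_set_nextnode(int *nextnode, std::size_t nextnode_len, const TopNodeSketch *topnodes,
                          std::size_t topnodes_len, std::size_t slot, std::size_t index)
{
  if(nextnode == nullptr || topnodes == nullptr)
    {
      std::fprintf(stderr, "null array: nextnode=%p topnodes=%p\n", (void *)nextnode, (const void *)topnodes);
      return false;
    }

  if(slot >= nextnode_len || index >= topnodes_len)
    {
      std::fprintf(stderr, "index out of range: slot=%zu/%zu index=%zu/%zu\n", slot, nextnode_len, index,
                   topnodes_len);
      return false;
    }

  nextnode[slot] = topnodes[index].sibling;
  return true;
}

// Small self-check with made-up sizes, showing the intended use.
void run_sketch_checks()
{
  std::vector<int> nextnode(4, -1);
  std::vector<TopNodeSketch> topnodes = {{7}, {9}};

  // Valid write: slot 2 receives the sibling of top node 1.
  assert(checked_set_nextnode(nextnode.data(), nextnode.size(), topnodes.data(), topnodes.size(), 2, 1));
  assert(nextnode[2] == 9);

  // Out-of-range slot and null array are both rejected without writing.
  assert(!checked_set_nextnode(nextnode.data(), nextnode.size(), topnodes.data(), topnodes.size(), 4, 0));
  assert(!checked_set_nextnode(nullptr, 0, topnodes.data(), topnodes.size(), 0, 0));
}
```

In the real code, the equivalent check would amount to printing the
pointer values and the indices (for example in GDB at frame 0) on each
rank just before the failing statement.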

I have attached the log file from a run with the "DM-L50-N128" example
settings on a Cray XC50 with 2 nodes (= 80 cores); the GDB output follows
below. Any help or suggestions are welcome. Thank you.

Best regards,
Ken Osato


/* GDB output */
Core was generated by `./Gadget4 param.txt'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00000000004b5a5e in tree<gravnode, simparticles, gravpoint_data, foreign_gravpoint_data>::treebuild_construct (this=0x7fffffff3870) at src/tree/tree.cc:324
324          Nextnode[MaxPart + i] = TopNodes[index].sibling;
(gdb) bt
#0  0x00000000004b5a5e in tree<gravnode, simparticles, gravpoint_data, foreign_gravpoint_data>::treebuild_construct (this=0x7fffffff3870) at src/tree/tree.cc:324
#1  0x00000000004b0878 in tree<gravnode, simparticles, gravpoint_data, foreign_gravpoint_data>::treebuild (this=0x7fffffff3870, ninsert=24242, indexlist=0x0) at src/tree/tree.cc:75
#2  0x000000000048ac97 in sim::gravity (this=0x7fffffff2a40, timebin=0) at src/gravity/gravity.cc:226
#3  0x000000000048b8e5 in sim::compute_grav_accelerations (this=0x7fffffff2a40, timebin=0) at src/gravity/gravity.cc:110
#4  0x000000000047f4ea in sim::do_gravity_step_second_half (this=0x7fffffff2a40) at src/time_integration/kicks.cc:379
#5  0x000000000041911a in sim::run (this=0x7fffffff2a40) at src/main/run.cc:149
#6  0x000000000041631a in main (argc=2, argv=0x7fffffff58f8) at src/main/main.cc:327
(gdb) f 0
#0  0x00000000004b5a5e in tree<gravnode, simparticles, gravpoint_data, foreign_gravpoint_data>::treebuild_construct (this=0x7fffffff3870) at src/tree/tree.cc:324
324          Nextnode[MaxPart + i] = TopNodes[index].sibling;
(gdb) list
319
320          if(TreeSharedMem_ThisTask == 0)
321            TopNodes[index].nextnode = MaxPart + MaxNodes + i;
322
323          /* set nextnode for pseudo-particle (Nextnode exists on all ranks) */
324          Nextnode[MaxPart + i] = TopNodes[index].sibling;
325        }
326
327      point_data *export_Points = (point_data *)Mem.mymalloc("export_Points", NumPartExported * sizeof(point_data));
328

-- 
Ken Osato
Institut d'Astrophysique de Paris
98bis boulevard Arago, 75014 Paris, France
Tel: +33 1 44 32 80 00
E-mail: ken.osato_at_iap.fr


Received on 2021-01-24 16:25:51
