Re: Segmentation fault in tree calculation for multiple node runs

From: Volker Springel <vspringel_at_MPA-Garching.MPG.DE>
Date: Tue, 26 Jan 2021 17:50:12 +0100

Dear Ken,

Thanks a lot for reporting this problem. At the moment I cannot yet reproduce it, but it smells like it is related to the shared memory allocation.

Could you let me know which MPI library (and which version) you're using? (Are there several MPI libraries on your system that you could try as well?) Which compiler are you using? (In case you don't know, the outputs of "which mpicc", "mpicc -v", and "ldd ./Gadget-4" should give some pointers)
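
As a rough sketch, the three checks above could be collected into one small script. The `report` helper and the default binary path are illustrative assumptions and not part of Gadget-4; on some systems the compiler wrapper has a different name (for example a vendor wrapper instead of `mpicc`), so adjust as needed.

```shell
#!/bin/sh
# Path to the executable to inspect; pass it as the first argument.
# The default below is only a placeholder.
BIN="${1:-./Gadget-4}"

# Hypothetical helper: print a heading, run the command, and keep going
# even if the command is missing or fails.
report() {
  echo "== $1"
  shift
  "$@" 2>&1 || echo "(command failed or not available)"
}

report "MPI compiler wrapper found on PATH" command -v mpicc
report "Compiler behind the wrapper" mpicc -v
report "Shared libraries linked into $BIN" ldd "$BIN"
```

The `ldd` part is usually the most telling, since it shows which MPI shared library the binary actually resolves at run time, which may differ from the one used at compile time.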

Best,
Volker



> On 24. Jan 2021, at 16:25, Ken Osato <ken.osato_at_iap.fr> wrote:
>
> Dear Gadget community,
>
> I'm working on running dark-matter only cosmological simulations with Gadget-4.
> When I run the code with the same Config.sh and param.txt as the example "DM-L50-N128", it runs perfectly on a single node, but on multiple nodes it fails with a segmentation fault.
> I have been using L-Gadget-2 on the same cluster and have never encountered such an error.
> I analyzed the core file, and it shows that the segmentation fault occurs in the tree calculation. I suspect something goes wrong in the memory allocation when there is more than one shared-memory region.
>
> I've attached the log file from a run with the "DM-L50-N128" example settings on a Cray XC50 with 2 nodes (= 80 cores), and the GDB output is included below. Any help or suggestions are welcome. Thank you.
>
> Best regards,
> Ken Osato
>
>
> /* GDB outputs */
> Core was generated by `./Gadget4 param.txt'.
> Program terminated with signal SIGSEGV, Segmentation fault.
> #0 0x00000000004b5a5e in tree<gravnode, simparticles, gravpoint_data, foreign_gravpoint_data>::treebuild_construct (this=0x7fffffff3870) at src/tree/tree.cc:324
> 324 Nextnode[MaxPart + i] = TopNodes[index].sibling;
> (gdb) bt
> #0 0x00000000004b5a5e in tree<gravnode, simparticles, gravpoint_data, foreign_gravpoint_data>::treebuild_construct (this=0x7fffffff3870) at src/tree/tree.cc:324
> #1 0x00000000004b0878 in tree<gravnode, simparticles, gravpoint_data, foreign_gravpoint_data>::treebuild (this=0x7fffffff3870, ninsert=24242, indexlist=0x0) at src/tree/tree.cc:75
> #2 0x000000000048ac97 in sim::gravity (this=0x7fffffff2a40, timebin=0) at src/gravity/gravity.cc:226
> #3 0x000000000048b8e5 in sim::compute_grav_accelerations (this=0x7fffffff2a40, timebin=0) at src/gravity/gravity.cc:110
> #4 0x000000000047f4ea in sim::do_gravity_step_second_half (this=0x7fffffff2a40) at src/time_integration/kicks.cc:379
> #5 0x000000000041911a in sim::run (this=0x7fffffff2a40) at src/main/run.cc:149
> #6 0x000000000041631a in main (argc=2, argv=0x7fffffff58f8) at src/main/main.cc:327
> (gdb) f 0
> #0 0x00000000004b5a5e in tree<gravnode, simparticles, gravpoint_data, foreign_gravpoint_data>::treebuild_construct (this=0x7fffffff3870) at src/tree/tree.cc:324
> 324 Nextnode[MaxPart + i] = TopNodes[index].sibling;
> (gdb) list
> 319
> 320 if(TreeSharedMem_ThisTask == 0)
> 321 TopNodes[index].nextnode = MaxPart + MaxNodes + i;
> 322
> 323 /* set nextnode for pseudo-particle (Nextnode exists on all ranks) */
> 324 Nextnode[MaxPart + i] = TopNodes[index].sibling;
> 325 }
> 326
> 327 point_data *export_Points = (point_data *)Mem.mymalloc("export_Points", NumPartExported * sizeof(point_data));
> 328
>
> --
> Ken Osato
> Institut d'Astrophysique de Paris
> 98bis boulevard Arago, 75014 Paris, France
> Tel: +33 1 44 32 80 00
> E-mail: ken.osato_at_iap.fr
>
> <DM-L50-N128.log>
Received on 2021-01-26 17:50:12
