Re: fof can't work in multiple node

From: Volker Springel <vspringel_at_MPA-Garching.MPG.DE>
Date: Fri, 23 Sep 2022 09:21:12 +0200

Hi Rui,

Thanks for sending me your detailed setup. I have run your job on our cluster (also using 2 nodes with 32 cores each). This worked fine.

You seem to be using the most recent version from the repository, judging by the git-commit tag, but I guess you have made some (presumably minor?) changes. At least the CAMBNOLOG option is not standard, so I had to disable it and run with PowerSpectrumType=0 and ReNormalizeInputSpectrum=1 instead. But this is presumably unrelated to your problem.
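For reference, in the parameterfile this simply corresponds to (a minimal sketch, assuming the standard Gadget-4 parameter names):

  PowerSpectrumType          0
  ReNormalizeInputSpectrum   1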

The most likely explanation for your problem that I can offer, based on past experience, is that you are using a problematic/outdated/buggy MPI library. Which one are you using? Please try a recent version of OpenMPI, which in my experience is still the most reliable.
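If you are unsure which MPI library and version your binary is built against, something like the following usually tells you (a sketch assuming OpenMPI-style tools; other MPI stacks provide different commands):

  mpirun --version      # OpenMPI prints its version here
  ompi_info | head      # more detail on the OpenMPI build, if OpenMPI is used
  mpicc -show           # shows which compiler and MPI library the wrapper invokes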

A more remote possibility is the compiler. Which one are you using? Best to try with gcc, version 9.3 or later. If it's a compiler issue due to the optimizer (-O3), you may want to try -O0 to exclude this possibility.
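To check the compiler, and to test a build without optimization, something along these lines should do (a sketch only; where exactly the -O3 flag is set depends on your SYSTYPE makefile, so adjust to your setup):

  gcc --version          # should report 9.3 or later
  mpicc --version        # confirms which compiler the MPI wrapper actually uses
  # then replace -O3 with -O0 in the OPTIMIZE flags for your SYSTYPE
  # (e.g. in buildsystem/Makefile.comp.* if you use the standard Gadget-4
  # build system) and recompile with 'make'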

Regards,
Volker


> On 21. Sep 2022, at 16:29, HU, Rui <1155168718_at_link.cuhk.edu.hk> wrote:
>
>
> Hi Volker,
>
> The full log has been emailed to you privately (because it's nearly 3 MB), so please check your mailbox, including the junk folder.
>
> The following are some key points:
>
> Code was compiled with the following settings:
> CAMBNOLOG
> CREATE_GRID
> FOF
> NGENIC=128
> NGENIC_2LPT
> NGENIC_FIX_MODE_AMPLITUDES
> PERIODIC
> PMGRID=256
> POWERSPEC_ON_OUTPUT
> SELFGRAVITY
> SUBFIND
>
> FOF: We shall first compute a group catalog for this snapshot file
> FOF: Begin to compute FoF group catalogue... (presently allocated=8.39948 MB)
> FOF: Comoving linking length: 329.526
> TREE: Full tree construction for all particles. (presently allocated=9.40948 MB)
> FOFTREE: Ngb-tree construction done. took 0.0125011 sec <numnodes>=5424.21 NTopnodes=585 NTopleaves=512
> FOF: Start linking particles (presently allocated=9.72656 MB)
> FOF: linking of small cells took 1.68094e-05 sec
> FOF: local links done (took 0.0284756 sec, avg-work=0.0215296, imbalance=1.28485).
> FOF: Marked=225356 out of the 2097152 primaries which are linked
> FOF: begin linking across processors (presently allocated=9.84052 MB)
> Code termination on task=10, function treefind_fof_primary(), file src/fof/fof_findgroups.cc, line 315: unexpected because in the present algorithm we are only allowed walk local branches
>
>
> Hope this helps.
>
> Bests,
> Rui
>
>
> -----Original Message-----
> From: Volker Springel <vspringel_at_MPA-Garching.MPG.DE>
> Sent: 19 September 2022 23:04
> To: Gadget General Discussion <gadget-list_at_MPA-Garching.MPG.DE>
> Subject: Re: [gadget-list] fof can't work in multiple node
>
>
> Hi Rui,
>
> No, the FOF option can be used with multiple nodes. The termination you encountered is odd and should not have happened.
>
> Could you send your complete configuration, ideally the full stdout of this run? Perhaps there is something unusual about your setup that is related to the problem. Without more information I cannot say much.
>
> Best,
> Volker
>
>> On 18. Sep 2022, at 05:07, HU, Rui <1155168718_at_link.cuhk.edu.hk> wrote:
>>
>> Hi all,
>>
>> I am trying to use Gadget-4 on a cluster, and I have activated the FOF option in Config.sh. When I use multiple nodes (each node contains one Xeon processor), the simulation terminates with the error:
>> “Code termination on task= XXX, function treefind_fof_primary(), file src/fof/fof_findgroups.cc, line 315: unexpected because in the present algorithm we are only allowed walk local branches.”
>>
>> But it works with only one node. So does that error mean the FOF algorithm can only be used on a single node/CPU? Or are there any issues related to parallel running?
>>
>> Best,
>> Rui
>>