Re: Segmentation fault with GADGET4 on multiple nodes

From: Volker Springel <vspringel_at_MPA-Garching.MPG.DE>
Date: Wed, 3 Feb 2021 12:59:56 +0100

Dear Balázs,

Could you please also send the beginning of these log files?

Thanks,
Volker


> On 3. Feb 2021, at 12:03, Balázs Pál <masterdesky_at_gmail.com> wrote:
>
> Dear list members,
>
> As a new GADGET4 user, I've encountered an as yet unsolved problem while testing GADGET4 on multiple nodes of my university's new HPC cluster, which is managed by Slurm. I've seen a similar (recently posted) issue on this mailing list, but I can't confirm whether both issues have the same origin.
> I'm trying to run the "colliding galaxies" example using the provided Config.sh and parameter file with OpenMPI 3.1.3. I've built G4 with gcc 8.3.0.
>
> Usually the simulation starts running normally, but after some time (sometimes minutes, sometimes only after hours) it crashes with a segmentation fault. I also can't confirm whether this crash is consistent or not. Most of the time I've run GADGET4 on 4 nodes with 8 CPUs each, and it crashed similarly approximately 4-5 hours after start.
>
> Extra info:
> I've attached two separate files, each containing the last iteration of a simulation before a crash. The file `log_tail.log` contains the usual crash, which I've encountered every single time. The file `log_tail2.log` contains a "maybe useful anomaly", where GADGET4 seems to terminate because of some failure in its shared memory handler.
>
> I would appreciate it very much if you could give any insight or advice on how to eliminate this problem! If you require any further information, please let me know.
>
> Best Regards,
> Balázs
> <log_tail2.log><log_tail.log>
Received on 2021-02-03 12:59:56