Re: Segmentation fault with GADGET4 on multiple nodes

From: Balázs Pál <masterdesky_at_gmail.com>
Date: Thu, 4 Feb 2021 21:27:01 +0100

Dear Volker,

Yes, just to be sure, I'm sending the first 200 lines of these two log files
(named `log1` and `log2`, corresponding to the `log_tail` and `log_tail2`
files respectively). I'm also attaching the head and tail of two more
similar log files. Extra info: each node of this cluster contains 18 CPUs.

Regards,
Balázs

On Wed, 3 Feb 2021 at 13:00, Volker Springel <vspringel_at_mpa-garching.mpg.de>
wrote:

>
> Dear Balázs,
>
> Could you please also send the beginning of these log files?
>
> Thanks,
> Volker
>
>
> > On 3. Feb 2021, at 12:03, Balázs Pál <masterdesky_at_gmail.com> wrote:
> >
> > Dear list members,
> >
> > As a new GADGET4 user, I've encountered an as-yet unsolved problem while
> testing GADGET4 on multiple nodes of my university's new HPC cluster,
> controlled by Slurm. I've seen a similar (recently posted) issue on this
> mailing list, but I can't confirm whether both issues have the same origin.
> > I'm trying to run the "colliding galaxies" example using the provided
> Config.sh and parameter file with OpenMPI 3.1.3. I've built G4 with gcc
> 8.3.0.
> >
> > Usually the simulation starts running normally, but after some time
> (sometimes minutes, sometimes hours) it crashes with a segmentation
> fault. I also can't tell whether the crash is reproducible. Most of the
> time I've run GADGET4 on 4 nodes with 8 CPUs each, and in that setup it
> crashed similarly approximately 4-5 hours after start.
> >
> > Extra info:
> > I've attached two separate files, each containing the last iteration of a
> simulation before it crashed. The file `log_tail.log` shows the usual
> crash, which I've encountered every single time. The file `log_tail2.log`
> shows a "maybe useful anomaly", where GADGET4 seems to terminate because
> of a failure in its shared-memory handler.
> >
> > I would appreciate it very much if you could give any insight or advice
> on how to eliminate this problem! If you require any further information,
> please let me know.
> >
> > Best Regards,
> > Balázs
> > <log_tail2.log><log_tail.log>
> > -----------------------------------------------------------
> >
> > If you wish to unsubscribe from this mailing, send mail to
> > minimalist_at_MPA-Garching.MPG.de with a subject of: unsubscribe gadget-list
> > A web-archive of this mailing list is available here:
> > http://www.mpa-garching.mpg.de/gadget/gadget-list
>
>



Received on 2021-02-04 21:27:22
