Re: Segmentation Fault on DMO runs on power9

From: Tiago Castro <tiagobscastro_at_gmail.com>
Date: Wed, 3 Mar 2021 12:32:18 +0100

Many thanks, Volker.

> Hm, it possibly is a shared memory access problem given the place where
> this happens. Does the code run on a single node? Which MPI library is
> this? Certainly a buggy MPI-3 support is a primary suspect for this. It's
> also peculiar that the machine allows only 40% of the physical memory to be
> allocated as shared memory... (this is not good).
>

The code does not run on a single node either (it crashes at the same point).
The MPI library is IBM's own (I am running on the M100 cluster).
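The remark that only 40% of physical memory can be allocated as shared memory can be checked directly on a compute node. This is a minimal sketch, assuming the MPI library places its MPI-3 shared-memory segments in the tmpfs mounted at /dev/shm. That is common on Linux but not guaranteed, and IBM's MPI may use a different mechanism. The helper `shm_fraction_of_ram()` is hypothetical. It compares the capacity of that mount with the physical RAM of the node.

```cpp
#include <sys/statvfs.h>
#include <unistd.h>

// Hypothetical diagnostic: capacity of the tmpfs at `mount` as a fraction of
// physical RAM. Assumes MPI-3 shared-memory windows are backed by files in
// that mount, which is typical on Linux but not guaranteed.
// Returns -1.0 if the mount or the RAM size cannot be queried.
double shm_fraction_of_ram(const char *mount = "/dev/shm")
{
  struct statvfs fs;
  if(statvfs(mount, &fs) != 0)
    return -1.0;  // mount point missing or not accessible

  long pages     = sysconf(_SC_PHYS_PAGES);
  long page_size = sysconf(_SC_PAGESIZE);
  if(pages <= 0 || page_size <= 0)
    return -1.0;

  // f_blocks is counted in units of f_frsize bytes
  double shm_bytes = static_cast<double>(fs.f_blocks) * static_cast<double>(fs.f_frsize);
  double ram_bytes = static_cast<double>(pages) * static_cast<double>(page_size);
  return shm_bytes / ram_bytes;
}
```

A tmpfs mounted without an explicit size defaults to half of RAM, so a result near 0.4 would point to a mount size deliberately set lower by the administrators.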

> You can try to activate DEBUG to see whether this gives a core file for the
> crash. This would allow to locate the line where this happens by loading
> the core-file with gdb.
>

I have asked support to help with this, since I have not used gdb on MPI
batch jobs before. I will get back to you once I manage to run it.
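A core file is only written if the process's core-size resource limit allows it, and batch systems often set that limit to zero. This sketch shows the mechanism a debug build can use to lift it. It does not claim to be what Gadget4's DEBUG option actually does, and `enable_core_dumps()` is a hypothetical name.

```cpp
#include <sys/resource.h>

// Hypothetical helper: raise the soft core-file size limit of the current
// process to its hard limit, so a crash (e.g. SIGSEGV) can leave a core file.
// Returns true on success. The hard limit itself, set by the administrator
// or the batch system, cannot be exceeded by an unprivileged process.
bool enable_core_dumps()
{
  struct rlimit rlim;
  if(getrlimit(RLIMIT_CORE, &rlim) != 0)
    return false;

  rlim.rlim_cur = rlim.rlim_max;  // soft limit may be raised up to the hard limit
  return setrlimit(RLIMIT_CORE, &rlim) == 0;
}
```

In a SLURM job script, `ulimit -c unlimited` before `srun` usually has the same effect. Afterwards, `gdb ./Gadget4 core` (the executable name and core file name are assumptions here) followed by `bt` at the gdb prompt prints the call stack at the crash. Where the core file is written and how it is named depends on /proc/sys/kernel/core_pattern.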

> Another possibility would be to add the attached stack-tracing class to the
> compiled files for Gadget4. This will activate a signal handler and - if
> you are moderately lucky - print an informative stack-trace when the crash
> happens.
>

I apologize for my ignorance, but I do not understand how to add it to the
build.
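The attached class is not preserved in this archive. The following is only a hypothetical stand-in that illustrates the general idea. A global object's constructor runs before `main()` and installs a SIGSEGV handler, which prints a stack trace using glibc's `backtrace()` and `backtrace_symbols_fd()`. Such a file is simply compiled and linked together with the rest of the code. The exact Makefile change depends on the build system and is not shown. Compiling with `-g` and linking with `-rdynamic` helps function names appear in the output. Note that `backtrace()` is not strictly async-signal-safe, so this is a best-effort tool, which fits the "if you are moderately lucky" caveat.

```cpp
#include <execinfo.h>
#include <signal.h>
#include <unistd.h>

// Hypothetical stand-in for the attached stack-tracing class (not the original).
// A static instance installs a SIGSEGV handler at program start-up; on a crash
// the handler writes the raw call stack to stderr and then re-raises the signal.
class crash_tracer
{
 public:
  crash_tracer()
  {
    struct sigaction sa = {};
    sa.sa_handler = handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESETHAND;  // restore the default action once the handler runs
    sigaction(SIGSEGV, &sa, nullptr);
  }

 private:
  static void handler(int sig)
  {
    void *frames[64];
    int n = backtrace(frames, 64);
    // writes straight to the descriptor instead of allocating strings
    // the way backtrace_symbols() does
    backtrace_symbols_fd(frames, n, STDERR_FILENO);
    raise(sig);  // default action now applies: terminate (and dump core if enabled)
  }
};

// Constructed before main() runs, so the handler is active for the whole run.
static crash_tracer the_crash_tracer;
```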

Many thanks!
*Tiago Castro* Post Doc, Department of Physics / UNITS / OATS
Phone: +39 040 3199 120
Mobile: +39 388 794 1562
Email: tiagobscastro_at_gmail.com
Website: tiagobscastro.com <http://tiagobscastro.com>
<http://sites.if.ufrj.br/castro/en>
Skype: tiagobscastro
Address: Osservatorio Astronomico di Trieste / Villa Bazzoni, Via Bazzoni 2,
34143 Trieste TS


On Thu, 25 Feb 2021 at 15:59, Volker Springel <
vspringel_at_mpa-garching.mpg.de> wrote:

> Hi Tiago,
>
> Hm, it possibly is a shared memory access problem given the place where
> this happens. Does the code run on a single node? Which MPI library is
> this? Certainly a buggy MPI-3 support is a primary suspect for this. It's
> also peculiar that the machine allows only 40% of the physical memory to be
> allocated as shared memory... (this is not good).
>
> You can try to activate DEBUG to see whether this gives a core file for
> the crash. This would allow to locate the line where this happens by
> loading the core-file with gdb.
>
> Another possibility would be to add the attached stack-tracing class to
> the compiled files for Gadget4. This will activate a signal handler and -
> if you are moderately lucky - print an informative stack-trace when the
> crash happens.
>
> Regards,
> Volker
>
>
>
>
> > On 25. Feb 2021, at 15:18, Tiago Castro <tiagobscastro_at_gmail.com> wrote:
> >
> > Dear list,
> >
> > I have tried to run g4 on a POWER9 cluster, and right after IC creation,
> > during the first step, the code crashes with a segmentation fault. Any
> > suggestions as to what I am doing wrong?
> >
> > Many thanks for any help you can provide.
> > Regards,
> > T.
> > <param.std.txt><Config.sh><slurm-2608670.out>
> > -----------------------------------------------------------
> >
> > If you wish to unsubscribe from this mailing, send mail to
> > minimalist_at_MPA-Garching.MPG.de with a subject of: unsubscribe gadget-list
> > A web-archive of this mailing list is available here:
> > http://www.mpa-garching.mpg.de/gadget/gadget-list
>
>
Received on 2021-03-03 12:32:39

This archive was generated by hypermail 2.3.0 : 2023-01-10 10:01:32 CET