Re: Segmentation Fault on DMO runs on power9

From: Tiago Castro <tiagobscastro_at_gmail.com>
Date: Thu, 4 Mar 2021 16:21:59 +0100

The problem was indeed with some libraries. Thanks again, Volker, for the
support!
*Tiago Castro* Post Doc, Department of Physics / UNITS / OATS
Phone: +39 040 3199 120
Mobile: +39 388 794 1562
Email: tiagobscastro_at_gmail.com
Website: tiagobscastro.com <http://tiagobscastro.com>
<http://sites.if.ufrj.br/castro/en>
Skype: tiagobscastro
Address: Osservatorio Astronomico di Trieste / Villa Bazzoni, Via Bazzoni,
2, 34143 Trieste TS


On Thu, 4 Mar 2021 at 12:55, Volker Springel <vspringel_at_mpa-garching.mpg.de> wrote:

>
> Hm, could either be that this stack trace doesn't work on the power9
> platform at all, or that you need to use the GNU compiler in case you are
> not doing this already. Other than that I don't know.
>
> Volker
>
> > On 4. Mar 2021, at 11:30, Tiago Castro <tiagobscastro_at_gmail.com> wrote:
> >
> > Thanks, Volker. I have added the macros to define_extra, and running make
> > now returns the error below. Do I need to link anything else for it to
> > compile correctly?
> >
> > /usr/bin/ld: build/system/backward.o: undefined reference to symbol 'dladdr@@GLIBC_2.17'
> > //usr/lib64/libdl.so.2: error adding symbols: DSO missing from command line
> > collect2: error: ld returned 1 exit status
> > make: *** [Gadget4] Error 1
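The linker message says that backward.o calls `dladdr`, which on glibc releases before 2.34 lives in `libdl`, but `-ldl` was not on the link line. "DSO missing from command line" means GNU ld found the symbol only in a library pulled in indirectly (as a dependency of another library) and, by default, refuses to use it. The usual remedy is to append `-ldl` to the libraries Gadget4 is linked against; which Makefile variable to extend is not shown in this thread, so check your own Makefile. As a minimal, stand-alone sketch of the same call backward.o needs (the helper name `object_containing` is hypothetical):

```cpp
#include <dlfcn.h>   // dladdr, Dl_info (GNU extension; g++ defines _GNU_SOURCE by default)
#include <string>

// Hypothetical helper: return the path of the loaded object (executable or
// shared library) that contains `addr`, or an empty string if dladdr cannot
// tell. A stack tracer performs this kind of lookup for every frame.
std::string object_containing(const void *addr)
{
  Dl_info info;
  if (dladdr(addr, &info) != 0 && info.dli_fname != nullptr)
    return info.dli_fname;
  return "";
}
```

On an older glibc this only links when `-ldl` is given explicitly, e.g. `g++ demo.cc -ldl`; from glibc 2.34 on, `dladdr` is part of libc itself and the flag is harmless.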
> >
> > On Wed, 3 Mar 2021 at 14:10, Volker Springel <vspringel_at_mpa-garching.mpg.de> wrote:
> >
> > Hi Tiago,
> >
> > > On 3. Mar 2021, at 12:32, Tiago Castro <tiagobscastro_at_gmail.com> wrote:
> > >
> > > Many thanks, Volker.
> > >
> > > > Hm, this is possibly a shared-memory access problem, given where it
> > > > happens. Does the code run on a single node? Which MPI library is
> > > > this? Buggy MPI-3 support is certainly a prime suspect. It's also
> > > > peculiar that the machine allows only 40% of the physical memory to be
> > > > allocated as shared memory... (this is not good).
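One place where a shared-memory limit of this kind shows up on Linux is the size of the POSIX shared-memory tmpfs, conventionally mounted at `/dev/shm`. Whether Spectrum MPI actually backs MPI-3 shared-memory windows with `/dev/shm` is an assumption here, and `shm_fraction_of_ram` is a hypothetical helper, but a quick check on a compute node could look like this:

```cpp
#include <sys/statvfs.h>  // statvfs
#include <unistd.h>       // sysconf

// Hypothetical helper: capacity of the filesystem mounted at `path`
// (e.g. "/dev/shm") as a fraction of physical RAM.
// Returns a negative value if either query fails.
double shm_fraction_of_ram(const char *path)
{
  struct statvfs vfs;  // "struct" needed: the function has the same name
  if (statvfs(path, &vfs) != 0)
    return -1.0;

  long pages = sysconf(_SC_PHYS_PAGES);    // number of physical pages
  long page_size = sysconf(_SC_PAGE_SIZE); // bytes per page
  if (pages <= 0 || page_size <= 0)
    return -1.0;

  double fs_bytes = static_cast<double>(vfs.f_blocks) * vfs.f_frsize;
  double ram_bytes = static_cast<double>(pages) * page_size;
  return fs_bytes / ram_bytes;
}
```

By default a tmpfs is sized at half of physical RAM, so a noticeably smaller value on a compute node would point to an administrator-imposed limit.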
> > >
> > > The code did not run on a single node either (it crashed at the same
> > > point). The MPI library is IBM's (I am running on the M100 cluster).
> >
> > Ok, in principle this should be IBM's Spectrum MPI library, which is
> > closely related to OpenMPI. However, on Marconi100, you should be able to
> > use GNU/OpenMPI as an alternative by changing to the corresponding
> > modules. At least on Intel processors, OpenMPI works well for Gadget4.
> >
> > >
> > > > You can try to activate DEBUG to see whether this gives a core file
> > > > for the crash. This would allow you to locate the line where this
> > > > happens by loading the core file with gdb.
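A core file is only written if the core-size resource limit allows it, and batch systems often set that limit to zero. Independently of what Gadget4's DEBUG option does internally (not shown in this thread), a process can raise its own soft limit up to the hard limit, as in this sketch; `enable_core_dumps` is a hypothetical name:

```cpp
#include <sys/resource.h>  // getrlimit, setrlimit, RLIMIT_CORE

// Hypothetical helper: raise the soft core-file size limit to the hard
// limit, so that a crash can leave a core file for gdb.
// An unprivileged process may always raise its soft limit up to the hard one.
bool enable_core_dumps()
{
  struct rlimit rl;
  if (getrlimit(RLIMIT_CORE, &rl) != 0)
    return false;
  rl.rlim_cur = rl.rlim_max;
  return setrlimit(RLIMIT_CORE, &rl) == 0;
}
```

From a job script, `ulimit -c unlimited` before the MPI launch serves the same purpose, provided the hard limit permits it. Once a core file exists, `gdb ./Gadget4 <corefile>` followed by `bt` prints the call stack at the time of the crash; the core file's name and location depend on the system's `kernel.core_pattern` setting.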
> > >
> > > I have asked the support team to run this, since I have not used gdb
> > > with MPI or batch jobs before. I will get back to you once I manage to
> > > run it.
> > >
> > > > Another possibility would be to add the attached stack-tracing class
> > > > to the compiled files for Gadget4. This will activate a signal handler
> > > > and, if you are moderately lucky, print an informative stack trace
> > > > when the crash happens.
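The idea behind such a class is a handler for fatal signals that prints the call stack before the process dies. The internals of the attached backward.cc are not shown here; a much smaller sketch of the same idea, using glibc's `backtrace()` and `backtrace_symbols_fd()` from `<execinfo.h>` and a hypothetical `install_segv_tracer` function, could look like this:

```cpp
#include <csignal>     // sigaction, sigemptyset, raise
#include <execinfo.h>  // backtrace, backtrace_symbols_fd (glibc)
#include <unistd.h>    // STDERR_FILENO

// Print the raw call stack to stderr, then re-raise the signal.
// Because SA_RESETHAND restored the default disposition on entry, the
// re-raised signal ends the process with the default action (which may
// also write a core file).
static void segv_handler(int sig)
{
  void *frames[64];
  int n = backtrace(frames, 64);
  backtrace_symbols_fd(frames, n, STDERR_FILENO);  // does not call malloc
  raise(sig);
}

// Hypothetical helper: install the handler for SIGSEGV. Returns true on success.
bool install_segv_tracer()
{
  // The first backtrace() call may load the unwinder and allocate memory,
  // which is unsafe inside a signal handler, so do it once up front.
  void *warmup[1];
  backtrace(warmup, 1);

  struct sigaction sa {};
  sa.sa_handler = segv_handler;
  sigemptyset(&sa.sa_mask);
  sa.sa_flags = SA_RESETHAND;
  return sigaction(SIGSEGV, &sa, nullptr) == 0;
}
```

Function names in the printed frames are more readable when the code is compiled with `-g` and linked with `-rdynamic`; otherwise mostly raw addresses appear, which can still be resolved afterwards with `addr2line`.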
> > >
> > > I apologize for my ignorance, but I did not understand how to
> > > implement this.
> > >
> >
> > You only need to move backward.cc/backward.h to a source directory
> > (e.g. src/system) and include them in the Gadget4 makefile, like
> >     OBJS += system/pinning.o system/system.o system/backward.o
> >     INCL += system/system.h system/pinning.h system/backward.h
> > That's all: the constructor of the class will be called automatically on
> > start-up, without needing to modify any of the original code.
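No source changes are needed because C++ runs the constructors of namespace-scope objects during program start-up, before `main()`, so a class can install its handlers from a global instance of itself. The sketch below shows only that mechanism, with hypothetical names (`StartupHook`, `startup_hook_ran`); it is not the code in backward.cc:

```cpp
namespace {

// Constant-initialized (to false) before any dynamic initialization runs.
bool g_hook_ran = false;

// Hypothetical stand-in for a class that installs signal handlers.
struct StartupHook {
  StartupHook() { g_hook_ran = true; }  // a real tracer would call sigaction() here
};

// Dynamically initialized at start-up, so the constructor above runs
// without any explicit call from main() or elsewhere.
StartupHook g_startup_hook;

}  // namespace

bool startup_hook_ran() { return g_hook_ran; }
```

This works here because backward.o is listed directly in OBJS and is therefore always linked in. Had it been placed in a static archive (`.a`) with no other references to it, the linker could have skipped the object file and the constructor would never run.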
> >
> > Regards,
> > Volker
> >
> >
> >
> >
> >
> > > Many thanks!
> > > On Thu, 25 Feb 2021 at 15:59, Volker Springel <vspringel_at_mpa-garching.mpg.de> wrote:
> > > Hi Tiago,
> > >
> > > Hm, this is possibly a shared-memory access problem, given where it
> > > happens. Does the code run on a single node? Which MPI library is this?
> > > Buggy MPI-3 support is certainly a prime suspect. It's also peculiar
> > > that the machine allows only 40% of the physical memory to be allocated
> > > as shared memory... (this is not good).
> > >
> > > You can try to activate DEBUG to see whether this gives a core file for
> > > the crash. This would allow you to locate the line where this happens
> > > by loading the core file with gdb.
> > >
> > > Another possibility would be to add the attached stack-tracing class to
> > > the compiled files for Gadget4. This will activate a signal handler
> > > and, if you are moderately lucky, print an informative stack trace when
> > > the crash happens.
> > >
> > > Regards,
> > > Volker
> > >
> > >
> > >
> > >
> > > > On 25. Feb 2021, at 15:18, Tiago Castro <tiagobscastro_at_gmail.com> wrote:
> > > >
> > > > Dear list,
> > > >
> > > > I have tried to run Gadget4 on a POWER9 cluster, and right after the IC
> > > > creation, during the first step, the code crashes with a segmentation
> > > > fault. Any suggestions as to what I am doing wrong?
> > > >
> > > > Many thanks for any help you can provide.
> > > > Regards,
> > > > T.
> > > > <param.std.txt><Config.sh><slurm-2608670.out>
> > > > -----------------------------------------------------------
> > > >
> > > > If you wish to unsubscribe from this mailing, send mail to
> > > > minimalist_at_MPA-Garching.MPG.de with a subject of: unsubscribe
> gadget-list
> > > > A web-archive of this mailing list is available here:
> > > > http://www.mpa-garching.mpg.de/gadget/gadget-list
> > >
> > >
> >
>
>
Received on 2021-03-04 16:22:23

This archive was generated by hypermail 2.3.0 : 2023-01-10 10:01:32 CET