Re: Segmentation fault

From: Volker Springel <volker_at_MPA-Garching.MPG.DE>
Date: Sat, 9 Dec 2006 07:52:11 +0100

Hi Michele,

It could be that your FFTW library was compiled with default settings
(which makes it double-precision), while the Gadget Makefile you used
calls it as a single-precision library. If that's the problem, then
enabling DOUBLEPRECISION_FFTW in the Makefile should fix it.
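
As a rough illustration of why such a mismatch could end in a
segmentation fault (a sketch under stated assumptions, not code from
Gadget2 or FFTW): if the calling code sizes its mesh buffers for 4-byte
reals while the library reads and writes 8-byte reals, the library
touches twice as many bytes as were allocated. The helper names below
are hypothetical, and the padded last dimension 2*(PMGRID/2+1) is
assumed to be the usual layout for in-place real-to-complex transforms.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of Gadget2 or FFTW: number of real values
 * in one task's slab of a PMGRID^3 mesh, assuming the last dimension is
 * padded to 2*(PMGRID/2+1) for in-place real-to-complex transforms. */
size_t slab_elements(size_t nslabs, size_t pmgrid)
{
  return nslabs * pmgrid * (2 * (pmgrid / 2 + 1));
}

/* Bytes that slab occupies for a given element size
 * (sizeof(float) for single precision, sizeof(double) for double). */
size_t slab_bytes(size_t nslabs, size_t pmgrid, size_t real_size)
{
  assert(real_size == sizeof(float) || real_size == sizeof(double));
  return slab_elements(nslabs, pmgrid) * real_size;
}

/* Bytes a double-precision library would touch beyond a buffer that the
 * caller sized for single precision. */
size_t overrun_bytes(size_t nslabs, size_t pmgrid)
{
  return slab_bytes(nslabs, pmgrid, sizeof(double))
       - slab_bytes(nslabs, pmgrid, sizeof(float));
}
```

For the run quoted below (PMGRID=512, 32 slabs per task),
slab_elements(32, 512) is 8421376, so with 4-byte floats the slab
occupies 33685504 bytes, and a double-precision library would need the
same amount again.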

Volker

On Saturday 09 December 2006 01:06, Michele Trenti wrote:
> Hello,
>
> I have also just experienced segmentation faults on Gadget2 during PM
> calculation (backtraced at pm_periodic.c:271), while running on the
> Xeon cluster at NCSA (see debugging information below). The
> segmentation fault happens only for "large" runs, e.g. 512^3, while
> with small N, like 64^3, everything works nicely.
>
> I was wondering if someone has experience using Gadget2 on the same
> system (or on another Teragrid system, I just started exploring
> NCSA clusters, but my allocation is Teragrid wide) and is willing to
> share his/her expertise on the compilation. Maybe my Makefile
> (reported at the end) is not completely correct?
>
> Output during the run:
> ---------------------------------------------------------
> [trenti_at_tund ~/gadget_test]$ more gad_512_512.843502.o
>
> This is Gadget, version `2.0'.
>
> Running on 16 processors.
>
> found 15 times in output-list.
>
> Allocated 100 MByte communication buffer per processor.
>
> Communication buffer has room for 2383126 particles in gravity computation
> Communication buffer has room for 819200 particles in density computation
> Communication buffer has room for 655360 particles in hydro computation
> Communication buffer has room for 609636 particles in domain decomposition
>
>
> Hubble (internal units) = 0.1
> G (internal units) = 43007.1
> UnitMass_in_g = 1.989e+43
> UnitTime_in_s = 3.08568e+16
> UnitVelocity_in_cm_per_s = 100000
> UnitDensity_in_cgs = 6.76991e-22
> UnitEnergy_in_cgs = 1.989e+53
>
> Task=0 FFT-Slabs=32
> Task=1 FFT-Slabs=32
> Task=2 FFT-Slabs=32
> Task=3 FFT-Slabs=32
> Task=4 FFT-Slabs=32
> Task=5 FFT-Slabs=32
> Task=6 FFT-Slabs=32
> Task=7 FFT-Slabs=32
> Task=8 FFT-Slabs=32
> Task=9 FFT-Slabs=32
> Task=10 FFT-Slabs=32
> Task=11 FFT-Slabs=32
> Task=12 FFT-Slabs=32
> Task=13 FFT-Slabs=32
> Task=14 FFT-Slabs=32
> Task=15 FFT-Slabs=32
>
> Allocated 896 MByte for particle storage. 80
>
>
> reading file `./ic512_512_gic' on task=0 (contains 134217728 particles.)
> distributing this file to tasks 0-15
> Type 0 (gas): 0 (tot= 0000000000) masstab=0
> Type 1 (halo): 134217728 (tot= 0134217728) masstab=7.2163
> Type 2 (disk): 0 (tot= 0000000000) masstab=0
> Type 3 (bulge): 0 (tot= 0000000000) masstab=0
> Type 4 (stars): 0 (tot= 0000000000) masstab=0
> Type 5 (bndry): 0 (tot= 0000000000) masstab=0
>
> reading done.
> Total number of particles : 0134217728
>
> allocated 0.0762939 Mbyte for ngb search.
>
> Allocated 627.963 MByte for BH-tree. 64
>
> domain decomposition...
> NTopleaves= 512
> work-load balance=1.00646 memory-balance=1.00646
> exchange of 0117510361 particles
> exchange of 0057421241 particles
> exchange of 0012167098 particles
> exchange of 0003632194 particles
> domain decomposition done.
> begin Peano-Hilbert order...
> Peano-Hilbert done.
> Begin Ngb-tree construction.
> Ngb-Tree contruction finished
>
> Setting next time for snapshot file to Time_next= 0.0322581
>
>
> Begin Step 0, Time: 0.02, Redshift: 49, Systemstep: 0, Dloga: 0
> domain decomposition...
> NTopleaves= 512
> work-load balance=1.00646 memory-balance=1.00646
> domain decomposition done.
> begin Peano-Hilbert order...
> Peano-Hilbert done.
> Start force computation...
> Starting periodic PM calculation.
>
> Allocated 102.556 MByte for FFT data.
>
> done PM.
> Tree construction.
> Tree construction done.
> Begin tree force.
> tree is done.
> Begin tree force.
> tree is done.
> force computation done.
> type=1 dmean=1000 asmth=1250 minmass=7.2163 a=0.02
> sqrt(<p^2>)=1.80051 dlogmax=0.801017
> displacement time constraint: 0.025 (0.025)
>
> Begin Step 1, Time: 0.0202531, Redshift: 48.3752, Systemstep: 0.000253062, Dloga: 0.0125737
> domain decomposition...
> NTopleaves= 512
> work-load balance=1.02818 memory-balance=1.03561
> exchange of 0001322377 particles
> domain decomposition done.
> begin Peano-Hilbert order...
> Peano-Hilbert done.
> Start force computation...
> Starting periodic PM calculation.
> Segmentation fault (core dumped)
> User defined signal 2
> [trenti_at_tund ~/gadget_test]$
> -----------------------------------------------------
>
>
> And this is the gdb analysis of the core file:
> -------------------------------------------------------
> [trenti_at_tund debug]$ gdb ./Gadget2DEBUG core.14409
> GNU gdb Red Hat Linux (5.3post-0.20021129.18rh)
> Copyright 2003 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you are
> welcome to change it and/or distribute copies of it under certain conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB. Type "show warranty" for details.
> This GDB was configured as "i386-redhat-linux-gnu"...
> Core was generated by `Gadget2DEBUG cluster.param'.
> Program terminated with signal 11, Segmentation fault.
> Reading symbols from
> /usr/apps/math/gsl/gsl-1.6/intel90/lib/libgsl.so.0...done.
> Loaded symbols for /usr/apps/math/gsl/gsl-1.6/intel90/lib/libgsl.so.0
> Reading symbols from
> /usr/apps/math/gsl/gsl-1.6/intel90/lib/libgslcblas.so.0...done.
> Loaded symbols for
> /usr/apps/math/gsl/gsl-1.6/intel90/lib/libgslcblas.so.0
> Reading symbols from /usr/local/intel/9.0.026/lib/libimf.so...done.
> Loaded symbols for /usr/local/intel/9.0.026/lib/libimf.so
> Reading symbols from /lib/i686/libm.so.6...done.
> Loaded symbols for /lib/i686/libm.so.6
> Reading symbols from
> /usr/local/cmpipro-2.1.0-1tgm2/lib/libcmpi.so...done.
> Loaded symbols for /usr/local/cmpipro-2.1.0-1tgm2/lib/libcmpi.so
> Reading symbols from /lib/i686/libpthread.so.0...done.
> Loaded symbols for /lib/i686/libpthread.so.0
> Reading symbols from /opt/gm/lib/libgm.so.0...done.
> Loaded symbols for /opt/gm/lib/libgm.so.0
> Reading symbols from /lib/libgcc_s.so.1...done.
> Loaded symbols for /lib/libgcc_s.so.1
> Reading symbols from /lib/i686/libc.so.6...done.
> Loaded symbols for /lib/i686/libc.so.6
> Reading symbols from /lib/libdl.so.2...done.
> Loaded symbols for /lib/libdl.so.2
> Reading symbols from /lib/ld-linux.so.2...done.
> Loaded symbols for /lib/ld-linux.so.2
> Reading symbols from /lib/libnss_files.so.2...done.
> Loaded symbols for /lib/libnss_files.so.2
> #0 0x08062562 in pmforce_periodic () at pm_periodic.c:271
> 271 workspace[(slab_x * dimy + slab_y) * dimz + slab_z] +=
> P[i].Mass * (1.0 - dx) * (1.0 - dy) * (1.0 - dz);
> (gdb) backtrace
> #0 0x08062562 in pmforce_periodic () at pm_periodic.c:271
> #1 0xbfffcef0 in ?? ()
> Cannot access memory at address 0x2
> (gdb)
>
> -----------------------------------------------
>
> And finally this is what I use as Makefile:
> --------------------------------------------
> [trenti_at_tund source]$ more Makefile
>
> #----------------------------------------------------------------------
> # From the list below, please activate/deactivate the options that
> # apply to your run. If you modify any of these options, make sure
> # that you recompile the whole code by typing "make clean; make".
> #
> # Look at end of file for a brief guide to the compile-time options.
> #----------------------------------------------------------------------
>
>
> #--------------------------------------- Basic operation mode of code
> OPT += -DPERIODIC
> #OPT += -DUNEQUALSOFTENINGS
>
>
> #--------------------------------------- Things that are always recommended
> OPT += -DPEANOHILBERT
> OPT += -DWALLCLOCK
>
>
> #--------------------------------------- TreePM Options
> OPT += -DPMGRID=512
> #OPT += -DPLACEHIGHRESREGION=3
> #OPT += -DENLARGEREGION=1.2
> #OPT += -DASMTH=1.25
> #OPT += -DRCUT=4.5
>
>
> #--------------------------------------- Single/Double Precision
> #OPT += -DDOUBLEPRECISION
> #OPT += -DDOUBLEPRECISION_FFTW
>
>
> #--------------------------------------- Time integration options
> OPT += -DSYNCHRONIZATION
> #OPT += -DFLEXSTEPS
> #OPT += -DPSEUDOSYMMETRIC
> #OPT += -DNOSTOP_WHEN_BELOW_MINTIMESTEP
> #OPT += -DNOPMSTEPADJUSTMENT
>
>
> #--------------------------------------- Output options
> #OPT += -DHAVE_HDF5
> #OPT += -DOUTPUTPOTENTIAL
> #OPT += -DOUTPUTACCELERATION
> #OPT += -DOUTPUTCHANGEOFENTROPY
> #OPT += -DOUTPUTTIMESTEP
>
>
> #--------------------------------------- Things for special behaviour
> #OPT += -DNOGRAVITY
> #OPT += -DNOTREERND
> #OPT += -DNOTYPEPREFIX_FFTW
> #OPT += -DLONG_X=60
> #OPT += -DLONG_Y=5
> #OPT += -DLONG_Z=0.2
> #OPT += -DTWODIMS
> #OPT += -DSPH_BND_PARTICLES
> #OPT += -DNOVISCOSITYLIMITER
> #OPT += -DCOMPUTE_POTENTIAL_ENERGY
> #OPT += -DLONGIDS
> #OPT += -DISOTHERMAL
> #OPT += -DSELECTIVE_NO_GRAVITY=2+4+8+16
>
> #--------------------------------------- Testing and Debugging options
> #OPT += -DFORCETEST=0.1
>
>
> #--------------------------------------- Glass making
> #OPT += -DMAKEGLASS=262144
>
>
> #----------------------------------------------------------------------
> # Here, select compile environment for the target machine. This may need
> # adjustment, depending on your local system. Follow the examples to add
> # additional target platforms, and to get things properly compiled.
> #----------------------------------------------------------------------
>
> #--------------------------------------- Select some defaults
>
> CC = cmpicc # sets the C-compiler
> OPTIMIZE = -O3 -Wall # sets optimization and warning flags
> MPICHLIB = -lmpich
>
>
> #--------------------------------------- Select target computer
>
> #SYSTYPE="UDF"
> SYSTYPE="XEON"
> #SYSTYPE="Regatta"
> #SYSTYPE="RZG_LinuxCluster"
> #SYSTYPE="RZG_LinuxCluster-gcc"
> #SYSTYPE="Opteron"
>
> #--------------------------------------- Adjust settings for target computer
>
>
> ifeq ($(SYSTYPE),"XEON")
> CC = cmpicc
> OPTIMIZE = -O3 -Wall -g
> GSL_INCL = -I/${GSL_HOME}/include
> GSL_LIBS = -L/${GSL_HOME}/lib
> FFTW_INCL= -I/${FFTW_HOME}/include
> FFTW_LIBS= -L/${FFTW_HOME}/lib
> MPICHLIB =
> HDF5INCL =
> HDF5LIB =
> endif
>
> ...
>
> ---------------------------------------
>
>
> Thanks a lot for your help,
>
> Michele
>
> Michele Trenti
> Space Telescope Science Institute
> 3700 San Martin Drive Phone: +1 410 338 4987
> Baltimore MD 21218 U.S. Fax: +1 410 338 4767
>
>
> " We shall not cease from exploration
> And the end of all our exploring
> Will be to arrive where we started
> And know the place for the first time. "
>
> T. S. Eliot
>
> On Sat, 2 Dec 2006, Volker Springel wrote:
> > On Wednesday 29 November 2006 22:44, Craig Rudick wrote:
> >> Hi,
> >>
> >> We have been attempting to switch from Gadget1 to Gadget2, but
> >> have been running into the problem that Gadget2 produces a
> >> segmentation fault and dies when we try to run the example initial
> >> conditions. The segmentation fault almost always occurs during
> >> or immediately following the first domain decomposition, with
> >> output that reads:
> >>
> >> domain decomposition...
> >> Segmentation fault
> >>
> >> We have tried compiling using both the Portland Group and Intel
> >> compilers and see the same behavior.
> >>
> >> The really frustrating part is that we only get this error
> >> depending on both the initial conditions used and the number
> >> of processors on which it is run. That is, the 'cluster' example
> >> IC runs perfectly on up to 16 nodes, however all of the other
> >> example ICs will run only on four or fewer nodes (2 processors per
> >> node).
> >>
> >> Has anyone seen similar errors with Gadget2 or have any ideas on
> >> what might be the solution to this error?
> >
> > Hi Craig,
> >
> > This is strange. I can't reproduce this problem on any of the
> > machines I have access to (which are a few), and I also haven't
> > heard from anyone else experiencing this error. It could be related
> > to the set-up of your cluster and/or the compiler/MPI library you
> > are using. I'd suggest compiling with the gcc-compiler (with -g)
> > and looking at the core file that's produced by the crash with a
> > debugger. If the crash is reproducible, this would tell you if it
> > is caused by gadget2, and where this happens.
> >
> > Volker
> >
> >> Thanks,
> >> Craig Rudick
> >> Case Western Reserve University
> >>
> >>
> >>
> >>
> >> -----------------------------------------------------------
> >>
> >> If you wish to unsubscribe from this mailing, send mail to
> >> minimalist_at_MPA-Garching.MPG.de with a subject of: unsubscribe
> >> gadget-list A web-archive of this mailing list is available here:
> >> http://www.mpa-garching.mpg.de/gadget/gadget-list
Received on 2006-12-09 13:53:41