Re: Segmentation fault

From: Michele Trenti <trenti_at_stsci.edu>
Date: Fri, 8 Dec 2006 19:06:04 -0500 (EST)

Hello,

I have also just experienced segmentation faults in Gadget2 during the PM
calculation (the backtrace points to pm_periodic.c:271) while running on the
Xeon cluster at NCSA (see the debugging information below). The segmentation
fault happens only for "large" runs, e.g. 512^3 particles, while small runs,
such as 64^3, work fine.

I was wondering whether anyone has experience running Gadget2 on the same
system (or on another TeraGrid system: I have just started exploring the
NCSA clusters, but my allocation is TeraGrid-wide) and is willing to share
their expertise on compiling it. Maybe my Makefile (included at the
end) is not completely correct?

Output during the run:
---------------------------------------------------------
[trenti_at_tund ~/gadget_test]$ more gad_512_512.843502.o

This is Gadget, version `2.0'.

Running on 16 processors.

found 15 times in output-list.

Allocated 100 MByte communication buffer per processor.

Communication buffer has room for 2383126 particles in gravity computation
Communication buffer has room for 819200 particles in density computation
Communication buffer has room for 655360 particles in hydro computation
Communication buffer has room for 609636 particles in domain decomposition


Hubble (internal units) = 0.1
G (internal units) = 43007.1
UnitMass_in_g = 1.989e+43
UnitTime_in_s = 3.08568e+16
UnitVelocity_in_cm_per_s = 100000
UnitDensity_in_cgs = 6.76991e-22
UnitEnergy_in_cgs = 1.989e+53

Task=0 FFT-Slabs=32
Task=1 FFT-Slabs=32
Task=2 FFT-Slabs=32
Task=3 FFT-Slabs=32
Task=4 FFT-Slabs=32
Task=5 FFT-Slabs=32
Task=6 FFT-Slabs=32
Task=7 FFT-Slabs=32
Task=8 FFT-Slabs=32
Task=9 FFT-Slabs=32
Task=10 FFT-Slabs=32
Task=11 FFT-Slabs=32
Task=12 FFT-Slabs=32
Task=13 FFT-Slabs=32
Task=14 FFT-Slabs=32
Task=15 FFT-Slabs=32

Allocated 896 MByte for particle storage. 80


reading file `./ic512_512_gic' on task=0 (contains 134217728 particles.)
distributing this file to tasks 0-15
Type 0 (gas): 0 (tot= 0000000000) masstab=0
Type 1 (halo): 134217728 (tot= 0134217728) masstab=7.2163
Type 2 (disk): 0 (tot= 0000000000) masstab=0
Type 3 (bulge): 0 (tot= 0000000000) masstab=0
Type 4 (stars): 0 (tot= 0000000000) masstab=0
Type 5 (bndry): 0 (tot= 0000000000) masstab=0

reading done.
Total number of particles : 0134217728

allocated 0.0762939 Mbyte for ngb search.

Allocated 627.963 MByte for BH-tree. 64

domain decomposition...
NTopleaves= 512
work-load balance=1.00646 memory-balance=1.00646
exchange of 0117510361 particles
exchange of 0057421241 particles
exchange of 0012167098 particles
exchange of 0003632194 particles
domain decomposition done.
begin Peano-Hilbert order...
Peano-Hilbert done.
Begin Ngb-tree construction.
Ngb-Tree contruction finished

Setting next time for snapshot file to Time_next= 0.0322581


Begin Step 0, Time: 0.02, Redshift: 49, Systemstep: 0, Dloga: 0
domain decomposition...
NTopleaves= 512
work-load balance=1.00646 memory-balance=1.00646
domain decomposition done.
begin Peano-Hilbert order...
Peano-Hilbert done.
Start force computation...
Starting periodic PM calculation.

Allocated 102.556 MByte for FFT data.

done PM.
Tree construction.
Tree construction done.
Begin tree force.
tree is done.
Begin tree force.
tree is done.
force computation done.
type=1 dmean=1000 asmth=1250 minmass=7.2163 a=0.02 sqrt(<p^2>)=1.80051
dlogmax=0.801017
displacement time constraint: 0.025 (0.025)

Begin Step 1, Time: 0.0202531, Redshift: 48.3752, Systemstep: 0.000253062, Dloga: 0.0125737
domain decomposition...
NTopleaves= 512
work-load balance=1.02818 memory-balance=1.03561
exchange of 0001322377 particles
domain decomposition done.
begin Peano-Hilbert order...
Peano-Hilbert done.
Start force computation...
Starting periodic PM calculation.
Segmentation fault (core dumped)
User defined signal 2
[trenti_at_tund ~/gadget_test]$
-----------------------------------------------------


And this is the gdb analysis of the core file:
-------------------------------------------------------
[trenti_at_tund debug]$ gdb ./Gadget2DEBUG core.14409
GNU gdb Red Hat Linux (5.3post-0.20021129.18rh)
Copyright 2003 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux-gnu"...
Core was generated by `Gadget2DEBUG cluster.param'.
Program terminated with signal 11, Segmentation fault.
Reading symbols from /usr/apps/math/gsl/gsl-1.6/intel90/lib/libgsl.so.0...done.
Loaded symbols for /usr/apps/math/gsl/gsl-1.6/intel90/lib/libgsl.so.0
Reading symbols from /usr/apps/math/gsl/gsl-1.6/intel90/lib/libgslcblas.so.0...done.
Loaded symbols for /usr/apps/math/gsl/gsl-1.6/intel90/lib/libgslcblas.so.0
Reading symbols from /usr/local/intel/9.0.026/lib/libimf.so...done.
Loaded symbols for /usr/local/intel/9.0.026/lib/libimf.so
Reading symbols from /lib/i686/libm.so.6...done.
Loaded symbols for /lib/i686/libm.so.6
Reading symbols from /usr/local/cmpipro-2.1.0-1tgm2/lib/libcmpi.so...done.
Loaded symbols for /usr/local/cmpipro-2.1.0-1tgm2/lib/libcmpi.so
Reading symbols from /lib/i686/libpthread.so.0...done.
Loaded symbols for /lib/i686/libpthread.so.0
Reading symbols from /opt/gm/lib/libgm.so.0...done.
Loaded symbols for /opt/gm/lib/libgm.so.0
Reading symbols from /lib/libgcc_s.so.1...done.
Loaded symbols for /lib/libgcc_s.so.1
Reading symbols from /lib/i686/libc.so.6...done.
Loaded symbols for /lib/i686/libc.so.6
Reading symbols from /lib/libdl.so.2...done.
Loaded symbols for /lib/libdl.so.2
Reading symbols from /lib/ld-linux.so.2...done.
Loaded symbols for /lib/ld-linux.so.2
Reading symbols from /lib/libnss_files.so.2...done.
Loaded symbols for /lib/libnss_files.so.2
#0 0x08062562 in pmforce_periodic () at pm_periodic.c:271
271 workspace[(slab_x * dimy + slab_y) * dimz + slab_z] += P[i].Mass * (1.0 - dx) * (1.0 - dy) * (1.0 - dz);
(gdb) backtrace
#0 0x08062562 in pmforce_periodic () at pm_periodic.c:271
#1 0xbfffcef0 in ?? ()
Cannot access memory at address 0x2
(gdb)

-----------------------------------------------

And finally this is what I use as Makefile:
--------------------------------------------
[trenti_at_tund source]$ more Makefile

#----------------------------------------------------------------------
# From the list below, please activate/deactivate the options that
# apply to your run. If you modify any of these options, make sure
# that you recompile the whole code by typing "make clean; make".
#
# Look at end of file for a brief guide to the compile-time options.
#----------------------------------------------------------------------


#--------------------------------------- Basic operation mode of code
OPT += -DPERIODIC
#OPT += -DUNEQUALSOFTENINGS


#--------------------------------------- Things that are always recommended
OPT += -DPEANOHILBERT
OPT += -DWALLCLOCK


#--------------------------------------- TreePM Options
OPT += -DPMGRID=512
#OPT += -DPLACEHIGHRESREGION=3
#OPT += -DENLARGEREGION=1.2
#OPT += -DASMTH=1.25
#OPT += -DRCUT=4.5


#--------------------------------------- Single/Double Precision
#OPT += -DDOUBLEPRECISION
#OPT += -DDOUBLEPRECISION_FFTW


#--------------------------------------- Time integration options
OPT += -DSYNCHRONIZATION
#OPT += -DFLEXSTEPS
#OPT += -DPSEUDOSYMMETRIC
#OPT += -DNOSTOP_WHEN_BELOW_MINTIMESTEP
#OPT += -DNOPMSTEPADJUSTMENT


#--------------------------------------- Output options
#OPT += -DHAVE_HDF5
#OPT += -DOUTPUTPOTENTIAL
#OPT += -DOUTPUTACCELERATION
#OPT += -DOUTPUTCHANGEOFENTROPY
#OPT += -DOUTPUTTIMESTEP


#--------------------------------------- Things for special behaviour
#OPT += -DNOGRAVITY
#OPT += -DNOTREERND
#OPT += -DNOTYPEPREFIX_FFTW
#OPT += -DLONG_X=60
#OPT += -DLONG_Y=5
#OPT += -DLONG_Z=0.2
#OPT += -DTWODIMS
#OPT += -DSPH_BND_PARTICLES
#OPT += -DNOVISCOSITYLIMITER
#OPT += -DCOMPUTE_POTENTIAL_ENERGY
#OPT += -DLONGIDS
#OPT += -DISOTHERMAL
#OPT += -DSELECTIVE_NO_GRAVITY=2+4+8+16

#--------------------------------------- Testing and Debugging options
#OPT += -DFORCETEST=0.1


#--------------------------------------- Glass making
#OPT += -DMAKEGLASS=262144


#----------------------------------------------------------------------
# Here, select compile environment for the target machine. This may need
# adjustment, depending on your local system. Follow the examples to add
# additional target platforms, and to get things properly compiled.
#----------------------------------------------------------------------

#--------------------------------------- Select some defaults

CC = cmpicc # sets the C-compiler
OPTIMIZE = -O3 -Wall # sets optimization and warning flags
MPICHLIB = -lmpich


#--------------------------------------- Select target computer

#SYSTYPE="UDF"
SYSTYPE="XEON"
#SYSTYPE="Regatta"
#SYSTYPE="RZG_LinuxCluster"
#SYSTYPE="RZG_LinuxCluster-gcc"
#SYSTYPE="Opteron"

#--------------------------------------- Adjust settings for target computer


ifeq ($(SYSTYPE),"XEON")
CC = cmpicc
OPTIMIZE = -O3 -Wall -g
GSL_INCL = -I/${GSL_HOME}/include
GSL_LIBS = -L/${GSL_HOME}/lib
FFTW_INCL= -I/${FFTW_HOME}/include
FFTW_LIBS= -L/${FFTW_HOME}/lib
MPICHLIB =
HDF5INCL =
HDF5LIB =
endif

....

---------------------------------------


Thanks a lot for your help,

Michele

Michele Trenti
Space Telescope Science Institute
3700 San Martin Drive Phone: +1 410 338 4987
Baltimore MD 21218 U.S. Fax: +1 410 338 4767


" We shall not cease from exploration
   And the end of all our exploring
   Will be to arrive where we started
   And know the place for the first time. "

                                      T. S. Eliot


On Sat, 2 Dec 2006, Volker Springel wrote:

>
> On Wednesday 29 November 2006 22:44, Craig Rudick wrote:
>> Hi,
>>
>> We have been attempting to switch from Gadget1 to Gadget2, but have
>> been running into the problem that Gadget2 produces a segmentation
>> fault and dies when we try to run the example initial conditions. The
>> segmentation fault almost always occurs during or immediately
>> following the first domain decomposition, with output that reads:
>>
>> domain decomposition...
>> Segmentation fault
>>
>> We have tried compiling using both the Portland Group and Intel
>> compilers and see the same behavior.
>>
>> The really frustrating part is that whether we get this error depends
>> both on the initial conditions used and on the number of processors on
>> which the code is run. That is, the 'cluster' example IC runs perfectly on
>> up to 16 nodes, whereas all of the other example ICs run only on
>> four or fewer nodes (2 processors per node).
>>
>> Has anyone seen similar errors with Gadget2 or have any ideas on what
>> might be the solution to this error?
>>
>
> Hi Craig,
>
> This is strange. I can't reproduce this problem on any of the machines I
> have access to (which are a few), and I also haven't heard from anyone
> else experiencing this error. It could be related to the set-up of your
> cluster and/or the compiler/MPI library you are using. I'd suggest compiling
> with the gcc compiler (with -g) and looking at the core file produced by the
> crash with a debugger. If the crash is reproducible, this will tell you
> whether it is caused by Gadget2, and where it happens.
>
> Volker
>
>
>
>
>> Thanks,
>> Craig Rudick
>> Case Western Reserve University
>>
>>
>>
>>
>> -----------------------------------------------------------
>>
>> If you wish to unsubscribe from this mailing, send mail to
>> minimalist_at_MPA-Garching.MPG.de with a subject of: unsubscribe gadget-list
>> A web-archive of this mailing list is available here:
>> http://www.mpa-garching.mpg.de/gadget/gadget-list
>
>
>
>
>
Received on 2006-12-09 01:06:42

This archive was generated by hypermail 2.3.0 : 2023-01-10 10:01:30 CET