Issue with Space Available

From: <dylan.chosson_at_edu.univ-fcomte.fr>
Date: Thu, 18 Mar 2021 15:25:42 +0100 (CET)

Dear all,

I am a new user of GADGET-4.
I have created an initial conditions file with N-GenIC for 256^3 particles (dark matter only). However, when I run GADGET-4 with the following Slurm batch script:

#!/bin/bash
#SBATCH --time=24:00:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=45
#SBATCH --job-name=gadget-4_large_scale
#SBATCH --output=gadget_output.txt

echo
echo "Running on hosts: $SLURM_NODELIST"
echo "Running on $SLURM_NNODES nodes."
echo "Running on $SLURM_NPROCS processors."
echo "Current working directory is $(pwd)"
echo

mpiexec -np $SLURM_NPROCS ./Gadget4 param.txt

I get the following error:

"[compuphys-calc:01028] *** Process received signal ***
[compuphys-calc:01028] Signal: Segmentation fault (11)
[compuphys-calc:01028] Signal code: Address not mapped (1)
[compuphys-calc:01028] Failing at address: (nil)
--------------------------------------------------------------------------
It appears as if there is not enough space for /tmp/openmpi-sessions-2983_at_compuphys-calc_0/57904/1/0/shared_window_5.compuphys-calc (the shared-memory backing
file). It is likely that your MPI job will now either abort or experience
performance degradation.

Local host: compuphys-calc
Space Requested: 61341885832 B
Space Available: 7678644224 B
--------------------------------------------------------------------------
[compuphys-calc:01028] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x12980)[0x7fa64a1b4980]
[compuphys-calc:01028] [ 1] /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi/mca_osc_sm.so(ompi_osc_sm_free+0x10c)[0x7fa62dc17abc]
[compuphys-calc:01028] [ 2] /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi/mca_osc_sm.so(+0x2e6f)[0x7fa62dc17e6f]
[compuphys-calc:01028] [ 3] /usr/lib/x86_64-linux-gnu/libmpi.so.20(ompi_win_allocate_shared+0x9b)[0x7fa64ad49b9b]
[compuphys-calc:01028] [ 4] /usr/lib/x86_64-linux-gnu/libmpi.so.20(PMPI_Win_allocate_shared+0xd6)[0x7fa64ad7e826]
[compuphys-calc:01028] [ 5] ./Gadget4(+0x31d3d)[0x5594c2b9ad3d]
[compuphys-calc:01028] [ 6] ./Gadget4(+0x1e94a)[0x5594c2b8794a]
[compuphys-calc:01028] [ 7] ./Gadget4(+0x1cd63)[0x5594c2b85d63]
[compuphys-calc:01028] [ 8] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xe7)[0x7fa649dd2bf7]
[compuphys-calc:01028] [ 9] ./Gadget4(+0x1df9a)[0x5594c2b86f9a]
[compuphys-calc:01028] *** End of error message ***
--------------------------------------------------------------------------
mpiexec noticed that process rank 0 with PID 1028 on node compuphys-calc exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
[compuphys-calc:01012] 44 more processes have sent help message help-mpi-btl-base.txt / btl:no-nics
[compuphys-calc:01012] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages"


I am running on a university server dedicated to second-year Master's students. The job output gives the following information:


Running on hosts: compuphys-calc
Running on 1 nodes.
Running on 45 processors.
Current working directory is /home/dchosso/Sem_10/Gadget-4/gadget4/my_sim/ics256

--------------------------------------------------------------------------
[[57904,1],34]: A high-performance Open MPI point-to-point messaging module
was unable to find any relevant network interfaces:

Module: OpenFabrics (openib)
Host: compuphys-calc

Another transport will be used instead, although this may result in
lower performance.

NOTE: You can disable this warning by setting the MCA parameter
btl_base_warn_component_unused to 0.
--------------------------------------------------------------------------
Shared memory islands host a minimum of 45 and a maximum of 45 MPI ranks.

___ __ ____ ___ ____ ____ __
/ __) /__\ ( _ \ / __)( ___)(_ _)___ /. |
( (_-. /(__)\ )(_) )( (_-. )__) )( (___)(_ _)
\___/(__)(__)(____/ \___/(____) (__) (_)

This is Gadget, version 4.0.
Git commit 01e6b1567c93fe1cfaffd499aa55151db2ed4208, Tue Mar 2 13:22:03 2021 +0100

Code was compiled with the following compiler and flags:
mpicxx -std=c++11 -ggdb -O3 -march=native -Wall -Wno-format-security -I/home/dchosso/Sem_10/hdf5-1.8.22/hdf5/include -I/home/dchosso/Sem_10/gsl-2.6/include -I/home/dchosso/Sem_10/fftw-3.3.9/include -Imy_sim/ics256/build -Isrc


Code was compiled with the following settings:
ASMTH=2.0
DOUBLEPRECISION=2
GADGET2_HEADER
LEAN
NSOFTCLASSES=1
NTYPES=2
PERIODIC
PMGRID=256
POSITIONS_IN_32BIT
POWERSPEC_ON_OUTPUT
RANDOMIZE_DOMAINCENTER
SELFGRAVITY
TREEPM_NOTIMESPLIT

Running on 45 MPI tasks.

BEGRUN: Size of particle structure 56 [bytes]
BEGRUN: Size of sph particle structure 96 [bytes]
BEGRUN: Size of gravity tree node 72 [bytes]
BEGRUN: Size of neighbour tree node 112 [bytes]
BEGRUN: Size of subfind auxiliary data 36 [bytes]

-------------------------------------------------------------------------------------------------------------------------
AvailMem: Largest = 61581.14 Mb (on task= 0), Smallest = 61581.14 Mb (on task= 0), Average = 61581.14 Mb
Total Mem: Largest = 63898.89 Mb (on task= 0), Smallest = 63898.89 Mb (on task= 0), Average = 63898.89 Mb
Committed_AS: Largest = 2317.74 Mb (on task= 0), Smallest = 2317.74 Mb (on task= 0), Average = 2317.74 Mb
SwapTotal: Largest = 8192.00 Mb (on task= 0), Smallest = 8192.00 Mb (on task= 0), Average = 8192.00 Mb
SwapFree: Largest = 8054.43 Mb (on task= 0), Smallest = 8054.43 Mb (on task= 0), Average = 8054.43 Mb
AllocMem: Largest = 2317.74 Mb (on task= 0), Smallest = 2317.74 Mb (on task= 0), Average = 2317.74 Mb
avail /dev/shm: Largest = 60703.95 Mb (on task= 0), Smallest = 60703.95 Mb (on task= 0), Average = 60703.95 Mb
-------------------------------------------------------------------------------------------------------------------------

Task=0 has the maximum commited memory and is host: compuphys-calc
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Obtaining parameters from file 'param.txt':

InitCondFile /home/dchosso/Sem_10/Gadget-4/ICs/ics256/ics256
OutputDir /home/dchosso/Sem_10/Gadget-4/gadget4/my_sim/ics256/output
SnapshotFileBase snapshot
OutputListFilename outputs_lcdm_gas.txt
ICFormat 1
SnapFormat 2
TimeLimitCPU 86400
CpuTimeBetRestartFile 7200
MaxMemSize 1300
TimeBegin 0.0909091
TimeMax 1
ComovingIntegrationOn 1
Omega0 0.3
OmegaLambda 0.7
OmegaBaryon 0.04
HubbleParam 0.7
Hubble 0.1
BoxSize 50000
OutputListOn 0
TimeBetSnapshot 1.06278
TimeOfFirstSnapshot 0.95
TimeBetStatistics 0.05
NumFilesPerSnapshot 1
MaxFilesWithConcurrentIO 1
ErrTolIntAccuracy 0.01
CourantFac 0.3
MaxSizeTimestep 0.025
MinSizeTimestep 0
TypeOfOpeningCriterion 1
ErrTolTheta 0.75
ErrTolThetaMax 1
ErrTolForceAcc 0.0025
TopNodeFactor 2.5
ActivePartFracForNewDomainDecomp 0.01
ActivePartFracForPMinsteadOfEwald 0.05
DesNumNgb 64
MaxNumNgbDeviation 1
UnitLength_in_cm 3.08568e+21
UnitMass_in_g 1.989e+43
UnitVelocity_in_cm_per_s 100000
GravityConstantInternal 0
SofteningComovingClass0 0.01
SofteningMaxPhysClass0 0.01
SofteningClassOfPartType0 0

As you can see, the job has 1 node and 45 cores available.

Changing MaxMemSize to a lower value (e.g. 600) only changes the "Space Requested" value.
My question is: why does GADGET-4 only seem to see ~7.6 GB available ("Space Available: 7678644224 B") when there is ~60 GB?
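
As a sanity check on these numbers, the short sketch below compares "Space Requested" with the number of MPI tasks times MaxMemSize. It assumes that MaxMemSize is counted in units of 1024*1024 bytes; the variable names are only illustrative, and all values are copied from the output above.

```shell
#!/bin/bash
# Values copied from the job output and param.txt above.
ntasks=45                # MPI ranks on the single node
maxmemsize_mb=1300       # MaxMemSize from param.txt
requested=61341885832    # "Space Requested" in bytes
available=7678644224     # "Space Available" in bytes

# Assumption: MaxMemSize is given in units of 1024*1024 bytes.
expected=$(( ntasks * maxmemsize_mb * 1024 * 1024 ))
difference=$(( requested - expected ))
available_mib=$(( available / 1024 / 1024 ))

echo "ntasks * MaxMemSize = $expected B"
echo "Space Requested     = $requested B"
echo "difference          = $difference B"
echo "Space Available     = $available_mib MiB"
```

Under that assumption, the requested size differs from 45 x 1300 MB by only 189832 bytes, which is consistent with the shared window covering the MaxMemSize allocation of all 45 tasks and would explain why lowering MaxMemSize changes "Space Requested" but not "Space Available".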

I have already read Tiago's post in the "Not enough memory" thread and contacted the system administrators about the "half memory of machine" issue; they have already run the command "mount -o remount,size=95% /dev/shm".
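
Note that the error message names a backing file under /tmp, whereas the memory table reports the free space of /dev/shm. A minimal sketch for comparing the two on the compute node is given below. It assumes GNU coreutils df (for the -B1 and --output options), and free_bytes is a hypothetical helper name, not part of GADGET-4 or Open MPI.

```shell
#!/bin/bash
# Print the free space, in bytes, of the filesystem that holds the given path.
# Assumes GNU df, which supports -B1 and --output=avail.
free_bytes() {
    df -B1 --output=avail "$1" | tail -n 1 | tr -d ' '
}

# Compare the two locations that appear in the job output.
for dir in /tmp /dev/shm; do
    if [ -d "$dir" ]; then
        echo "$dir: $(free_bytes "$dir") B available"
    else
        echo "$dir: not present on this host"
    fi
done
```

If the value printed for /tmp is close to the reported "Space Available", the limit would come from the filesystem holding /tmp rather than from /dev/shm.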

I would be very grateful if someone could help me with this issue.

With best regards,
Dylan
Received on 2021-03-18 15:25:44
