Code Configuration
Many aspects of GADGET-4 are controlled with compile-time options rather than run-time options in the parameterfile. This is done in order to allow the generation of highly optimised binaries by the compiler, even when the underlying source code allows for many different ways to run the code. Unfortunately, this technique has the disadvantage that different simulations typically require different binary executables of GADGET-4, so "installing" GADGET-4 on a computer system is not possible, and the act of compiling the code is an integral part of working with the code. Keeping the executable in a single place is not recommended either, because if several simulations are run concurrently, this invites the danger that a simulation is started or resumed with the wrong binary. Note that while GADGET-4 checks the plausibility of some of the most important code options, this is not done for all of them. Hence, to minimise the risk of using the wrong code for a simulation, it is highly recommended to produce a separate executable for each simulation that is run.
As a piece of advice, a good strategy for doing this in practice is to
create a separate directory for each simulation that is made, place a
copy of the whole simulation source code together with its makefile
into this directory, compile the code there and run it in this
directory as well, with the output directory specified as a
subdirectory of the simulation directory. In this way, the code and
its settings become a logical and integral part of the output
generated by the code. Everything belonging to a given simulation is
packaged up in one directory, and it becomes easy to reproduce what
was done even if considerable time should have passed, because the
precise version of the original code that was used and all produced
log files are readily available. An alternative is to have only a
single copy of the code, but then to use a separate directory for each
simulation that is done, including placing its configuration file
there and carrying out the compilation in this directory as well (by
passing the DIR= option to make). Then at least all object files and
the executable are unambiguously associated with a particular
simulation run.
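For example, a per-simulation workflow along these lines might look as follows (directory and file names are placeholders, and it is assumed that the build system has already been adapted to the local machine; by default the produced executable is called Gadget4):

    mkdir mysim && cd mysim
    cp -r /path/to/gadget4-source code     # private copy of the source used for this run
    cd code
    cp Template-Config.sh Config.sh        # adjust the options for this particular simulation
    make -j 8                              # compile here; produces ./Gadget4
    cd ..
    mkdir output                           # point OutputDir in the parameterfile to this directory
    mpirun -np 64 ./code/Gadget4 param.txt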
The easiest way to create the file Config.sh
needed for compilation
is to produce it as a copy of Template-Config.sh
, and then modify
it as needed. When created from Template-Config.sh
, the Config.sh
file contains a dummy list of all available compile-time code options,
with most of them commented out by default. To activate a certain
feature, the corresponding symbol should be commented in, and given
the desired value, if appropriate.
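For example, after copying the template with cp Template-Config.sh Config.sh, a hypothetical minimal setup for a periodic, self-gravitating run could activate the following entries (the values are illustrative; everything else remains commented out):

    PERIODIC                    # plain switch, activated by uncommenting it
    SELFGRAVITY
    PMGRID=512                  # switch that carries a value
    #NTYPES=6                   # still commented out, so the default of 6 applies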
Important Note:
Whenever you change one of the options described below, a full
recompilation of the code may in general be necessary. For this
reason, the Config.sh
itself has been added to the list of
dependencies of each source file in the makefile, such that a complete
recompilation should happen automatically when the configuration file
is changed and the command make
is given. Note that a manual
recompilation of the whole code can be enforced with the command make
clean
, which will erase all object files, followed by make
.
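In practice, rebuilding after a configuration change thus amounts to:

    make          # Config.sh is a makefile dependency, so changed options trigger a full rebuild
    # or, to force a recompilation of everything from scratch:
    make clean
    make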
Most code options in Config.sh
are switches that toggle a certain
feature on/off. Some of the symbols also take on a value if set. The
following list shows these switches with fiducial example values where
appropriate.
Parallelization options
IMPOSE_PINNING
Ask the code to pin MPI processes to cores in an optimum fashion. This requires the hwloc library for detecting the processor topology. It will generally only work on Linux. Note that most modern MPI libraries can also be asked to arrange for the pinning via options to the MPI start-up command, or they do this per default anyhow.
IMPOSE_PINNING_OVERRIDE_MODE
In case the MPI start-up has already established a pinning, this is
normally detected and then IMPOSE_PINNING
does not do
anything. Overriding this pre-established pinning can be enforced with
this option.
EXPLICIT_VECTORIZATION
This enables a few compute kernels (currently in SPH only) to explicitly use AVX instructions through the use of the vectorclass C++ library.
PRESERVE_SHMEM_BINARY_INVARIANCE
This can be used to preserve the order in which partial results are added in case the parallel tree walks use shared memory to access tree branch data that has been imported by different processes on the same shared memory node. In this case, exact binary invariance of output can be retained upon reruns of the code.
SIMPLE_DOMAIN_AGGREGATION
This tries to very roughly restrict the domain decomposition to place adjacent domain pieces on the same shared memory node. This will then (in some cases significantly) reduce the number of imports of particles and nodes that need to be made, but at the price of a higher imbalance overall. Whether this is worth it depends strongly on the problem type. A better (forthcoming) solution will be to do the domain decomposition hierarchically right away, taking the outline of the shared-memory nodes into account from the outset.
Basic operation mode of code
PERIODIC
Set this option if you want to have periodic boundary conditions. In
this case, the BoxSize
parameter in the parameterfile becomes
relevant.
TWODIMS
This effectively switches off one spatial dimension in SPH, i.e. the code follows only 2D hydrodynamics in either the xy-, yz-, or xz-plane. One doesn't need to tell the code explicitly which of these planes is used, but all coordinates of the third dimension need to be exactly equal (usually set to zero).
ONEDIMS
Similarly to TWODIMS
, this effectively only allows one spatial
dimension in SPH, i.e. the code follows only 1D hydrodynamics in either
the x-, y-, or z-direction. One doesn't need to tell the code
explicitly which of these directions is used, but all coordinates of
the other dimensions should be set to zero.
LONG_X_BITS = 2
In case periodic boundary conditions are used (i.e. PERIODIC
is on),
one can stretch the x-dimension of the box relative to BoxSize
by
the factor 1 / 2^LONG_X_BITS
. A setting equal to 2 like in this
example, would hence mean that the boxsize in the x-direction will be
BoxSize/4
.
LONG_Y_BITS = 2
Similarly to the above, this switch can stretch the periodic box in the
y-direction by the factor 1/2^LONG_Y_BITS
relative to BoxSize.
LONG_Z_BITS = 1
Finally, this option implements a possible stretch in the z-direction, similar
to LONG_X
and LONG_Y
. Only a subset of the stretch factors or several/all
of them may be used. The LONG_X/Y/Z_BITS
values must be positive integers.
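To illustrate the arithmetic with a concrete (purely illustrative) choice, the following settings shrink the box in the x- and y-directions while leaving the z-extent at the full BoxSize:

    LONG_X_BITS=2      # extent in x: BoxSize / 2^2 = BoxSize/4
    LONG_Y_BITS=1      # extent in y: BoxSize / 2^1 = BoxSize/2
                       # LONG_Z_BITS not set: extent in z remains BoxSize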
NTYPES = 6
Number of particle types that should be used by the code. If not set, a default value of 6 is adopted (which is the fixed value realised in GADGET-2/3). The different particle types are useful as a means to organize the simulation particles into different logical sets, which can be helpful for analysis (e.g., one may have a type for dark matter, one for stars, one for disk stars, etc.). The type 0 is reserved for gas particles, to which SPH is applied. All other particle types are collisionless particles that are treated on the same footing as far as gravity is concerned. For each of the types, one needs to specify a gravitational softening class in the parameterfile.
GADGET2_HEADER
This should be set if backwards compatibility of the snapshot file header format with GADGET-2/3 is required. Applies only to file formats 1 and 2. This can be useful, for example, to read in old initial conditions. Note that in this case NTYPES may not be larger than 6, and it should be at least as large as the number of types actually contained in the legacy initial conditions. Also, files cannot have more than 2^31 particles of any type in total in this case.
SECOND_ORDER_LPT_ICS
Processes the initial conditions before the actual simulation start-up to add in second-order Lagrangian perturbation theory corrections. This only works for specially constructed initial conditions created with Adrian Jenkins' IC code.
LEAN
This option is meant for special DM-only simulations that aim at high
savings in memory use. Only a uniform particle mass, a single particle
type, no hydrodynamics, and a single softening class are allowed. In
addition, one should use TREEPM_NOTIMESPLIT
, and refrain from using
double precision to obtain a very small memory footprint.
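A memory-lean dark matter-only configuration along these lines might combine the following entries (this is an illustrative sketch only; whether the exact combination passes the code's internal option checks has to be verified for the setup at hand):

    LEAN
    PERIODIC
    SELFGRAVITY
    PMGRID=1024
    TREEPM_NOTIMESPLIT
    NSOFTCLASSES=1         # a single softening class, as required by LEAN
    POSITIONS_IN_32BIT     # stay with 32-bit/single-precision storage for a small footprint
    IDS_32BIT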
Gravity calculation
SELFGRAVITY
This needs to be switched on if self-gravity should be computed by the code. If it is off, one can still have hydrodynamics, and optionally a prescribed external gravitational field.
HIERARCHICAL_GRAVITY
If this is enabled, the time integration is carried out in a hierarchical fashion in which the gravitational Hamiltonian is hierarchically split into slow and fast dynamics. The advantage of this approach is that small subsystems on short timesteps can be cleanly decoupled from the more slowly evolving rest of the system. This can be advantageous especially if there is a very deep and increasingly thinly populated tail in the timestep distribution. It also allows a time integration that is formally momentum conserving despite the use of individual timesteps. However, as additional partial forces need to be computed, this approach typically entails a somewhat higher number of force calculations. This can still be more efficient overall, but only if the number of particles per timebin quickly declines when going to shorter timebins.
FMM
If this is enabled, the Fast Multipole Method (FMM) is used for the gravity calculation instead of the classic one-sided tree algorithm.
MULTIPOLE_ORDER = 2
If this is enabled, one can control the order of the multipole expansion that is used. For a value of 3, quadrupole moments are included in the normal tree calculation and/or in the FMM calculations. A value of 4 includes octupole moments as well, and 5 goes further to hexadecupole moments. Note that these higher orders increase the memory and CPU time needed for the force calculations, but deliver more accurate forces in turn. It depends on the specific application whether this is worthwhile. The default is a value of 2.
EXTRA_HIGH_EWALD_ACCURACY
If this is activated, the Ewald corrections are extrapolated from the look-up table with a third-order Taylor expansion (quite a bit more expensive), otherwise a second-order Taylor expansion is used.
EXTRAPOTTERM
If this is activated, the extra multipole term in principle present for the potential computation is evaluated, even though it does not enter the force. For example, for monopole/dipole order (p=2), the code will then compute quadrupole moments and use them in the potential, but not in the force computation.
ALLOW_DIRECT_SUMMATION
When this is set (it requires HIERARCHICAL_GRAVITY
), the code will
calculate the gravitational forces for very small groups of particles
(the threshold for the group size is given by the constant
DIRECT_SUMMATION_THRESHOLD
, which is set per default to 500, but one
can override this value in the config file if desired) with direct
summation. This can be useful if the timestep distribution has a tail
to very short, poorly occupied timebins. Doing the corresponding
timesteps frequently for very small sets of particles can be faster
with direct summation than doing it via the tree or FMM because then
all overhead associated with tree construction and tree walks can be
avoided.
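A hypothetical configuration that makes use of this and raises the group-size threshold could read:

    HIERARCHICAL_GRAVITY
    ALLOW_DIRECT_SUMMATION
    DIRECT_SUMMATION_THRESHOLD=2000    # overrides the default value of 500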
RANDOMIZE_DOMAINCENTER
When this is activated, the whole particle set is randomly shifted in the simulation box whenever a domain decomposition is done. This can be useful to average out in time subtle spatial correlations in the force errors that arise from the positioning of the particles relative to the oct-tree geometry. Since integer coordinates are used for particle positions, this shifting does not entail any round-off error, and is reversible. In particular, the shifts will not be visible in any of the outputs created. This option is basically always recommended, and should have only positive effects.
RANDOMIZE_DOMAINCENTER_TYPES = 2
Can be set to select one or several types (this is a bitmask) which will then be used to locate the extension of a certain region. When the particle set is randomly translated throughout the box, the code will then try to avoid intersecting large oct-tree node boundaries with this region. When this option is not set explicitly but PLACEHIGHRESREGION is active, then this is automatically done with the default setting RANDOMIZE_DOMAINCENTER_TYPES=PLACEHIGHRESREGION.
EVALPOTENTIAL
When this is activated, the code also computes the gravitational potential (which is not needed for the dynamics). This costs a bit of extra memory and CPU-time.
TREE_NUM_BEFORE_NODESPLIT = 4
The number of particles that may be in a tree node before it is split into daughter nodes. If the number is reduced to 1, a fully threaded tree is obtained in which leaf nodes contain one particle each. (Note that empty nodes are not stored, except potentially for part of the top-level tree if it is very finely refined.)
EXTERNALGRAVITY
If this is switched on, an external gravitational field can be added to the dynamics. One still has to define with different switches and/or parameters what this external field is.
EXTERNALGRAVITY_STATICHQ
Activates a simple external potential due to a Hernquist dark matter halo, with parameters specified in the parameterfile.
TreePM Options
PMGRID = 512
This enables the TreePM method, i.e. the long-range force is computed
with a PM-algorithm, and the short range force with the tree or with
FMM. The parameter has to be set to the size of the mesh that should
be used, e.g. 64, 96, 128, etc. The mesh dimensions need not
necessarily be a power of two, but the FFT is fastest for such a
choice. Note: If the simulation is not in a periodic box, then an FFT
method for vacuum boundaries is employed, where due to the required
zero-padding, only half the mesh is covering the region with
particles. (see also the HRPMGRID
option).
ASMTH = 1.25
This can be used to override the value assumed for the scale that defines the long-range/short-range force-split in the TreePM algorithm. The default value is 1.25, in units of mesh-cells. A larger value will make the transition region better resolved by the mesh, yielding higher accuracy and less residual scatter in the force matching region, but at the same time the region that needs to be covered by the tree/FMM grows, which makes the computation more expensive.
RCUT = 6.0
This can be used to override the maximum radius out to which the short-range tree-force is evaluated in case the TreePM/FMM-PM algorithm is used. The conservative default value is 7.0 for this parameter, given in mesh-cells. Going much beyond 6.0 does however not yield much further improvement in the way the force matching region is treated, and reducing this value to 4.5 will give higher performance while being typically sufficiently accurate for most applications.
NTAB = 128
This can be used to define the size of the short-range lookup table. The default should normally be sufficient to have negligible influence on the obtained force accuracy.
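Taken together, a typical TreePM force-split block in Config.sh might look like this (values are illustrative):

    PMGRID=512      # long-range PM mesh of size 512^3
    ASMTH=1.25      # force-split scale in mesh cells (the default)
    RCUT=4.5        # shorter cutoff than the conservative default of 7.0, trading a bit of accuracy for speed
    #NTAB=128       # size of the short-range lookup table; the default is normally sufficient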
PLACEHIGHRESREGION = 2
If this option is set (will only work together with PMGRID
), then
the short range force computation in the TreePM/FMM-PM algorithm is
accelerated by an additional intermediate mesh with isolated boundary
conditions that is placed onto a certain region of high-res
particles. This procedure can be useful for zoom-simulations, where
the majority of particles (the "high-res" particles) are occupying
only a small fraction of the volume. To activate this, the option
needs to be set to an integer value in the form of a bit mask that
encodes the particle type(s) that should be used to define the spatial
location of the high-resolution patch. For example, if types 1 and 4
are supposed to define this region, then the parameter should be set
to PLACEHIGHRESREGION=2+16
, i.e. to the sum 2^1+2^4. The actual
spatial region is then determined automatically from the current
locations of these particle types. This high-res zone may also
intersect with the box boundaries if periodic boundaries are
used. Once the spatial region is defined by the code, it however
applies to all particle types. In fact, the short range interactions
between particle pairs that fall fully inside the high-res region are
computed in terms of two contributions, one from the intermediate PM
mesh that covers the high-res region, and the other from a tree/FMM
force that however now only extends to a region that is reduced in
size (which is the origin of a possible speed-up with this
option). Particle pairs for which at least one of the partners is
outside the high-res region get the normal Tree/FMM contribution with
the standard cut-off region, corresponding to the plain TreePM
approach with the coarse grid covering the full simulation
volume. Also note that when the particle set is randomized throughout
the box (with the RANDOMIZE_DOMAINCENTER
option), the code
additionally tries to avoid intersecting larger oct-tree node
boundaries by imposing certain restrictions in the randomization.
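The value is an ordinary type bitmask; for the example given in the text, where types 1 and 4 define the high-resolution region, one would set:

    PLACEHIGHRESREGION=18     # 2^1 + 2^4 = 2 + 16, i.e. particle types 1 and 4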
HRPMGRID = 512
When PMGRID
is set and non-periodic simulations are used, or when
PLACEHIGHRESREGION
is active, the FFT algorithm is used to compute
non-periodic gravitational fields, which requires zero-padding,
i.e. only one octant of the used grid can actually be covered by the
mass distribution. The grid size that the code uses for these FFTs is
equal to HRPMGRID
if this is set, otherwise the value of PMGRID
is
used for this grid dimension as well. This option is hence optional,
and allows if desired the use of different FFT sizes for the periodic
calculation covering the whole region, and for the non-periodic
calculations covering the zoom region.
FFT_COLUMN_BASED
Normally, the code employs a slab-based approach to parallelize the
FFT grids across the MPI ranks. However, for a Ngrid^3 FFT, this means
that at most Ngrid different MPI ranks can participate in the
computation. This represents a serious limit to scalability as the
minimum computational effort per MPI rank then scales as Ngrid^2, and
not more than Ngrid MPI ranks can anyhow be used. If one is in the
regime where the number of MPI ranks exceeds Ngrid, it is therefore a
good idea to activate FFT_COLUMN_BASED
, which will use a
column-based decomposition instead. Now the minimum effort per MPI rank scales
only as Ngrid, and the maximum number of ranks that can participate is
Ngrid^2, meaning that in practice this limit will not be
encountered. The results should not be affected by this
option. Because the column-based approach requires twice the number of
transpose operations, it is normally somewhat slower than the
slab-based approach in the regime where the latter can still scale
(i.e. for Ngrid >= Ncpu), so only for very large processor
numbers and large grid sizes can the column-based approach be
expected to yield a speed advantage, aside from the better memory
balance it provides.
PM_ZOOM_OPTIMIZED
Set this option if the particle distribution is spatially extremely
inhomogeneous, like in a zoom simulation. In this case, the FFT
algorithm will use a different communication strategy that can better
deal with this inhomogeneity and maintain work balance despite
it. If the particle distribution is reasonably homogeneous (like in a
uniformly sampled cosmological box), it is normally better to leave
this option off. In this case, a simpler communication strategy well
adapted to this situation is used which avoids some of the overhead
required by the PM_ZOOM_OPTIMIZED
option.
TREEPM_NOTIMESPLIT
When activated, the long- and short-range gravity forces are simply summed up to a total force, and then integrated in time with a common timestep. Otherwise, the short-range forces are subcycled, and the PM force is stored separately and integrated on a global, longer timestep.
GRAVITY_TALLBOX = 2
This can be used to set-up gravity with mixed-mode boundary conditions. The spatial dimension selected by the option is treated with non-periodic boundaries, while the other two are treated with periodic boundary conditions. This can facilitate, for example, stratified box simulations in which there is periodicity in the transverse directions (which define, e.g., the plane of a sheet) and open boundary conditions in the perpendicular direction. Set-ups of this kind are often used in simulations of star formation.
Treatment of gravitational softening
NSOFTCLASSES = 4
Number of different softening values. Traditionally, this is set equal
to the number of particle types, but it can also be chosen
differently. The mapping of a particle type to a particular softening
class is normally done through the parameter values
SofteningClassOfPartTypeX
as specified in the parameterfile.
Several different particle types could be mapped to the same softening
class in this case, and not all softening classes actually must be
used by particles. With the help of the INDIVIDUAL_GRAVITY_SOFTENING
option, the mapping can also be based on the particle mass, so that
particles of the same type may be mapped to different softening
lengths if their masses are different. This is an attractive option
especially for zoom simulations in order to allow heavier boundary
particles to be automatically associated with the closest natural
softening length from the tableau of available softening lengths.
INDIVIDUAL_GRAVITY_SOFTENING = 4+8+16
If this option is enabled, the selected particle types
(INDIVIDUAL_GRAVITY_SOFTENING
is interpreted as a binary mask
specifying these types) calculate for each of their particles a target
softening based on a cube-root scaling with the particle mass. As a
reference point, the code takes the softening class assigned for
particle type 1, and the average particle mass of the type 1
particles. The idea is to use this option for zoom simulations, where
one assigns the high-resolution collisionless particles to type 1 (all
with equal mass) and maps the type 1 to a certain softening class,
which then fixes the softening assigned for type 1. For all other used
particles types (which typically may involve variable masses within
the type), one then activates this option. For the corresponding
particles, a desired softening is computed based on a cube-root
scaling with their masses relative to the reference mass and softening
of the type 1 particles. From all the available softening classes, the
code then assigns the softening class with the smallest logarithmic
difference in softening length to the computed target softening
length. Since only the available softening classes can be used for
this, one should aim to supply a fine enough set of available
softening classes when this option is used.
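In other words, for a particle of mass m the target softening follows epsilon_target = epsilon_1 * (m / m_1)^(1/3), where epsilon_1 and m_1 are the softening and mean mass of the type 1 particles, and the available softening class closest in the logarithm of the softening is then assigned. A corresponding zoom-style configuration could look like this (values are illustrative):

    NSOFTCLASSES=8                     # provide a reasonably fine set of softening classes
    INDIVIDUAL_GRAVITY_SOFTENING=28    # 4+8+16 = 2^2+2^3+2^4, i.e. applied to types 2, 3 and 4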
ADAPTIVE_HYDRO_SOFTENING
Sometimes, one may want to scale the gravitational softening length of
gas particles with their SPH smoothing length. This can be achieved
with this option. Enabling it requires additional parameters in the
parameterfile, namely GasSoftFactor
, MinimumComovingHydroSoftening
and AdaptiveHydroSofteningSpacing
. When this option is used, the
softening type 0 is not available for any other particle type (and
also won't be selected by types included in
INDIVIDUAL_GRAVITY_SOFTENING
). SofteningClassOfPartType0
needs to be
0, and all other particle types need to have a non-zero value for
their SofteningClassOfPartType. The softening values specified in the
parameterfile for softening type 0 are ignored, instead the softening
is selected for all gas particles based on their smoothing length from
a finely spaced discrete table (see explanations for the above
softening parameters).
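A parameterfile excerpt matching this option could look as follows (the numerical values are placeholders that need to be chosen for the problem at hand):

    GasSoftFactor                     2.5
    MinimumComovingHydroSoftening     0.001
    AdaptiveHydroSofteningSpacing     1.05
    SofteningClassOfPartType0         0       # gas must be mapped to softening class 0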
SPH formulation
PRESSURE_ENTROPY_SPH
Enables the pressure-entropy formulation of SPH (similar to Hopkins 2013), otherwise the default density-entropy formulation (Springel & Hernquist, 2002) is used.
OUTPUT_PRESSURE_SPH_DENSITY
Outputs also density computed as in the standard SPH pressure-entropy
formulation. This is only useful if PRESSURE_ENTROPY_SPH
is used.
INITIAL_CONDITIONS_CONTAIN_ENTROPY
The initial conditions file contains the entropy instead of the thermal energy.
GAMMA = 1.4
Sets the equation of state index in the ideal gas law that is normally used in GADGET-4's SPH implementation. If not set, the default of 5/3 for a mono-atomic gas is used.
ISOTHERM_EQS
This defines an isothermal equation of state, P = rho c^2, where c^2
is kept constant for every particle at the value u = c^2 read from the
initial conditions in the internal energy per unit mass field. GAMMA=1
is set automatically in this case.
REUSE_HYDRO_ACCELERATIONS_FROM_PREVIOUS_STEP
If this option is enabled, the code does not recompute the SPH hydrodynamical acceleration at the beginning of a timestep, but rather reuses the one computed at the end of the previous timestep, which is typically good enough. The two accelerations can in principle differ slightly due to non-reversible viscous effects, or external source functions (e.g. radiative heating or cooling).
IMPROVED_VELOCITY_GRADIENTS
Use more accurate estimates for the velocity gradients following Hu et al. (2014), which enter the calculation of a time-dependent artificial viscosity.
VISCOSITY_LIMITER_FOR_LARGE_TIMESTEPS
Limits the maximum hydrodynamical acceleration due to the artificial viscosity such that the viscosity term cannot change the sign of the relative velocity projected on the particle distance vector. This should not be necessary if small enough timestepping is chosen.
SPH kernel options
CUBIC_SPLINE_KERNEL
Enables the cubic spline kernel as defined in Springel et al. (2001).
WENDLAND_C2_KERNEL
Enables the Wendland C2 kernel as discussed in Dehnen & Aly (2012).
WENDLAND_C4_KERNEL
Enables the Wendland C4 kernel as discussed in Dehnen & Aly (2012).
WENDLAND_C6_KERNEL
Enables the Wendland C6 kernel as discussed in Dehnen & Aly (2012).
WENDLAND_BIAS_CORRECTION
Reduces the self contribution for Wendland kernels following Dehnen & Aly (2012), their equations (18) and (19). Only works in 3D.
SPH viscosity options
TIMEDEP_ART_VISC
Enables time-dependent artificial viscosity.
HIGH_ART_VISC_START
Start with high rather than low viscosity.
NO_SHEAR_VISCOSITY_LIMITER
Turns off the shear viscosity suppression.
Extra physics
COOLING
Enables radiative cooling based on the collisional ionization equilibrium in the presence of a spatially constant but time-variable UV background. The network computed is similar to that described in the paper by Katz et al. (1996). Note that metal-line cooling is not included in this module.
STARFORMATION
If this is enabled, the code can create new star particles out of SPH
particles. The default star formation model that is implemented
corresponds to a basic variant of the sub-resolution multi-phase model
described in Springel & Hernquist (2003,
http://adsabs.harvard.edu/abs/2003MNRAS.339..289S). By default, the
new star particles are created as particle type 4, but if desired, one
can also specify another type for them by setting the compile time
flag STAR_TYPE
to a different value. Make sure that a gravitational
softening length is defined for the chosen type.
Time integration options
FORCE_EQUAL_TIMESTEPS
This adopts a global timestep for all particles, determined by pushing all particles down to the smallest timestep desired by any of the particles. The step size may still be variable in this case, but it is the same for all particles.
Single/double precision and data types
POSITIONS_IN_32BIT
When this is set, the internal storage of positions will be based on
32-bit unsigned integers. If single precision is used as default in
the code (i.e. DOUBLEPRECISION
is not set), then this is the
default if none of the POSITIONS_IN_XXBIT
options is selected.
POSITIONS_IN_64BIT
When this is set, the internal storage of positions will be based on
64-bit unsigned integers, otherwise on 32-bit unsigned integers. If
double precision is used as default in the code
(i.e. DOUBLEPRECISION
is set to 1), then this is the default if none
of the POSITIONS_IN_XXBIT
options is selected.
POSITIONS_IN_128BIT
With this option, the internal storage of positions will be based on 128-bit unsigned integers, offering an extreme spatial dynamic range. Use of this should be only of interest in truly extreme scenarios.
DOUBLEPRECISION = 1
This makes the code store all internal particle data in double
precision. Note that output files may nevertheless be written by
converting the values in files to single precision, unless
OUTPUT_IN_DOUBLEPRECISION
is activated.
DOUBLEPRECISION_FFTW
If this is set, the code will use the double-precision version of FTTW and store the corresponding field values in real and complex space in double. Otherwise single precision is used throughout for this.
OUTPUT_IN_DOUBLEPRECISION
The output snapshot files will be written in double precision when this is enabled. This is helpful to avoid round-off errors when using snapshot files for a restart, but will rarely be required for scientific analysis, except perhaps for spatial coordinates, where many zoom simulations are insufficiently represented by single precision. Outputting in mixed precision, double precision for coordinates and single precision for everything else, can therefore be a useful option for these simulations to save storage space without sacrificing anything in the analysis.
ENLARGE_DYNAMIC_RANGE_IN_TIME
This extends the dynamic range of the integer timeline from 32 to 64 bit, i.e. the smallest possible timestep is approximately 2^64 times smaller than the simulated timespan instead of 2^32 times. Correspondingly, the number of timebins grows from 32 to 64 in this case.
IDS_32BIT
If this is set, the code uses 32-bit particle IDs, hence at most 2^32 particles may be used. This is the default setting.
IDS_48BIT
If this is set, the code uses 48-bit particle IDs, allowing some smaller fields to be packed into this variable before a long-word boundary is reached. At most 2^48 particles can be used.
IDS_64BIT
If this is set, the code uses 64-bit particle IDs.
USE_SINGLEPRECISION_INTERNALLY
If this is activated, internal computations are carried out in single precision instead of double precision. On some architectures (but not on all -- often current CPUs use the same number of cycles for a single and a double precision operation), this can make some of the computations faster, but comes with a loss of precision, of course. Use with extreme care.
NUMBER_OF_MPI_LISTENERS_PER_NODE = 1
For multi-node runs, normally one MPI rank is set aside per
shared-memory node to asynchronously process incoming communication
requests. If the shared-memory node has a very large number of cores,
it might be helpful to have more than one such communication process,
which can be set with this parameter. If the number of cores per
shared memory node is larger than 64, then this in fact has to be
done, see also the MAX_NUMBER_OF_RANKS_WITH_SHARED_MEMORY
option.
MAX_NUMBER_OF_RANKS_WITH_SHARED_MEMORY = 64
This sets the maximum number of MPI ranks on a node that have one MPI
listener assigned to them, and can access each other via shared
memory. Possible values are 32 and 64, the default being 64. If the
total number of cores per node divided by
NUMBER_OF_MPI_LISTENERS_PER_NODE
is at most 32, a setting of 32 can
be used to save a small amount of memory. If the total number of cores
per node divided by NUMBER_OF_MPI_LISTENERS_PER_NODE
is above 64,
you need to increase NUMBER_OF_MPI_LISTENERS_PER_NODE
.
Output/Input options
POWERSPEC_ON_OUTPUT
Creates a power spectrum measurement for every output time, i.e. for every snapshot that is created. This is meant to be used in cosmological simulations.
REDUCE_FLUSH
The code produces relatively verbose log-file messages. To make sure
that they immediately appear in the log-file, a flush statement on the
output stream is executed when outputting a new log message. On slow
or overloaded filesystems, this can become a significant source of
overhead. To avoid this, this option can be used. A log-file flush is
then only done on intervals determined via the FlushCpuTimeDiff
parameter.
OUTPUT_POTENTIAL
This will force the code to compute gravitational potentials for all particles each time a snapshot file is generated. These values are then included in the snapshot files. Note that the computation of the values of the potential incurs some computational overhead.
OUTPUT_ACCELERATION
This will include the physical gravitational acceleration of each particle in the snapshot files.
OUTPUT_TIMESTEP
This outputs the timestep used by the particles. Useful only to test whether the timestep criteria behave in the intended way.
OUTPUT_PRESSURE
This outputs values for the pressure of each SPH particle (i.e. only for particles of type 0) to the snapshot files.
OUTPUT_VELOCITY_GRADIENT
This outputs the SPH estimates of the velocity gradients to the snapshot files, separately for the vx, vy, and vz components, i.e. one gets three 3-vectors.
OUTPUT_ENTROPY
When this option is activated, the code writes the values of the entropic variable associated with each SPH particle to the snapshot files.
OUTPUT_CHANGEOFENTROPY
This outputs the rate of change of entropy for each particle. Only the dissipative change due to shock heating is normally included here, meaning that radiative changes of the entropy are not included in this field, only the viscous heating from the artificial viscosity is.
OUTPUT_DIVVEL
With this option, one can request an output of the velocity divergence for each SPH particle in the snapshot files.
OUTPUT_CURLVEL
Likewise for the curl of the velocity field of all SPH particles, which is output in snapshots when this option is activated.
OUTPUT_COOLHEAT
With this option, the actual rate of energy loss due to radiative
cooling (or heating) can be output to the snapshot files. This option
requires COOLING
to be active as well.
OUTPUT_VISCOSITY_PARAMETER
With this option, one can request an output of the viscosity parameter
for each SPH particle in the snapshot files. This option requires
TIMEDEP_ART_VISC
to be active as well.
OUTPUT_NON_SYNCHRONIZED_ALLOWED
If this option is activated, outputs occur precisely at the prescribed desired output times, with particles being (linearly) drifted to this time, while velocities stay at the values they had after the last kick. Otherwise (which is the default), snapshots are only written at times when all particles are synchronized, and all timesteps have been completed with closing half-step velocity kicks. The desired output times are mapped to the closest full synchronization points, to the extent possible (note that if the spacing of desired output times is finer than the largest timestep size, some desired output times may have to be skipped).
OUTPUT_VELOCITIES_IN_HALF_PRECISION
Stores particle velocities in half-precision format.
OUTPUT_ACCELERATIONS_IN_HALF_PRECISION
Stores accelerations in half-precision. To prevent a potential overflow of the values, the actually stored values are normalized by the factor 10 H V200, with V200 = 1000 km/sec.
OUTPUT_COORDINATES_AS_INTEGERS
Does not convert the internal integer coordinates to floating point before output, but rather outputs them in integer representation. This retains more bits of information, and may give rise to better compression possibilities if the particle data is spatially ordered.
ALLOW_HDF5_COMPRESSION
When this is enabled, certain output fields in the HDF5 output are compressed when written. The compression/decompression is done on the fly, and is transparent to the user (but implies slightly slower I/O speed).
On the fly FOF groupfinder
FOF
This switch enables the friends-of-friends group finder of the code. In this case, the code will always run the FOF group finder whenever a snapshot is produced. This is done directly before the snapshot is written to disk, allowing the particle data in the snapshot files to be arranged in group order, i.e. it is easy to read the particles of any desired group. Also, when FOF is set, the code can be applied in postprocessing mode to compute group catalogues for existing snapshots. The snapshot in question is then written a second time with a name suffix "reordered", with the particle data arranged in group order.
FOF_PRIMARY_LINK_TYPES = 2
This identifies via a bitmask the particle type or types to which the friends-of-friends linking algorithm should be applied. Conventionally these are (high-resolution) dark matter particles stored in type 1, so the default for this parameter is 2.
FOF_SECONDARY_LINK_TYPES = 1+16+32
All particles of types selected by this bitmask are looking for the
nearest particle from the set covered by FOF_PRIMARY_LINK_TYPES
and
are then made member of the FOF group this particle belongs to. The
idea is that the groups found with the plain FOF algorithm incorporate
all co-spatial particles from the set described by
FOF_SECONDARY_LINK_TYPES
. This is useful especially for
hydrodynamical cosmological simulations with radiative processes and
star formation. In this case, the spatial distribution of baryons
becomes very different from the dark matter, so that applying the
friends-of-friends method to both dark matter and baryonic particles
at the same time would yield highly distorted results. It is then much
cleaner to do this only for the non-dissipative dark matter, and then let
the resulting halos collect their associated baryons via this option.
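For instance, in a cosmological hydrodynamical run in which type 1 holds the (high-resolution) dark matter and the baryonic particles are assumed to live in types 0, 4, and 5, the mask values quoted in the headings above correspond to:

    FOF_PRIMARY_LINK_TYPES=2           # 2^1: apply the FOF linking to the dark matter only
    FOF_SECONDARY_LINK_TYPES=49        # 2^0 + 2^4 + 2^5 = 1+16+32: attach these types to the groups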
FOF_GROUP_MIN_LEN = 32
The minimum length a group needs to have before being stored in the group catalogue is set with this parameter. The default number is 32.
FOF_LINKLENGTH = 0.2
Dimensionless linking length for the friends of friends algorithm. The
code will cast this into a comoving linking length by estimating the
mean particle spacing from the dark matter density (as given by
OmegaDM
- OmegaBaryon
) and the mean particle mass of all the
particles selected with the FOF_PRIMARY_LINK_TYPES
mask. It makes
sense for all these particles to have equal mass, since only then is the FOF
algorithm known to produce well-understood results. If in doubt,
check the log file for the linking length that is computed.
FOF_ALLOW_HUGE_GROUPLENGTH
Normally, the length of individual FOF groups and subhalos is restricted to at most 2^31 ~ 2 billion particles. If this option is activated, larger group and subhalo sizes are allowed.
Subfind
SUBFIND
When this is switched on, all identified FOF groups are subjected to a search for gravitationally bound substructures with the SUBFIND algorithm, as described in Springel et al. (2001), http://adsabs.harvard.edu/abs/2001MNRAS.328..726S. The snapshot outputs that are produced are automatically stored in group plus subhalo order, i.e. all particles of the same group are stored consecutively, with the subhalos of each group nested inside it.
SUBFIND_HBT
This enables an implementation of the hierarchical bound tracing
algorithm, where subhalo candidates are identified with the help of a
substructure catalogue from a previous time instead of doing this with
density excursion sets. This option requires both SUBFIND
and
MERGERTREE
.
SUBFIND_STORE_LOCAL_DENSITY
This will calculate local densities and velocity dispersions for all particles, not only for particles in FOFs, and store them in snapshot files.
SUBFIND_ORPHAN_TREATMENT
This produces special snapshot files after group catalogues are produced that contain only particles that have formerly been most bound particles of a subhalo. This can later be used by semi-analytic models of galaxy formation coupled to the merger tree output to treat temporarily lost or disrupted subhalos.
Merger tree algorithm
MERGERTREE
This option enables an on-the-fly calculation of descendant subhalos,
which are then stored along with the group catalogues. This requires
FOF
and SUBFIND
to be set, and is meant to be used in cosmological
simulations with a large number of outputs. The merger tree
information can then be used in a final postprocessing step to
construct merger trees from all the group catalogues and
descendant/progenitor links produced on the fly (or in
postprocessing). The merger tree construction done in this way can in
principle be carried out without having to store the actual particle
data of the snapshot files.
On-the-fly lightcone creation
LIGHTCONE
When this option is enabled, particle data is output continuously as
particles cross the backwards lightcone. The geometry of the lightcone
(or of several lightcones) is described by a separate input file as
specified in the parameterfile. If needed, also periodic replications
of the box are used to fill the lightcone volume. Note that either
LIGHTCONE_PARTICLES
or LIGHTCONE_MASSMAPS
, or both, need to be
selected as well if this option is activated.
LIGHTCONE_PARTICLES
When this option is enabled, particle data is output continuously as particles cross the backwards lightcone. The geometry of the lightcone (or of several lightcones) is described by a separate input file as specified in the parameterfile. If needed, also periodic replications of the box are used to fill the lightcone volume.
LIGHTCONE_OUTPUT_ACCELERATIONS
This outputs the gravitational accelerations of particles on the lightcone, which can be used for gravitational lensing applications.
LIGHTCONE_MASSMAPS
This option creates projected mass shells along the backwards
lightcone, for weak lensing applications. Requires the LIGHTCONE
option.
LIGHTCONE_PARTICLES_GROUPS
This option runs the FOF (and SUBFIND if enabled) group finders on the
lightcone particle data before they are written to disk. Requires the
LIGHTCONE
and LIGHTCONE_PARTICLES
options.
LIGHTCONE_IMAGE_COMP_HSML_VELDISP
This special option is only relevant for lightcone image creation, and (re)computes adaptive smoothing lengths as well as local velocity dispersions.
LIGHTCONE_MULTIPLE_ORIGINS
If this is enabled, origins of lightcones different from (0, 0, 0) can
be defined. Possible origins need to be listed in a separate file with
the name LightConeOriginsFile
. The lightcone definitions file then needs to
be augmented with a further number at the end of each lightcone
definition, and this serves as an index into the list of lightcone origins.
REARRANGE_OPTION
This option needs to be enabled to allow the rearrange lightcone particle feature to work. It only needs to be on for the special postprocessing option, otherwise it can be disabled to save memory.
IC generation
NGENIC = 256
This master switch enables the creation of cosmological initial
conditions for simulations with periodic boundary conditions. The
value of NGENIC
should be set to the FFT-grid size used for IC
generation, which should be at least as fine as the particle
resolution per dimension. If the code is started without restartflag
(i.e. when normally initial conditions are read), the code instead
creates the ICs first, followed by evolving them with the code,
i.e. in this case the initial conditions do not need to exist on
disk. One can however also start the code with restartflag 6, in which
case the ICs are produced and written to disk, followed by a stop of
the code. One can then also use the produced files as ICs in a regular
start of GADGET-4 without having the NGENIC
option set.
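Assuming the usual GADGET-4 invocation in which the restart flag is passed as an optional second command-line argument, the two modes of operation described above correspond to something like:

    # create the ICs on the fly and evolve them right away:
    mpirun -np 64 ./Gadget4 param.txt
    # only create the ICs, write them to disk, and stop (restart flag 6):
    mpirun -np 64 ./Gadget4 param.txt 6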
CREATE_GRID
If this is activated, the IC creation is carried out with a regular Cartesian particle grid that is produced on the fly. Otherwise, the unperturbed particle load is read in from the specified IC file, which allows the use of so-called glass files (which arise from evolving random Poisson samples with the sign of gravity reversed until they settle in a quasi-equilibrium without preferred directions) or of spatially variable resolution.
GENERATE_GAS_IN_ICS
This option can be used to modify dark matter-only initial conditions for cosmological simulations upon start-up of the simulation code. The modification is to add gas to the simulation by splitting up dark matter particles into a dark matter and gas particle, with masses set by the specified cosmological parameters. The particle pair is displaced in opposite directions from the original coordinate keeping the center-of-mass fixed. The separation is such that the new dark matter and gas particles form two interleaved grids that maximize the relative distance between the two particle types and minimizes pairing correlations. The velocities of the particles inherit the velocity of the original dark matter particle. Note that here the transfer functions of dark matter and gas are not distinguished, so this procedure is only a rough approximation. It is however typically sufficient on large scales and for galaxy formation.
SPLIT_PARTICLE_TYPE = 4+8
This bitmask determines which of the dark matter particles contained
in the dark matter only initial conditions that are processed with
GENERATE_GAS_IN_ICS
should be split up into gas and dark matter
particles. Normally, this should be done for all of the dark matter
particles to produce a volume filling gas phase. However, sometimes
this is restricted to the high-resolution region in zoom simulations,
which can then be picked out with this option.
NGENIC_FIX_MODE_AMPLITUDES
When activated, this leaves the mode amplitudes at sqrt[P(k)], instead of sampling them from a Rayleigh distribution. This can be useful in the context of the variance suppression technique of Angulo & Pontzen (2016).
NGENIC_MIRROR_PHASES
If this is activated, all phases in the created realization are turned by 180 degrees. This is useful to realize a pair of simulations that differ only by the sign of the initial density perturbations but which are otherwise identical.
NGENIC_2LPT
This option creates the initial conditions based on second-order Lagrangian perturbation theory, instead of just using the Zeldovich approximation. Especially when the starting redshift is low, this option is recommended.
NGENIC_TEST
This option is purely for testing purposes. When the code creates ICs on the fly, it just measures the power spectrum of the produced ICs and terminates.
MPI related settings
MPI_MESSAGE_SIZELIMIT_IN_MB = 200
Some (especially older) MPI libraries are not overly stable when very
large transfers are done. Such transfers can however happen in
GADGET-4 for large simulations, for example in the domain
decomposition. With this option one can ask the code to automatically
split up such large transfers in a sequence of smaller transfers. The
maximum allowed size of one of the transfers in MB is set by the value
given to MPI_MESSAGE_SIZELIMIT_IN_MB
.
NUMPART_PER_TASK_LARGE
Set this if the number of particles per task is quite large, in particular so large that 8 times this number can overflow a 32-bit integer. This means that once you expect ~500 million or more particles on a single MPI rank, this option needs to be set to guarantee that the PM algorithms still work correctly. Of course, once you reach more than 2 billion particles per MPI rank, the code will stop working anyhow due to integer overflows. The easy solution to this, of course, is to increase the number of MPI ranks.
ISEND_IRECV_IN_DOMAIN
This option can be used to replace the default communication pattern used in the domain decomposition (and also in FOF and SUBFIND), which is based on a hypercube with synchronous myMPI_Sendrecv() calls, with a set of asynchronous communications. This should be faster in principle, but it also tends to result in a huge number of simultaneously open communication requests which can choke the MPI communication subsystem. Whether this works robustly and is indeed faster will depend on the system and the simulation size. If in doubt, rather stick with the default algorithm.
USE_MPIALLTOALLV_IN_DOMAINDECOMP
Another approach to carry out the all-to-all communication occurring in the domain decomposition is to simply use MPI's Alltoallv function. This is done when this option is set, and one then effectively hopes that the internal algorithm used by Alltoallv is the most robust and fastest for the communication task at hand. This may be the case, but there is no guarantee for it. The default algorithm of GADGET-4 (hypercube with synchronous myMPI_Sendrecv), which is used when this option is not used, should always be a reliable alternative, however.
MPI_HYPERCUBE_ALLGATHERV
Another issue with some MPI-libraries is that they may use quite a bit of internal storage for carrying out MPI_Allgatherv. If this turns out to be a problem, one can set this option. The code will then replace all uses of MPI_Allgatherv() with a simpler communication pattern that uses hypercubes with myMPI_Sendrecv as a work-around.
MPI_HYPERCUBE_ALLTOALL
Some MPI libraries tend to be unstable in their MPI_Alltoall implementation. This option replaces such calls with a robust hypercube communication pattern. Not necessarily the fastest, but very robust, scalable, and with decent speed.
ALLOCATE_SHARED_MEMORY_VIA_POSIX
If this is set, the code tries to use POSIX directly to allocate shared memory in the virtual filesystem /dev/shm, instead of relying on the MPI-3 call MPI_Win_allocate_shared(), which on some systems executes in a sluggish way.
Testing and Debugging options
DEBUG
This option is only meant to enable core-dumps (which are typically disabled by calling MPI_Init() on program start-up). This can be useful to allow post-mortem analysis of a crash by loading the core file with a debugger. Of course, the code should be compiled with symbols included (-g option) to facilitate this, and it may also help to set the optimization level to something low or disable optimizations entirely to avoid confusing the debugger in some situations.
DEBUG_ENABLE_FPU_EXCEPTIONS
This option is useful in combination with DEBUG and tries to enable FPU exceptions. In this case, an illegal mathematical floating point instruction that creates a dreaded "Not a Number" (NaN) will trigger a core file. With the debugger one can then quickly find the line in the code that is the culprit.
DEBUG_SYMTENSORS
This option executes a few selected unit tests on the symmetric-tensor subroutines on start-up of the code.
HOST_MEMORY_REPORTING
This option reports, when the code starts, available system memory information by analyzing /proc/meminfo on Linux systems. It is enabled by default on Linux. Output of this option is found at the beginning of the stdout log-file, and for example looks like this:
    -------------------------------------------------------------------------------------------------------------------------
    AvailMem:     Largest = 29230.83 Mb (on task= 24), Smallest = 29201.10 Mb (on task= 12), Average = 29215.28 Mb
    Total Mem:    Largest = 32213.01 Mb (on task=  0), Smallest = 32213.01 Mb (on task=  0), Average = 32213.01 Mb
    Committed_AS: Largest =  3011.91 Mb (on task= 12), Smallest =  2982.18 Mb (on task= 24), Average =  2997.73 Mb
    SwapTotal:    Largest = 23436.99 Mb (on task=  0), Smallest = 23436.99 Mb (on task=  0), Average = 23436.99 Mb
    SwapFree:     Largest = 22588.00 Mb (on task=  0), Smallest = 21600.56 Mb (on task= 12), Average = 22020.50 Mb
    AllocMem:     Largest =  3011.91 Mb (on task= 12), Smallest =  2982.18 Mb (on task= 24), Average =  2997.73 Mb
    -------------------------------------------------------------------------------------------------------------------------
    Task=0 has the maximum committed memory and is host: sandy-022
ENABLE_HEALTHTEST
When this is enabled, the code tries to assess upon start-up whether all CPU cores are freely available and show the same execution speed. Also, the MPI bandwidth both inside nodes and between nodes is tested.
FORCETEST = 0.001
Calculates for the specified fraction of particles direct summation
forces, which can then be compared to the forces computed by the
Tree/PM/FMM algorithms of GADGET-4 in order to check or monitor the
force accuracy of the code. This is only included as a testing and
debugging option. The value of the option should be set to a number
between 0 and 1 (e.g. 0.001), and this number gives the fraction of
randomly chosen particles at each timestep for which forces by direct
summation are computed. The normal tree-forces and the exact direct
summation forces are then collected in a file forcetest.txt
for
later inspection. Note that the simulation itself is unaffected by
this option, but it will of course run much(!) slower, particularly if
FORCETEST * NumPart * NumPart >> NumPart.
Note: The particle IDs must
be set to numbers >= 1 for this option to work.
FORCETEST_TESTFORCELAW = 1
Special option for measuring the effective force law. Can be set to 1
or 2 for checking with TreePM/FMM-PM, or TreePM/FMM-PM +
PLACEHIGHRESREGION. The option FORCETEST
must be activated as
well. The simulation needs to be fed with a special initial
conditions file for which only one particle has mass, the others are
massless test particles. The code will then go through cycles in which
the particle with the mass is randomly placed, and the other particles
are randomly placed around it, with distance spacing uniform in
log(r). After 40 cycles are carried out, the code terminates, and the
force-law accuracy can be examined by analysing the file
forcetest.txt
.
FORCETEST_FIXEDPARTICLESET
This always checks the same particle IDs if force accuracy is checked during a run.
VTUNE_INSTRUMENT
This option creates additional instrumentation instructions for the Intel VTune code performance tool, based on the internal timing routines. This can be used for a performance analysis based on this tool.
DEBUG_MD5
This option can be used to compute MD5 checksums of the P[] and SphP[] arrays regularly in the code, with the results being written to the log-file memory.txt. Using this, one can check for binary invariance of the code when the code is interrupted and resumed from restart files.
TILING = 2
Replicates the read-in ICs the specified number of times in each dimension. This can be used for scaling tests.
SQUASH_TEST
Squeezes the ICs on read-in in order to create a distortion from spherical symmetry in certain force calculation tests.
DOMAIN_SPECIAL_CHECK
Outputs test data to check the balancing algorithms.
EWALD_TEST
A development test for testing the accuracy of the Ewald table lookup.
RECREATE_UNIQUE_IDS
This option can be used to reinitialize the particle IDs upon start-up. Useful if one has to deal with a broken IC file.
NO_STOP_BELOW_MINTIMESTEP
Do not stop when the code wants to adopt a timestep below the specified minimum timestep, but rather enforce this step size.
DO_NOT_PRODUCE_BIG_OUTPUT
This special option allows one to refrain from writing large output files (restart files, snapshots, and group catalogues), which can be useful for scaling tests.
STOP_AFTER_STEP = 8
After the corresponding step has been completed, the simulation ends. This is meant to simplify certain performance and scalability tests.
MEASURE_TOTAL_MOMENTUM
This option computes the total conjugate momentum after every step. Can be used to check for manifest momentum conservation of different force computation schemes.
TREE_NO_SAFETY_BOX
When enabled, this disables the geometric 'near node' protection, i.e. for the one-sided tree, one may then be closer to a node's center than 1.5 times the node size, and for FMM, adjacent nodes may interact.