Code Configuration
Many aspects of GADGET-4 are controlled with compile-time options rather than run-time options in the parameterfile. This is done in order to allow the generation of highly optimised binaries by the compiler, even when the underlying source code allows for many different ways to run the code. Unfortunately, this technique has the disadvantage that different simulations typically require different binary executables of GADGET-4, so "installing" GADGET-4 on a computer system is not possible, and the act of compiling the code is an integral part of working with the code. Keeping the executable in a single place is not recommended either, because if several simulations are run concurrently, this invites the danger that a simulation is started or resumed with the wrong binary. Note that while GADGET-4 checks the plausibility of some of the most important code options, this is not done for all of them. Hence, to minimise the risk of using the wrong code for a simulation, it is highly recommended to produce a separate executable for each simulation that is run.
As a piece of advice, a good strategy for doing this in practice is to
create a separate directory for each simulation that is made, place a
copy of the whole simulation source code together with its makefile
into this directory, compile the code there and run it in this
directory as well, with the output directory specified as a
subdirectory of the simulation directory. In this way, the code and
its settings become a logical and integral part of the output
generated by the code. Everything belonging to a given simulation is
packaged up in one directory, and it becomes easy to reproduce what
was done even if considerable time should have passed, because the
precise version of the original code that was used and all produced
log files are readily available. An alternative is to have only a
single copy of the code, but then to use a separate directory for each
simulation that is done, including placing its configuration file
there and carrying out the compilation in this directory as well (by
passing the DIR= option to make). Then at least all object files and
the executable are unambiguously associated with a particular
simulation run.
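For example, a per-simulation workflow along these lines might look as follows (directory and file names are placeholders, and it is assumed that the build system has already been adapted to the local machine; by default the produced executable is called Gadget4):

    mkdir mysim && cd mysim
    cp -r /path/to/gadget4-source code     # private copy of the source used for this run
    cd code
    cp Template-Config.sh Config.sh        # adjust the options for this particular simulation
    make -j 8                              # compile here; produces ./Gadget4
    cd ..
    mkdir output                           # point OutputDir in the parameterfile to this directory
    mpirun -np 64 ./code/Gadget4 param.txt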
The easiest way to create the file Config.sh
needed for compilation
is to produce it as a copy of Template-Config.sh
, and then modify
it as needed. When created from Template-Config.sh
, the Config.sh
file contains a dummy list of all available compile-time code options,
with most of them commented out by default. To activate a certain
feature, the corresponding symbol should be commented in, and given
the desired value, if appropriate.
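For example, after copying the template with cp Template-Config.sh Config.sh, a hypothetical minimal setup for a periodic, self-gravitating run could activate the following entries (the values are illustrative; everything else remains commented out):

    PERIODIC                    # plain switch, activated by uncommenting it
    SELFGRAVITY
    PMGRID=512                  # switch that carries a value
    #NTYPES=6                   # still commented out, so the default of 6 applies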
Important Note:
Whenever you change one of the options described below, a full
recompilation of the code may in general be necessary. For this
reason, the Config.sh
itself has been added to the list of
dependencies of each source file in the makefile, such that a complete
recompilation should happen automatically when the configuration file
is changed and the command make
is given. Note that a manual
recompilation of the whole code can be enforced with the command make
clean
, which will erase all object files, followed by make
.
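In practice, rebuilding after a configuration change thus amounts to:

    make          # Config.sh is a makefile dependency, so changed options trigger a full rebuild
    # or, to force a recompilation of everything from scratch:
    make clean
    make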
Most code options in Config.sh
are switches that toggle a certain
feature on/off. Some of the symbols also take on a value if set. The
following list shows these switches with fiducial example values where
appropriate.
Parallelization options
IMPOSE_PINNING
Ask the code to pin MPI processes to cores in an optimum fashion. This requires the hwloc library for detecting the processor topology. It will generally only work on Linux. Note that most modern MPI libraries can also be asked to arrange for the pinning via options to the MPI start-up command, or they do this per default anyhow.
IMPOSE_PINNING_OVERRIDE_MODE
In case the MPI start-up has already established a pinning, this is
normally detected and then IMPOSE_PINNING
does not do
anything. Overriding this pre-established pinning can be enforced with
this option.
EXPLICIT_VECTORIZATION
This enables a few compute kernels (currently in SPH only) to explicitly use AVX instructions through the use of the vectorclass C++ library.
PRESERVE_SHMEM_BINARY_INVARIANCE
This can be used to preserve the order in which partial results are added in case the parallel tree walks use shared memory to access tree branch data that has been imported by different processes on the same shared memory node. In this case, exact binary invariance of output can be retained upon reruns of the code.
SIMPLE_DOMAIN_AGGREGATION
This tries to very roughly restrict the domain decomposition to place adjacent domain pieces on the same shared memory node. This will then (in some cases significantly) reduce the number of imports of particles and nodes that need to be made, but at the price of a higher imbalance overall. Whether this is worth it depends strongly on the problem type. A better (forthcoming) solution will be to do the domain decomposition hierarchically right away, taking the outline of the shared-memory nodes into account from the outset.
Basic operation mode of code
PERIODIC
Set this option if you want to have periodic boundary conditions. In
this case, the BoxSize
parameter in the parameterfile becomes
relevant.
TWODIMS
This effectively switches off one spatial dimension in SPH, i.e. the code follows only 2D hydrodynamics in either the xy-, yz-, or xz-plane. One doesn't need to tell the code explicitly which of these planes is used, but all coordinates of the third dimension need to be exactly equal (usually set to zero).
ONEDIMS
Similarly to TWODIMS
, this effectively only allows one spatial
dimension in SPH, i.e. the code follows only 1D hydrodynamics in either
the x-, y-, or z-direction. One doesn't need to tell the code
explicitly which of these directions is used, but all coordinates of
the other dimensions should be set to zero.
LONG_X_BITS = 2
In case periodic boundary conditions are used (i.e. PERIODIC
is on),
one can stretch the x-dimension of the box relative to BoxSize
by
the factor 1 / 2^LONG_X_BITS
. A setting equal to 2 like in this
example, would hence mean that the boxsize in the x-direction will be
BoxSize/4
.
LONG_Y_BITS = 2
Similarly to the above, this switch can stretch the periodic box in the
y-direction by the factor 1/2^LONG_Y_BITS
relative to BoxSize.
LONG_Z_BITS = 1
Finally, this option implements a possible stretch in the z-direction, similar
to LONG_X
and LONG_Y
. Only a subset of the stretch factors or several/all
of them may be used. The LONG_X/Y/Z_BITS
values must be positive integers.
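To illustrate the arithmetic with a concrete (purely illustrative) choice, the following settings shrink the box in the x- and y-directions while leaving the z-extent at the full BoxSize:

    LONG_X_BITS=2      # extent in x: BoxSize / 2^2 = BoxSize/4
    LONG_Y_BITS=1      # extent in y: BoxSize / 2^1 = BoxSize/2
                       # LONG_Z_BITS not set: extent in z remains BoxSize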
NTYPES = 6
Number of particle types that should be used by the code. If not set, a default value of 6 is adopted (which is the fixed value realised in GADGET-2/3). The different particle types are useful as a means to organize the simulation particles into different logical sets, which can be helpful for analysis (e.g., one may have a type for dark matter, one for stars, one for disk stars, etc.). The type 0 is reserved for gas particles, to which SPH is applied. All other particle types are collisionless particles that are treated on the same footing as far as gravity is concerned. For each of the types, one needs to specify a gravitational softening class in the parameterfile.
GADGET2_HEADER
This should be set if backwards compatibility of the snapshot file header format with GADGET-2/3 is required. Applies only to file formats 1 and 2. This can be useful, for example, to read in old initial conditions. Note that in this case NTYPES may not be larger than 6, and it should be at least as large as the number of types actually contained in the legacy initial conditions. Also, files cannot have more than 2^31 particles of any type in total in this case.
SECOND_ORDER_LPT_ICS
Processes the initial conditions before the actual simulation start-up to add in second-order Lagrangian perturbation theory corrections. This only works for specially constructed initial conditions created with Adrian Jenkins' IC code.
LEAN
This option is meant for special DM-only simulations that aim at high
savings in memory use. Only a uniform particle mass, a single particle
type, no hydrodynamics, and a single softening class are allowed. In
addition, one should use TREEPM_NOTIMESPLIT
, and refrain from using
double precision to obtain a very small memory footprint.
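A memory-lean dark matter-only configuration along these lines might combine the following entries (this is an illustrative sketch only; whether the exact combination passes the code's internal option checks has to be verified for the setup at hand):

    LEAN
    PERIODIC
    SELFGRAVITY
    PMGRID=1024
    TREEPM_NOTIMESPLIT
    NSOFTCLASSES=1         # a single softening class, as required by LEAN
    POSITIONS_IN_32BIT     # stay with 32-bit/single-precision storage for a small footprint
    IDS_32BIT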
Gravity calculation
SELFGRAVITY
This needs to be switched on if self-gravity should be computed by the code. If it is off, one can still have hydrodynamics, and optionally a prescribed external gravitational field.
HIERARCHICAL_GRAVITY
If this is enabled, the time integration is carried out in a hierarchical fashion in which the gravitational Hamiltonian is hierarchically split into slow and fast dynamics. The advantage of this approach is that small subsystems on short timesteps can be cleanly decoupled from the more slowly evolving rest of the system. This can be advantageous especially if there is a very deep and increasingly thinly populated tail in the timestep distribution. It also allows a time integration that is formally momentum conserving despite the use of individual timesteps. However, as additional partial forces need to be computed, this approach typically entails a somewhat higher number of force calculations. This can still be more efficient overall, but only if the number of particles per timebin quickly declines when going to shorter timebins.
FMM
If this is enabled, the Fast Multipole Method (FMM) is used for the gravity calculation instead of the classic one-sided tree algorithm.
MULTIPOLE_ORDER = 2
If this is enabled, one can control the order of the multipole expansion that is used. For a value of 3, quadrupole moments are included in the normal tree calculation and/or in the FMM calculations. A value of 4 includes octupole moments as well, and 5 goes further to hexadecupole moments. Note that these higher orders increase the memory and CPU time needed for the force calculations, but deliver more accurate forces in turn. It depends on the specific application whether this is worthwhile. The default is a value of 2.
EXTRA_HIGH_EWALD_ACCURACY
If this is activated, the Ewald corrections are extrapolated from the look-up table with a third-order Taylor expansion (quite a bit more expensive), otherwise a second-order Taylor expansion is used.
EXTRAPOTTERM
If this is activated, the extra multipole term in principle present for the potential computation is evaluated, even though it does not enter the force. For example, for monopole/dipole order (p=2), the code will then compute quadrupole moments and use them in the potential, but not in the force computation.
ALLOW_DIRECT_SUMMATION
When this is set (it requires HIERARCHICAL_GRAVITY
), the code will
calculate the gravitational forces for very small groups of particles
(the threshold for the group size is given by the constant
DIRECT_SUMMATION_THRESHOLD
, which is set per default to 500, but one
can override this value in the config file if desired) with direct
summation. This can be useful if the timestep distribution has a tail
to very short, poorly occupied timebins. Doing the corresponding
timesteps frequently for very small sets of particles can be faster
with direct summation than doing it via the tree or FMM because then
all overhead associated with tree construction and tree walks can be
avoided.
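A hypothetical configuration that makes use of this and raises the group-size threshold could read:

    HIERARCHICAL_GRAVITY
    ALLOW_DIRECT_SUMMATION
    DIRECT_SUMMATION_THRESHOLD=2000    # overrides the default value of 500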
RANDOMIZE_DOMAINCENTER
When this is activated, the whole particle set is randomly shifted in the simulation box whenever a domain decomposition is done. This can be useful to average out in time subtle spatial correlations in the force errors that arise from the positioning of the particles relative to the oct-tree geometry. Since integer coordinates are used for particle positions, this shifting does not entail any round-off error, and is reversible. In particular, the shifts will not be visible in any of the outputs created. This option is basically always recommended, and should have only positive effects.
RANDOMIZE_DOMAINCENTER_TYPES = 2
Can be set to select one or several types (this is a bitmask) which will then be used to locate the extension of a certain region. When the particle set is randomly translated throughout the box, the code will then try to avoid intersecting large oct-tree node boundaries with this region. When this option is not set explicitly but PLACEHIGHRESREGION is active, then this is automatically done with the default setting RANDOMIZE_DOMAINCENTER_TYPES=PLACEHIGHRESREGION.
EVALPOTENTIAL
When this is activated, the code also computes the gravitational potential (which is not needed for the dynamics). This costs a bit of extra memory and CPU-time.
TREE_NUM_BEFORE_NODESPLIT = 4
The number of particles that may be in a tree node before it is split into daughter nodes. If the number is reduced to 1, a fully threaded tree is obtained in which leaf nodes contain one particle each. (Note that empty nodes are not stored, except potentially for part of the top-level tree if it is very finely refined.)
EXTERNALGRAVITY
If this is switched on, an external gravitational field can be added to the dynamics. One still has to define with different switches and/or parameters what this external field is.
EXTERNALGRAVITY_STATICHQ
Activates a simple external potential due to a Hernquist dark matter halo, with parameters specified in the parameterfile.
TreePM Options
PMGRID = 512
This enables the TreePM method, i.e. the long-range force is computed
with a PM-algorithm, and the short range force with the tree or with
FMM. The parameter has to be set to the size of the mesh that should
be used, e.g. 64, 96, 128, etc. The mesh dimensions need not
necessarily be a power of two, but the FFT is fastest for such a
choice. Note: If the simulation is not in a periodic box, then an FFT
method for vacuum boundaries is employed, where due to the required
zero-padding, only half the mesh is covering the region with
particles. (see also the HRPMGRID
option).
ASMTH = 1.25
This can be used to override the value assumed for the scale that defines the long-range/short-range force-split in the TreePM algorithm. The default value is 1.25, in units of mesh-cells. A larger value will make the transition region better resolved by the mesh, yielding higher accuracy and less residual scatter in the force matching region, but at the same time the region that needs to be covered by the tree/FMM grows, which makes the computation more expensive.
RCUT = 6.0
This can be used to override the maximum radius out to which the short-range tree-force is evaluated in case the TreePM/FMM-PM algorithm is used. The conservative default value is 7.0 for this parameter, given in mesh-cells. Going much beyond 6.0 does however not yield much further improvement in the way the force matching region is treated, and reducing this value to 4.5 will give higher performance while being typically sufficiently accurate for most applications.
NTAB = 128
This can be used to define the size of the short-range lookup table. The default should normally be sufficient to have negligible influence on the obtained force accuracy.
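Taken together, a typical TreePM force-split block in Config.sh might look like this (values are illustrative):

    PMGRID=512      # long-range PM mesh of size 512^3
    ASMTH=1.25      # force-split scale in mesh cells (the default)
    RCUT=4.5        # shorter cutoff than the conservative default of 7.0, trading a bit of accuracy for speed
    #NTAB=128       # size of the short-range lookup table; the default is normally sufficient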
PLACEHIGHRESREGION = 2
If this option is set (will only work together with PMGRID
), then
the short range force computation in the TreePM/FMM-PM algorithm is
accelerated by an additional intermediate mesh with isolated boundary
conditions that is placed onto a certain region of high-res
particles. This procedure can be useful for zoom-simulations, where
the majority of particles (the "high-res" particles) are occupying
only a small fraction of the volume. To activate this, the option
needs to be set to an integer value in the form of a bit mask that
encodes the particle type(s) that should be used to define the spatial
location of the high-resolution patch. For example, if types 1 and 4
are supposed to define this region, then the parameter should be set
to PLACEHIGHRESREGION=2+16
, i.e. to the sum 2^1+2^4. The actual
spatial region is then determined automatically from the current
locations of these particle types. This high-res zone may also
intersect with the box boundaries if periodic boundaries are
used. Once the spatial region is defined by the code, it however
applies to all particle types. In fact, the short range interactions
between particle pairs that fall fully inside the high-res region are
computed in terms of two contributions, one from the intermediate PM
mesh that covers the high-res region, and the other from a tree/FMM
force that however now only extends to a region that is reduced in
size (which is the origin of a possible speed-up with this
option). Particle pairs for which at least one of the partners is
outside the high-res region get the normal Tree/FMM contribution with
the standard cut-off region, corresponding to the plain TreePM
approach with the coarse grid covering the full simulation
volume. Also note that when the particle set is randomized throughout
the box (with the RANDOMIZE_DOMAINCENTER
option), the code
additionally tries to avoid intersecting larger oct-tree node
boundaries by imposing certain restrictions in the randomization.
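The value is an ordinary type bitmask; for the example given in the text, where types 1 and 4 define the high-resolution region, one would set:

    PLACEHIGHRESREGION=18     # 2^1 + 2^4 = 2 + 16, i.e. particle types 1 and 4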
HRPMGRID = 512
When PMGRID
is set and non-periodic simulations are used, or when
PLACEHIGHRESREGION
is active, the FFT algorithm is used to compute
non-periodic gravitational fields, which requires zero-padding,
i.e. only one octant of the used grid can actually be covered by the
mass distribution. The grid size that the code uses for these FFTs is
equal to HRPMGRID
if this is set, otherwise the value of PMGRID
is
used for this grid dimension as well. This option is hence optional,
and allows if desired the use of different FFT sizes for the periodic
calculation covering the whole region, and for the non-periodic
calculations covering the zoom region.
FFT_COLUMN_BASED
Normally, the code employs a slab-based approach to parallelize the
FFT grids across the MPI ranks. However, for a Ngrid^3 FFT, this means
that at most Ngrid different MPI ranks can participate in the
computation. This represents a serious limit to scalability as the
minimum computational effort per MPI rank then scales as Ngrid^2, and
not more than Ngrid MPI ranks can anyhow be used. If one is in the
regime where the number of MPI ranks exceeds Ngrid, it is therefore a
good idea to activate FFT_COLUMN_BASED
, which will use a
column-based decomposition instead. Now the minimum effort per MPI rank scales
only as Ngrid, and the maximum number of ranks that can participate is
Ngrid^2, meaning that in practice this limit will not be
encountered. The results should not be affected by this
option. Because the column-based approach requires twice the number of
transpose operations, it is normally somewhat slower than the
slab-based approach in the regime where the latter can still scale
(i.e. for Ngrid >= Ncpu), so only for very large processor
numbers and large grid sizes can the column-based approach be
expected to yield a speed advantage, aside from the better memory
balance it provides.
PM_ZOOM_OPTIMIZED
Set this option if the particle distribution is spatially extremely
inhomogeneous, like in a zoom simulation. In this case, the FFT
algorithm will use a different communication strategy that can better
deal with this inhomogeneity and maintain work balance despite
it. If the particle distribution is reasonably homogeneous (like in a
uniformly sampled cosmological box), it is normally better to leave
this option off. In this case, a simpler communication strategy well
adapted to this situation is used which avoids some of the overhead
required by the PM_ZOOM_OPTIMIZED
option.
TREEPM_NOTIMESPLIT
When activated, the long- and short-range gravity forces are simply summed up to a total force, and then integrated in time with a common timestep. Otherwise, the short-range forces are subcycled, and the PM force is stored separately and integrated on a global, longer timestep.
GRAVITY_TALLBOX = 2
This can be used to set-up gravity with mixed-mode boundary conditions. The spatial dimension selected by the option is treated with non-periodic boundaries, while the other two are treated with periodic boundary conditions. This can facilitate, for example, stratified box simulations in which there is periodicity in the transverse directions (which define, e.g., the plane of a sheet) and open boundary conditions in the perpendicular direction. Set-ups of this kind are often used in simulations of star formation.
Treatment of gravitational softening
NSOFTCLASSES = 4
Number of different softening values. Traditionally, this is set equal
to the number of particle types, but it can also be chosen
differently. The mapping of a particle type to a particular softening
class is normally done through the parameter values
SofteningClassOfPartTypeX
as specified in the parameterfile.
Several different particle types could be mapped to the same softening
class in this case, and not all softening classes actually must be
used by particles. With the help of the INDIVIDUAL_GRAVITY_SOFTENING
option, the mapping can also be based on the particle mass, so that
particles of the same type may be mapped to different softening
lengths if their masses are different. This is an attractive option
especially for zoom simulations in order to allow heavier boundary
particles to be automatically associated with the closest natural
softening length from the tableau of available softening lengths.
INDIVIDUAL_GRAVITY_SOFTENING = 4+8+16
If this option is enabled, the selected particle types
(INDIVIDUAL_GRAVITY_SOFTENING
is interpreted as a binary mask
specifying these types) calculate for each of their particles a target
softening based on a cube-root scaling with the particle mass. As a
reference point, the code takes the softening class assigned for
particle type 1, and the average particle mass of the type 1
particles. The idea is to use this option for zoom simulations, where
one assigns the high-resolution collisionless particles to type 1 (all
with equal mass) and maps the type 1 to a certain softening class,
which then fixes the softening assigned for type 1. For all other used
particles types (which typically may involve variable masses within
the type), one then activates this option. For the corresponding
particles, a desired softening is computed based on a cube-root
scaling with their masses relative to the reference mass and softening
of the type 1 particles. From all the available softening classes, the
code then assigns the softening class with the smallest logarithmic
difference in softening length to the computed target softening
length. Since only the available softening classes can be used for
this, one should aim to supply a fine enough set of available
softening classes when this option is used.
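In other words, for a particle of mass m the target softening follows epsilon_target = epsilon_1 * (m / m_1)^(1/3), where epsilon_1 and m_1 are the softening and mean mass of the type 1 particles, and the available softening class closest in the logarithm of the softening is then assigned. A corresponding zoom-style configuration could look like this (values are illustrative):

    NSOFTCLASSES=8                     # provide a reasonably fine set of softening classes
    INDIVIDUAL_GRAVITY_SOFTENING=28    # 4+8+16 = 2^2+2^3+2^4, i.e. applied to types 2, 3 and 4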
ADAPTIVE_HYDRO_SOFTENING
Sometimes, one may want to scale the gravitational softening length of
gas particles with their SPH smoothing length. This can be achieved
with this option. Enabling it requires additional parameters in the
parameterfile, namely GasSoftFactor
, MinimumComovingHydroSoftening
and AdaptiveHydroSofteningSpacing
. When this option is used, the
softening type 0 is not available for any other particle type (and
also won't be selected by types included in
INDIVIDUAL_GRAVITY_SOFTENING
). SofteningClassOfPartType0
needs to be
0, and all other particle types need to have a non-zero value for
their SofteningClassOfPartType. The softening values specified in the
parameterfile for softening type 0 are ignored, instead the softening
is selected for all gas particles based on their smoothing length from
a finely spaced discrete table (see explanations for the above
softening parameters).
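A parameterfile excerpt matching this option could look as follows (the numerical values are placeholders that need to be chosen for the problem at hand):

    GasSoftFactor                     2.5
    MinimumComovingHydroSoftening     0.001
    AdaptiveHydroSofteningSpacing     1.05
    SofteningClassOfPartType0         0       # gas must be mapped to softening class 0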
SPH formulation
PRESSURE_ENTROPY_SPH
Enables the pressure-entropy formulation of SPH (similar to Hopkins 2013), otherwise the default density-entropy formulation (Springel & Hernquist, 2002) is used.
OUTPUT_PRESSURE_SPH_DENSITY
Outputs also density computed as in the standard SPH pressure-entropy
formulation. This is only useful if PRESSURE_ENTROPY_SPH
is used.
INITIAL_CONDITIONS_CONTAIN_ENTROPY
The initial conditions file contains the entropy instead of the thermal energy.
GAMMA = 1.4
Sets the equation of state index in the ideal gas law that is normally used in GADGET-4's SPH implementation. If not set, the default of 5/3 for a mono-atomic gas is used.
ISOTHERM_EQS
This defines an isothermal equation of state, P = rho c^2, where c^2
is kept constant for every particle at the value u = c^2 read from the
initial conditions in the internal energy per unit mass field. GAMMA=1
is set automatically in this case.
REUSE_HYDRO_ACCELERATIONS_FROM_PREVIOUS_STEP
If this option is enabled, the code does not recompute the SPH hydrodynamical acceleration at the beginning of a timestep, but rather reuses the one computed at the end of the previous timestep, which is typically good enough. The two accelerations can in principle differ slightly due to non-reversible viscous effects, or external source functions (e.g. radiative heating or cooling).
IMPROVED_VELOCITY_GRADIENTS
Use more accurate estimates for the velocity gradients following Hu et al. (2014), which enter the calculation of a time-dependent artificial viscosity.
VISCOSITY_LIMITER_FOR_LARGE_TIMESTEPS
Limits the maximum hydrodynamical acceleration due to the artificial viscosity such that the viscosity term cannot change the sign of the relative velocity projected on the particle distance vector. This should not be necessary if small enough timestepping is chosen.
SPH kernel options
CUBIC_SPLINE_KERNEL
Enables the cubic spline kernel as defined in Springel et al. (2001).
WENDLAND_C2_KERNEL
Enables the Wendland C2 kernel as discussed in Dehnen & Aly (2012).
WENDLAND_C4_KERNEL
Enables the Wendland C4 kernel as discussed in Dehnen & Aly (2012).
WENDLAND_C6_KERNEL
Enables the Wendland C6 kernel as discussed in Dehnen & Aly (2012).
WENDLAND_BIAS_CORRECTION
Reduces the self contribution for Wendland kernels following Dehnen & Aly (2012), their equations (18) and (19). Only works in 3D.
SPH viscosity options
TIMEDEP_ART_VISC
Enables time-dependent artificial viscosity.
HIGH_ART_VISC_START
Start with high rather than low viscosity.
NO_SHEAR_VISCOSITY_LIMITER
Turns off the shear viscosity suppression.
Extra physics
COOLING
Enables radiative cooling based on the collisional ionization equilibrium in the presence of a spatially constant but time-variable UV background. The network computed is similar to that described in the paper by Katz et al. (1996). Note that metal-line cooling is not included in this module.
STARFORMATION
If this is enabled, the code can create new star particles out of SPH
particles. The default star formation model that is implemented
corresponds to a basic variant of the sub-resolution multi-phase model
described in Springel & Hernquist (2003,
http://adsabs.harvard.edu/abs/2003MNRAS.339..289S). By default, the
new star particles are created as particle type 4, but if desired, one
can also specify another type for them by setting the compile time
flag STAR_TYPE
to a different value. Make sure that a gravitational
softening length is defined for the chosen type.
Time integration options
FORCE_EQUAL_TIMESTEPS
This adopts a global timestep for all particles, determined by pushing all particles down to the smallest timestep desired by any of the particles. The step size may still be variable in this case, but it is the same for all particles.
Single/double precision and data types
POSITIONS_IN_32BIT
When this is set, the internal storage of positions will be based on
32-bit unsigned integers. If single precision is used as default in
the code (i.e. DOUBLEPRECISION
is not set), then this is the
default if none of the POSITIONS_IN_XXBIT
options is selected.
POSITIONS_IN_64BIT
When this is set, the internal storage of positions will be based on
64-bit unsigned integers, otherwise on 32-bit unsigned integers. If
double precision is used as default in the code
(i.e. DOUBLEPRECISION
is set to 1), then this is the default if none
of the POSITIONS_IN_XXBIT
options is selected.
POSITIONS_IN_128BIT
With this option, the internal storage of positions will be based on 128-bit unsigned integers, offering an extreme spatial dynamic range. Use of this should be only of interest in truly extreme scenarios.
DOUBLEPRECISION = 1
This makes the code store all internal particle data in double
precision. Note that output files may nevertheless be written by
converting the values in files to single precision, unless
OUTPUT_IN_DOUBLEPRECISION
is activated.
DOUBLEPRECISION_FFTW
If this is set, the code will use the double-precision version of FTTW and store the corresponding field values in real and complex space in double. Otherwise single precision is used throughout for this.
OUTPUT_IN_DOUBLEPRECISION
The output snapshot files will be written in double precision when this is enabled. This is helpful to avoid round-off errors when using snapshot files for a restart, but will rarely be required for scientific analysis, except perhaps for spatial coordinates, where many zoom simulations are insufficiently represented by single precision. Outputting in mixed precision, double precision for coordinates and single precision for everything else, can therefore be a useful option for these simulations to save storage space without sacrificing anything in the analysis.
ENLARGE_DYNAMIC_RANGE_IN_TIME
This extends the dynamic range of the integer timeline from 32 to 64 bit, i.e. the smallest possible timestep is approximately 2^64 times smaller than the simulated timespan instead of 2^32 times. Correspondingly, the number of timebins grows from 32 to 64 in this case.
IDS_32BIT
If this is set, the code uses 32-bit particle IDs, hence at most 2^32 particles may be used. This is the default setting.
IDS_48BIT
If this is set, the code uses 48-bit particle IDs, allowing some smaller fields to be packed into this variable before a long-word boundary is reached. At most 2^48 particles can be used.
IDS_64BIT
If this is set, the code uses 64-bit particle IDs.
USE_SINGLEPRECISION_INTERNALLY
If this is activated, internal computations are carried out in single precision instead of double precision. On some architectures (but not on all -- often current CPUs use the same number of cycles for a single and a double precision operation), this can make some of the computations faster, but comes with a loss of precision, of course. Use with extreme care.
NUMBER_OF_MPI_LISTENERS_PER_NODE = 1
For multi-node runs, normally one MPI rank is set aside per
shared-memory node to asynchronously process incoming communication
requests. If the shared-memory node has a very large number of cores,
it might be helpful to have more than one such communication process,
which can be set with this parameter. If the number of cores per
shared memory node is larger than 64, then this in fact has to be
done, see also the MAX_NUMBER_OF_RANKS_WITH_SHARED_MEMORY
option.
MAX_NUMBER_OF_RANKS_WITH_SHARED_MEMORY = 64
This sets the maximum number of MPI ranks on a node that have one MPI
listener assigned to them, and can access each other via shared
memory. Possible values are 32 and 64, the default being 64. If the
total number of cores per node divided by
NUMBER_OF_MPI_LISTENERS_PER_NODE
is at most 32, a setting of 32 can
be used to save a small amount of memory. If the total number of cores
per node divided by NUMBER_OF_MPI_LISTENERS_PER_NODE
is above 64,
you need to increase NUMBER_OF_MPI_LISTENERS_PER_NODE
.
Output/Input options
POWERSPEC_ON_OUTPUT
Creates a power spectrum measurement for every output time, i.e. for every snapshot that is created. This is meant to be used in cosmological simulations.
REDUCE_FLUSH
The code produces relatively verbose log-file messages. To make sure
that they immediately appear in the log-file, a flush statement on the
output stream is executed when outputting a new log message. On slow
or overloaded filesystems, this can become a significant source of
overhead. To avoid this, this option can be used. A log-file flush is
then only done on intervals determined via the FlushCpuTimeDiff
parameter.
OUTPUT_POTENTIAL
This will force the code to compute gravitational potentials for all particles each time a snapshot file is generated. These values are then included in the snapshot files. Note that the computation of the values of the potential incurs some computational overhead.
OUTPUT_ACCELERATION
This will include the physical gravitational acceleration of each particle in the snapshot files.
OUTPUT_TIMESTEP
This outputs the timestep used by the particles. Useful only to test whether the timestep criteria behave in the intended way.
OUTPUT_PRESSURE
This outputs values for the pressure of each SPH particle (i.e. only for particles of type 0) to the snapshot files.
OUTPUT_VELOCITY_GRADIENT
This outputs the SPH estimates of the velocity gradients to the snapshot files, separately for the vx, vy, and vz components, i.e. one gets three 3-vectors.
OUTPUT_ENTROPY
When this option is activated, the code writes the values of the entropic variable associated with each SPH particle to the snapshot files.
OUTPUT_CHANGEOFENTROPY
This outputs the rate of change of entropy for each particle. Only the dissipative change due to shock heating is normally included here, meaning that radiative changes of the entropy are not included in this field, only the viscous heating from the artificial viscosity is.
OUTPUT_DIVVEL
With this option, one can request an output of the velocity divergence for each SPH particle in the snapshot files.
OUTPUT_CURLVEL
Likewise for the curl of the velocity field of all SPH particles, which is output in snapshots when this option is activated.
OUTPUT_COOLHEAT
With this option, the actual rate of energy loss due to radiative
cooling (or heating) can be output to the snapshot files. This option
requires COOLING
to be active as well.
OUTPUT_VISCOSITY_PARAMETER
With this option, one can request an output of the viscosity parameter
for each SPH particle in the snapshot files. This option requires
TIMEDEP_ART_VISC
to be active as well.
OUTPUT_NON_SYNCHRONIZED_ALLOWED
If this option is activated, outputs occur precisely at the prescribed desired output times, with particles being (linearly) drifted to this time, while velocities stay at the values they had after the last kick. Otherwise (which is the default), snapshots are only written at times when all particles are synchronized, and all timesteps have been completed with closing half-step velocity kicks. The desired output times are mapped to the closest full synchronization points, to the extent possible (note that if the spacing of desired output times is finer than the largest timestep size, some desired output times may have to be skipped).
OUTPUT_VELOCITIES_IN_HALF_PRECISION
Stores particle velocities in half-precision format.
OUTPUT_ACCELERATIONS_IN_HALF_PRECISION
Stores accelerations in half-precision. To prevent a potential overflow of the values, the actually stored values are normalized by the factor 10 H V200, with V200 = 1000 km/sec.
OUTPUT_COORDINATES_AS_INTEGERS
Does not convert the internal integer coordinates to floating point before output, but rather outputs them in integer representation. This retains more bits of information, and may give rise to better compression possibilities if the particle data is spatially ordered.
ALLOW_HDF5_COMPRESSION
When this is enabled, certain output fields in the HDF5 output are compressed when written. The compression/decompression is done on the fly, and is transparent to the user (but implies slightly slower I/O speed).
On the fly FOF groupfinder
FOF
This switch enables the friends-of-friends group finder of the code. In this case, the code will always run the FOF group finder whenever a snapshot is produced. This is done directly before the snapshot is written to disk, allowing the particle data in the snapshot files to be arranged in group order, i.e. it is easy to read the particles of any desired group. Also, when FOF is set, the code can be applied in postprocessing mode to compute group catalogues for existing snapshots. The snapshot in question is then written a second time with a name suffix "reordered", with the particle data arranged in group order.
FOF_PRIMARY_LINK_TYPES = 2
This identifies via a bitmask the particle type or types to which the friends-of-friends linking algorithm should be applied. Conventionally these are (high-resolution) dark matter particles stored in type 1, so the default for this parameter is 2.
FOF_SECONDARY_LINK_TYPES = 1+16+32
All particles of types selected by this bitmask are looking for the
nearest particle from the set covered by FOF_PRIMARY_LINK_TYPES
and
are then made member of the FOF group this particle belongs to. The
idea is that the groups found with the plain FOF algorithm incorporate
all co-spatial particles from the set described by
FOF_SECONDARY_LINK_TYPES
. This is useful especially for
hydrodynamical cosmological simulations with radiative processes and
star formation. In this case, the spatial distribution of baryons
becomes very different from the dark matter, so that applying the
friends-of-friends method to both dark matter and baryonic particles
at the same time would yield highly distorted results. It is then much
cleaner to do this only for the non-dissipative dark matter, and then let
the resulting halos collect their associated baryons via this option.
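For instance, in a cosmological hydrodynamical run in which type 1 holds the (high-resolution) dark matter and the baryonic particles are assumed to live in types 0, 4, and 5, the mask values quoted in the headings above correspond to:

    FOF_PRIMARY_LINK_TYPES=2           # 2^1: apply the FOF linking to the dark matter only
    FOF_SECONDARY_LINK_TYPES=49        # 2^0 + 2^4 + 2^5 = 1+16+32: attach these types to the groups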
FOF_GROUP_MIN_LEN = 32
The minimum length a group needs to have before being stored in the group catalogue is set with this parameter. The default number is 32.
FOF_LINKLENGTH = 0.2
Dimensionless linking length for the friends of friends algorithm. The
code will cast this into a comoving linking length by estimating the
mean particle spacing from the dark matter density (as given by
OmegaDM
- OmegaBaryon
) and the mean particle mass of all the
particles selected with the FOF_PRIMARY_LINK_TYPES
mask. It makes
sense for all these particles to have equal mass, since only then is the FOF
algorithm known to produce well-understood results. If in doubt,
check the log file for the linking length that is computed.
FOF_ALLOW_HUGE_GROUPLENGTH
Normally, the length of individual FOF groups and subhalos is restricted to at most 2^31 ~ 2 billion particles. If this option is activated, larger group and subhalo sizes are allowed.
Subfind
SUBFIND
When this is switched on, all identified FOF groups are subjected to a search for gravitationally bound substructures with the SUBFIND algorithm, as described in Springel et al. (2001), http://adsabs.harvard.edu/abs/2001MNRAS.328..726S. The snapshot outputs that are produced are automatically stored in group plus subhalo order, i.e. all particles of the same group are stored consecutively, with the subhalos of each group nested inside it.
SUBFIND_HBT
This enables an implementation of the hierarchical bound tracing
algorithm, where subhalo candidates are identified with the help of a
substructure catalogue from a previous time instead of doing this with
density excursion sets. This option requires both SUBFIND
and
MERGERTREE
.
SUBFIND_STORE_LOCAL_DENSITY
This will calculate local densities and velocity dispersions for all particles, not only for particles in FOFs, and store them in snapshot files.
SUBFIND_ORPHAN_TREATMENT
This produces special snapshot files after group catalogues are produced that contain only particles that have formerly been most bound particles of a subhalo. This can later be used by semi-analytic models of galaxy formation coupled to the merger tree output to treat temporarily lost or disrupted subhalos.
Merger tree algorithm
MERGERTREE
This option enables an on-the-fly calculation of descendant subhalos,
which are then stored along with the group catalogues. This requires
FOF
and SUBFIND
to be set, and is meant to be used in cosmological
simulations with a large number of outputs. The merger tree
information can then be used in a final postprocessing step to
construct merger trees from all the group catalogues and
descendant/progenitor links produced on the fly (or in
postprocessing). The merger tree construction done in this way can in
principle be carried out without having to store the actual particle
data of the snapshot files.
On-the-fly lightcone creation
LIGHTCONE
When this option is enabled, particle data is output continuously as
particles cross the backwards lightcone. The geometry of the lightcone
(or of several lightcones) is described by a separate input file as
specified in the parameterfile. If needed, also periodic replications
of the box are used to fill the lightcone volume. Note that either
LIGHTCONE_PARTICLES
or LIGHTCONE_MASSMAPS
, or both, need to be
selected as well if this option is activated.
LIGHTCONE_PARTICLES
When this option is enabled, particle data is output continuously as particles cross the backwards lightcone. The geometry of the lightcone (or of several lightcones) is described by a separate input file as specified in the parameterfile. If needed, also periodic replications of the box are used to fill the lightcone volume.
LIGHTCONE_OUTPUT_ACCELERATIONS
This outputs the gravitational accelerations of particles on the lightcone, which can be used for gravitational lensing applications.
LIGHTCONE_MASSMAPS
This option creates projected mass shells along the backwards
lightcone, for weak lensing applications. Requires the LIGHTCONE
option.
LIGHTCONE_PARTICLES_GROUPS
This option runs the FOF (and SUBFIND if enabled) group finders on the
lightcone particle data before they are written to disk. Requires the
LIGHTCONE
and LIGHTCONE_PARTICLES
options.
LIGHTCONE_IMAGE_COMP_HSML_VELDISP
This special option is only relevant for lightcone image creation, and (re)computes adaptive smoothing lengths as well as local velocity dispersions.
LIGHTCONE_MULTIPLE_ORIGINS
If this is enabled, origins of lightcones different from (0, 0, 0) can
be defined. Possible origins need to be listed in a separate file with
the name LightConeOriginsFile
. The lightcone definitions file then needs to
be augmented with a further number at the end of each lightcone
definition, and this serves as an index into the list of lightcone origins.
REARRANGE_OPTION
This option needs to be enabled to allow the rearrange lightcone particle feature to work. It only needs to be on for the special postprocessing option, otherwise it can be disabled to save memory.
IC generation
NGENIC = 256
This master switch enables the creation of cosmological initial
conditions for simulations with periodic boundary conditions. The
value of NGENIC
should be set to the FFT-grid size used for IC
generation, which should be at least as fine as the particle
resolution per dimension. If the code is started without restartflag
(i.e. when normally initial conditions are read), the code instead
creates the ICs first, followed by evolving them with the code,
i.e. in this case the initial conditions do not need to exist on
disk. One can however also start the code with restartflag 6, in which
case the ICs are produced and written to disk, followed by a stop of
the code. One can then also use the produced files as ICs in a regular
start of GADGET-4 without having the NGENIC
option set.
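Assuming the usual GADGET-4 invocation in which the restart flag is passed as an optional second command-line argument, the two modes of operation described above correspond to something like:

    # create the ICs on the fly and evolve them right away:
    mpirun -np 64 ./Gadget4 param.txt
    # only create the ICs, write them to disk, and stop (restart flag 6):
    mpirun -np 64 ./Gadget4 param.txt 6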
CREATE_GRID
If this is activated, the IC creation is carried out with a regular Cartesian particle grid that is produced on the fly. Otherwise, the unperturbed particle load is read in from the specified IC file, which allows the use of so-called glass files (which arise from evolving random Poisson samples with the sign of gravity reversed until they settle in a quasi-equilibrium without preferred directions) or of spatially variable resolution.
GENERATE_GAS_IN_ICS
This option can be used to modify dark matter-only initial conditions for cosmological simulations upon start-up of the simulation code. The modification is to add gas to the simulation by splitting up dark matter particles into a dark matter and gas particle, with masses set by the specified cosmological parameters. The particle pair is displaced in opposite directions from the original coordinate keeping the center-of-mass fixed. The separation is such that the new dark matter and gas particles form two interleaved grids that maximize the relative distance between the two particle types and minimizes pairing correlations. The velocities of the particles inherit the velocity of the original dark matter particle. Note that here the transfer functions of dark matter and gas are not distinguished, so this procedure is only a rough approximation. It is however typically sufficient on large scales and for galaxy formation.
SPLIT_PARTICLE_TYPE = 4+8
This bitmask determines which of the dark matter particles contained
in the dark matter only initial conditions that are processed with
GENERATE_GAS_IN_ICS
should be split up into gas and dark matter
particles. Normally, this should be done for all of the dark matter
particles to produce a volume filling gas phase. However, sometimes
this is restricted to the high-resolution region in zoom simulations,
which can then be picked out with this option.
NGENIC_FIX_MODE_AMPLITUDES
When activated, this leaves the mode amplitudes at sqrt[P(k)], instead of sampling them from a Rayleigh distribution. This can be useful in the context of the variance suppression technique of Angulo & Pontzen (2016).
NGENIC_MIRROR_PHASES
If this is activated, all phases in the created realization are turned by 180 degrees. This is useful to realize a pair of simulations that differ only by the sign of the initial density perturbations but which are otherwise identical.
NGENIC_2LPT
This option creates the initial conditions based on second-order Lagrangian perturbation theory, instead of just using the Zeldovich approximation. Especially when the starting redshift is low, this option is recommended.
NGENIC_TEST
This option is purely for testing purposes. When the code creates ICs on the fly, it just measures the power spectrum of the produced ICs and terminates.
MPI related settings
MPI_MESSAGE_SIZELIMIT_IN_MB = 200
Some (especially older) MPI libraries are not overly stable when very
large transfers are done. Such transfers can however happen in
GADGET-4 for large simulations, for example in the domain
decomposition. With this option one can ask the code to automatically
split up such large transfers in a sequence of smaller transfers. The
maximum allowed size of one of the transfers in MB is set by the value
given to MPI_MESSAGE_SIZELIMIT_IN_MB
.
NUMPART_PER_TASK_LARGE
Set this if the number of particles per task is quite large, in particular so large that 8 times this number can overflow a 32-bit integer. This means that once you expect ~500 million or more particles on a single MPI rank, this option needs to be set to guarantee that the PM algorithms still work correctly. Of course, once you reach more than 2 billion particles per MPI rank, the code will stop working anyhow due to integer overflows. The easy solution to this, of course, is to increase the number of MPI ranks.
ISEND_IRECV_IN_DOMAIN
This option can be used to replace the default communication pattern used in the domain decomposition (and also in FOF and SUBFIND), which is based on a hypercube with synchronous myMPI_Sendrecv() calls, with a set of asynchronous communications. This should be faster in principle, but it also tends to result in a huge number of simultaneously open communication requests which can choke the MPI communication subsystem. Whether this works robustly and is indeed faster will depend on the system and the simulation size. If in doubt, rather stick with the default algorithm.
USE_MPIALLTOALLV_IN_DOMAINDECOMP
Another approach to carry out the all-to-all communication occurring in the domain decomposition is to simply use MPI's Alltoallv function. This is done when this option is set, and one then effectively hopes that the internal algorithm used by Alltoallv is the most robust and fastest for the communication task at hand. This may be the case, but there is no guarantee for it. The default algorithm of GADGET-4 (hypercube with synchronous myMPI_Sendrecv), which is used when this option is not used, should always be a reliable alternative, however.
MPI_HYPERCUBE_ALLGATHERV
Another issue with some MPI-libraries is that they may use quite a bit of internal storage for carrying out MPI_Allgatherv. If this turns out to be a problem, one can set this option. The code will then replace all uses of MPI_Allgatherv() with a simpler communication pattern that uses hypercubes with myMPI_Sendrecv as a work-around.
MPI_HYPERCUBE_ALLTOALL
Some MPI libraries tend to be unstable in their MPI_Alltoall implementation. This option replaces such calls with a robust hypercube communication pattern. Not necessarily the fastest, but very robust, scalable, and with decent speed.
ALLOCATE_SHARED_MEMORY_VIA_POSIX
If this is set, the code tries to use POSIX directly to allocate shared memory in the virtual filesystem /dev/shm, instead of relying on the MPI-3 call MPI_Win_allocate_shared(), which on some systems executes in a sluggish way.
Testing and Debugging options
DEBUG
This option is only meant to enable core-dumps (which are typically disabled by calling MPI_Init() on program start-up). This can be useful to allow post-mortem analysis of a crash by loading the core file with a debugger. Of course, the code should be compiled with symbols included (-g option) to facilitate this, and it may also help to set the optimization level to something low or disable optimizations entirely to avoid confusing the debugger in some situations.
DEBUG_ENABLE_FPU_EXCEPTIONS
This option is useful in combination with DEBUG and tries to enable FPU exceptions. In this case, an illegal mathematical floating point instruction that creates a dreaded "Not a Number" (NaN) will trigger a core file. With the debugger one can then quickly find the line in the code that is the culprit.
DEBUG_SYMTENSORS
This option executes a few selected unit tests on the symmetric-tensor subroutines on start-up of the code.
HOST_MEMORY_REPORTING
This option reports, when the code starts, available system memory information by analyzing /proc/meminfo on Linux systems. It is enabled by default on Linux. Output of this option is found at the beginning of the stdout log-file, and for example looks like this:
    -------------------------------------------------------------------------------------------------------------------------
    AvailMem:     Largest = 29230.83 Mb (on task= 24), Smallest = 29201.10 Mb (on task= 12), Average = 29215.28 Mb
    Total Mem:    Largest = 32213.01 Mb (on task=  0), Smallest = 32213.01 Mb (on task=  0), Average = 32213.01 Mb
    Committed_AS: Largest =  3011.91 Mb (on task= 12), Smallest =  2982.18 Mb (on task= 24), Average =  2997.73 Mb
    SwapTotal:    Largest = 23436.99 Mb (on task=  0), Smallest = 23436.99 Mb (on task=  0), Average = 23436.99 Mb
    SwapFree:     Largest = 22588.00 Mb (on task=  0), Smallest = 21600.56 Mb (on task= 12), Average = 22020.50 Mb
    AllocMem:     Largest =  3011.91 Mb (on task= 12), Smallest =  2982.18 Mb (on task= 24), Average =  2997.73 Mb
    -------------------------------------------------------------------------------------------------------------------------
    Task=0 has the maximum committed memory and is host: sandy-022
ENABLE_HEALTHTEST
When this is enabled, the code tries to assess upon start-up whether all CPU cores are freely available and show the same execution speed. Also, the MPI bandwidth both inside nodes and between nodes is tested.
FORCETEST = 0.001
Calculates for the specified fraction of particles direct summation
forces, which can then be compared to the forces computed by the
Tree/PM/FMM algorithms of GADGET-4 in order to check or monitor the
force accuracy of the code. This is only included as a testing and
debugging option. The value of the option should be set to a number
between 0 and 1 (e.g. 0.001), and this number gives the fraction of
randomly chosen particles at each timestep for which forces by direct
summation are computed. The normal tree-forces and the exact direct
summation forces are then collected in a file forcetest.txt
for
later inspection. Note that the simulation itself is unaffected by
this option, but it will of course run much(!) slower, particularly if
FORCETEST * NumPart * NumPart >> NumPart.
Note: The particle IDs must
be set to numbers >= 1 for this option to work.
FORCETEST_TESTFORCELAW = 1
Special option for measuring the effective force law. Can be set to 1
or 2 for checking with TreePM/FMM-PM, or TreePM/FMM-PM +
PLACEHIGHRESREGION. The option FORCETEST
must be activated as
well. The simulation needs to be fed with a special initial
conditions file for which only one particle has mass, the others are
massless test particles. The code will then go through cycles in which
the particle with the mass is randomly placed, and the other particles
are randomly placed around it, with distance spacing uniform in
log(r). After 40 cycles are carried out, the code terminates, and the
force-law accuracy can be examined by analysing the file
forcetest.txt
.
FORCETEST_FIXEDPARTICLESET
This always checks the same particle IDs if force accuracy is checked during a run.
VTUNE_INSTRUMENT
This option creates additional instrumentation instructions for the Intel VTune code performance tool, based on the internal timing routines. This can be used for a performance analysis based on this tool.
DEBUG_MD5
This option can be used to compute MD5 checksums of the P[] and SphP[] arrays regularly in the code, with the results being written to the log-file memory.txt. Using this, one can check for binary invariance of the code when the code is interrupted and resumed from restart files.
TILING = 2
Replicates the read-in ICs the specified number of times in each dimension. This can be used for scaling tests.
SQUASH_TEST
Squeezes the ICs on read-in in order to create a distortion from spherical symmetry in certain force calculation tests.
DOMAIN_SPECIAL_CHECK
Outputs test data to check the balancing algorithms.
EWALD_TEST
A development test for testing the accuracy of the Ewald table lookup.
RECREATE_UNIQUE_IDS
This option can be used to reinitialize the particle IDs upon start-up. Useful if one has to deal with a broken IC file.
NO_STOP_BELOW_MINTIMESTEP
Do not stop when the code wants to adopt a timestep below the specified minimum timestep, but rather enforce this step size.
DO_NOT_PRODUCE_BIG_OUTPUT
This special option allows one to refrain from writing large output files (restart files, snapshots, and group catalogues), which can be useful for scaling tests.
STOP_AFTER_STEP = 8
After the corresponding step has been completed, the simulation ends. This is meant to simplify certain performance and scalability tests.
MEASURE_TOTAL_MOMENTUM
This option computes the total conjugate momentum after every step. Can be used to check for manifest momentum conservation of different force computation schemes.
TREE_NO_SAFETY_BOX
When enabled, this disables the geometric 'near node' protection, i.e. for the one-sided tree, one may then be closer to a node's center than 1.5 times the node size, and for FMM, adjacent nodes may interact.