Guide to code changes
In the following, we give a set of assorted hints and recommendations about modifying and/or extending the code. We also comment about some differences with respect to GADGET-2/3 in terms of code usage and its architecture.
Coding style guide
We strongly recommend to follow the general coding practices (even if you don't like them) and architecture of GADGET-4 in modifying or extending the code. Only then different extensions by different people have a chance to happily coexist with each other. This concerns mostly the following points:
Code extensions
-
Non-standard code extensions should always be written such that they can be switched off if not needed, and have no side effects on existing code when this is done. Normally this means that they have to be enclosed in conditional compilation precompiler statements (#ifdef), especially if variables in the global structures of the code need to be allocated for the extension. However, if the extension's execution can also be controlled at run-time by simple variables, then consider introducing a parameterfile variable to control the extension. In general, the number of symbols (and this additional
Config.sh
options) to control conditional compilation should be kept to a minimum. -
Do not place any substantial piece of code belonging to your extension into existing functions of the code. Write your own classes or functions for a code extension, and only place a function call (if needed bracketed by an #ifdef) into the appropriate place of the primary code. Also, place your extension functions into separate source files.
General code-style principles
- Code formatting: Be consistent with the code formatting of the main code, which is more or less GNU-style, and is obtained by running the code formatting tool "clang-format" of the clang compiler suite. The specific options used by GADGET-4 are defined in the hidden file ".clang-format" in the code's main directory. Running there something like
clang-format-mp-6.0 -style=file -i src/*/*
should then make sure that all your source file(s) have a consistent indention and code formatting.
-
Name functions all in lower case as a "command" that is descriptive of what the function does. Different words should normally be separated by underscores, e.g. calculate_normal_vector_for_triangle(...) For all functions, input arguments should come first, output arguments last.
-
Global variables (whose use should be kept to an absolute minimum, and the sins of GADGET-4 in this regard should not be repeated) start with an upper case character. Global variable names are nouns, with words separated by mixed lower/upper case characters, and contain no underscores. Use abbreviations that are clear, e.g.
NumForceCalculations
. If you have to use a global variable in your own code class or module, try to restrict its scope to the classes or files of your module through appropriate declarations. -
Local variables start with lowercase, and should have descriptive names, too, except for simple loop iterators and the like. Try to narrow the scope of a variable as much as possible, for example by declaring them inside a block where they are needed. Generally define them as close as possible to where they are needed for the first time. Declaration and initialization should be combined in one command where possible, e.g.
int n = get_particle_count();
instead of
int n;
...
n = get_particle_count();
-
Avoid repetition of code, i.e. do not use cut & paste to implement the same or similar functionality multiple times. This is always an indication that a new function is in order here. Break up long functions into smaller more manageable pieces.
-
Preprocessor macros that have arguments should be avoided whenever possible, and really should only be used for a compelling reason. Note that in C++ you can use inlined and type-safe functions instead. In any case, preprocessor macros should be fully capitalized.
-
Magic numbers and non-trivial numerical constants should be avoided in the actual code and instead be replaced by a symbolic constant declared within a header file, with a name all in uppercase. Consider enums instead of the use of numerical constants to distinguish between different cases.
-
All warnings emitted by the compiler upon compilation with "-Wall" should be addressed. Unless there are extremely good reasons, the code should compile without any warnings left.
-
Include consistent commenting in your code. The meaning of all global variables should be commented where they are introduced, ideally with doxygen syntax, e.g:
int MyGlobalCount; /*!< counts the number of timesteps */
- Functions should be preceded by a brief explanation of what the function does, including instruction, if any, about how the function may be used, e.g.:
/*!
* Insert the point P[i] into the partice list. Start
* the search at the current particle t.
*/
int insert_point(int i, int t)
{
...
}
- You do not need to state here how the function achieves what it does; this can be stated if appropriate in comments in the function body. There, avoid superfluous comments that just reflect what's obvious from the code anyway, like
do_domain_decomposition(); /* call domain decomposition */
- Instead, focus on comments that help one to quickly understand/check what the code tries to do at an algorithmic level. If complicated formulae are implemented, try to include in a comment a reference to the equation that is implemented.
Adding a config-option
To add a new symbolic configuration option to the code, you need to
add it to the file Template-Config.sh
besides just using it in the
code. In addition, you must add a short explanation of the option to
the file documentation/04_config-options.md
following the syntax
used for the other configuration option. Only then the checking
scripts of the source code will be happy and accept the new option.
Note that for this the option really needs to be used somewhere in the
source code that is described as belonging to GADGET-4 by the
Makefile
.
Adding parameters
To add a new parameter to the parameterfile of GADGET-4, the following steps are necessary:
-
Add the new variable to the structure
global_data_all_processes
in the filedata/allvars.h
. The variable has to be of type double, int, or a character string. -
In the file
io/parameters.c
, add a function call to add_param(...) for the new parameter in the functionregister_parameters()
. Simply follow the examples for the other parameters. If your parameter is optional and should only be active when a specific option is activated, bracket the add_param() call with a corresponding #ifdef/#endif -
Add a short explanation of the new parameter in the file
documentation/05_parameterfile.md
This is needed otherwise the code checking scripts will not accept the new parameter.
Once these steps are followed, the new parameter will be included in the parsing of the parameterfile, and will also be stored in restart files as well as in HDF5 output files.
Adding a source file
To add a new source file, create it in a subdirectory of src/
, not
in src/
itself. Preferably you create your own subdirectory for
this, say src/mymodule
Then in the Makefile
of the code, add
appropriate statements that list the file(s) and header file(s)
belonging to your new source file. For example, something like this:
SUBDIRS += mymodule
OBJS += mymodule/mysource1.o mymodule/mysource2.o
INCL += mymodule/mymodule.h
Your new source file should also come with a header file, at the very least containing a function prototype for one of your new functions that is intended to be called from the main code. This header file can then be included in the main code at an appropriate place to facilitate a call to your new code. The header file should be protected by a header file guard, i.e. you should add something like
#ifndef MYMODULE_H
#define MYMODULE_H
at the beginning of the header file, and
#endif
at the end. To tell the code checks scripts that the constant
MYMODULE_H
is not an undocumented code configuration parameter, you
also need to add it to the file defines_extra
in the main directory
of the code. Importantly, you should also add your new source file to
version control system. Ideally, you make a git-branch of GADGET-4 and
then implement your extension/modification in this branch. Try to keep
the branch current with the main line of the code as long as it is
still kept alive as a separate branch. If possible, try to integrate
your feature into a main line of development for GADGET-4, for example
through a pull request, and if successful close your branch again. To
update the html-documentation of the code with your new configuration
and parameterfile options, and to also include your source in the
cross-referenced source code documentation, run the command doxygen
in the main directory of the code. (If doxygen should be unavailable
on your local machine, you can always easily install it.)
Migration to GADGET4
Main changes in the code
There have been many major changes in the internal workings between previous versions of GADGET and GADGET-4, affecting nearly all parts of the code. In fact, basically all parts of the code have been completely rewritten since GADGET-3 (sometimes several times), or were in any case substantially revised. These modifications were done to improve the accuracy and the performance of the code, to reduce its memory consumption, and to make it applicable to a wider range of simulation types. Detailed explanations of the reasoning behind each of these modifications is beyond the scope of this short set of notes. Some of this information can be however found in the code paper on GADGET-4.
Some of the most important changes in code design include:
-
A new hierarchical timestepping option for gravity
-
An optional FMM solver for gravity
-
Availability of hybrid parallelization with shared memory use
-
Support for very large simulation sizes
-
Stretched boxes also for gravity, as well as mixed boundary conditions for gravity
-
Inclusion of FOF and SUBFIND in the public code version, a new parallelization scheme for SUBFIND, and the addition of SUBFIND_HBT
-
Inclusion of a power spectrum estimator
-
An optional column-based FFT can be used
-
Inclusion of an updated version of the N-GenIC initial conditions generator that also supports 2LPT
-
Explicit vectorization support in some parts of the code
-
New domain decomposition algorithm
-
Support of two formulations of SPH
-
On-the-fly merger tree construction
-
Continuous light-cone output
-
Integration of a simple cooling and star formation formulation in the public version of the code
Migrating old set-ups to GADGET-4
If you have used earlier version of GADGET before, you will be familiar with all the practicalities of running GADGET already. This is because nothing much has really changed here in GADGET-4. In particular, the mechanisms for starting, interrupting, and resuming a simulation are still the same.
However, as described in this guide, there have been a number of
changes in the parameterfile that controls each simulation, and there
are more compilation options that are now more conveniently controlled
through the Config.sh
file. If you want to reuse an old
parameterfile with GADGET-4, you therefore have to make a few
modifications of it. If you simply start the new code with the old
parameterfile, you will get error messages that complain about
obsolete or missing parameters. Delete or add these parameters as
needed. Also, carefully review which makefile options are still
appropriate for a given simulation. While many of the basic compile
time options available in previous versions of GADGET are still
around, others have become obsolete. The code will complain about
options that are set in Config.sh
but don't exist any more, so
you should be automatically informed about this when you try to use an
old setup. Importantly, however, the structure and format of the
snapshot files produced by the code has hardly changed (and the system
of units has not changed at all), meaning that you should still be
able to use old analysis software from previous versions of GADGET
essentially unchanged.
Notes on memory consumption of GADGET-4
GADGET-4 uses an internal memory manager where a large block of memory
is allocated once at the beginning of the simulation, and the code
then satisfies all its internal allocations out of this block. The
size of this block corresponds to the maximum allowed memory usage per
MPI process, and it is set by the parameter file MaxMemSize
.
The code will periodically output information to the file memory.txt
about all allocated memory blocks and their sizes. This happens
whenever a new high watermark in the memory allocation is reached. It
is best to check this table to get an accurate assessment of the
amount of memory the code at least needs to run successfully for a
certain particle number. For communication and particle exchanges, the
code will use buffers that automatically adjust their size to the
amount of still available memory. It is hence advantageous to set
MaxMemSize
as large as possible for the amount of physical memory
available on the compute nodes. An overcommitment of the physical
memory of a node is prevented by the code (it then exists with an
error message), but this check only works if HOST_MEMORY_REPORTING
is enabled. At the beginning of the stdout-file, you can also find the
sizes of the most important code structures for the chosen
configuration. In particular this tells about how many bytes per
collisionless particle are needed, and how many bytes are in addition
needed to store one SPH particle.
Notes on various limits in GADGET-4
-
Maximum number of particles per MPI-rank is restricted to 2^31 ~ 2 Billion
-
Maximum number of groups/subhalos per MPI-rank is restricted to 2^31 ~ 2 Billion
-
The total number of particles, and the total number of groups/subhalos, can however be much larger than this if a sufficiently large number of MPI ranks is used.
-
Maximum particle number in a single group or subhalo is for the default setting 2^31 ~ 2 billion, but this can be enlarged if needed by setting the option
FOF_ALLOW_HUGE_GROUPLENGTH
-
The maximum number of subhalos per group is restricted nevertheless to 2^31 ~ 2 billion.
-
If the legacy output formats 1/2 are used, each block in the output is restricted to 2 Gbyte in size or less. This also means that the maximum number of particles per single output file, and the maximum number of groups/subhalos per single group catalogue file can be at most of order 2 Billion, normally substantially less than that. While this restriction can be circumvented by splitting output over enough files, it provides one further motivation to use the HDF5 format!
-
Maximum number of lightcones: 256