Hi Volker
Thanks for your response to my previous email about the errors encountered in outputting a lightcone file.
I note that you have updated io.cc to handle the potential 32-bit overflow issue with HDF5 files. I think that is unlikely to be the problem in my particular case, as I am only attempting to save an octant from a 2048^3 particle simulation distributed over 8 snapshot files, so on average each HDF5 lightcone file should only contain about 2^30 particles. Nevertheless, I have recompiled Gadget4 with the latest version of all the source code files, using the same config options and parameter file settings, and attempted to rerun from the restart files generated up to the point of the crash. Unfortunately, this was not successful: the newly compiled version crashes when loading the restart files created by the previous version, with the following log printout:
RESTART: Loading restart files...
RESTART: Loading restart files group #1 out of 4...
-------------------------- Allocated Memory Blocks---- ( Step 1202590844 )------------------
Task Nr F Variable MBytes Cumulative Function|File|Linenumber
------------------------------------------------------------------------------------------
246 0 0 GetGhostRankForSimulCommRank 0.0033 0.0033 mymalloc_init()|src/data/mymalloc.cc|137
246 1 0 GetShmRankForSimulCommRank 0.0033 0.0066 mymalloc_init()|src/data/mymalloc.cc|138
246 2 0 GetNodeIDForSimulCommRank 0.0033 0.0099 mymalloc_init()|src/data/mymalloc.cc|139
246 3 0 SharedMemBaseAddr 0.0002 0.0101 mymalloc_init()|src/data/mymalloc.cc|153
246 4 0 Cones 0.0006 0.0107 lightcone_init_geometry()|src/lightcone/lightcone.cc|564
246 5 1 slab_to_task 0.0078 0.0186 my_slab_based_fft_init()|src/pm/pm_mpi_fft.cc|45
246 6 1 slabs_x_per_task 0.0033 0.0219 my_slab_based_fft_init()|src/pm/pm_mpi_fft.cc|60
246 7 1 first_slab_x_of_task 0.0033 0.0251 my_slab_based_fft_init()|src/pm/pm_mpi_fft.cc|63
246 8 1 slabs_y_per_task 0.0033 0.0284 my_slab_based_fft_init()|src/pm/pm_mpi_fft.cc|66
246 9 1 first_slab_y_of_task 0.0033 0.0317 my_slab_based_fft_init()|src/pm/pm_mpi_fft.cc|69
------------------------------------------------------------------------------------------
Code termination on task=255, function mymalloc_movable_fullinfo(), file src/data/mymalloc.cc, line 326:
Not enough memory in mymalloc_fullinfo() to allocate 112713 MB for variable 'P' at allocate_memory()/src/io/../io/../data/simparticles.h/line 276
(FreeBytes=16000 MB).
My assumption is that some code change since the previous build (from the December 2020 code version) has altered the binary format of the restart files, so that files written by the older build can no longer be read by the newer one. From your knowledge of the code changes, is that a likely explanation?
Anyway, the newly compiled version runs OK from the original IC files, and also successfully restarts from its own restart files, so I am proceeding with a rerun of the simulation from the start. (I appreciate that I could have restarted from the previous snapshot files, but I did not want to risk any discontinuities in the simulation that might have resulted from doing this.) I will let you know how this goes when it reaches the redshift corresponding to the first lightcone output.
Returning to the issue of 32-bit particle indexing, I noted an issue in the snap_io.cc file which causes the loading of IC files to fail. My specific use case is somewhat unusual in that it involves loading IC files in Gadget2 format, containing 2048^3 particles. At line 758 of snap_io.cc:
#ifdef GADGET2_HEADER
  for(int i = 0; i < NTYPES_HEADER; i++)
    if(header.npartTotalLowWord[i] > 0)
      header.npartTotal[i] = header.npartTotalLowWord[i] //+ (((long long)header.npartTotalHighWord[i]) << 32);
#endif
the expression that handles the high-order word has been, for some reason, commented out, and hence the code does not handle the case where npartTotal >= 2^32.
The fix for my case was easy enough: uncomment the npartTotalHighWord term, and also comment out the if() line. I can't offhand envisage a situation where this would not work for the general case, unless the issue relates to the multiple variants of the Gadget2 header that evolved over time.
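To illustrate the arithmetic, here is a minimal standalone sketch of the low/high word combination. It is not the actual Gadget4 code: the struct and function names (header_words, combine_words, check_roundtrip) are hypothetical stand-ins for the header fields quoted above. Assuming all 2048^3 = 2^33 particles are of a single type, the low word is exactly 0 and the high word is 2, which is why the if() test on the low word also has to go: it would skip the assignment altogether.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for one particle type's pair of 32-bit header fields;
// this is NOT the actual Gadget4 header struct.
struct header_words
{
  std::uint32_t low;   // plays the role of npartTotalLowWord[i]
  std::uint32_t high;  // plays the role of npartTotalHighWord[i]
};

// Combine the two 32-bit words into one 64-bit particle count.
// The high word is widened to 64 bits BEFORE shifting, otherwise the
// shift by 32 would be performed on a 32-bit value.
constexpr std::uint64_t combine_words(std::uint32_t low, std::uint32_t high)
{
  return static_cast<std::uint64_t>(low) + (static_cast<std::uint64_t>(high) << 32);
}

// Split a 64-bit count into its two words and check that combining them
// reproduces the original value.
inline void check_roundtrip(std::uint64_t n)
{
  header_words w{static_cast<std::uint32_t>(n & 0xFFFFFFFFu), static_cast<std::uint32_t>(n >> 32)};
  assert(combine_words(w.low, w.high) == n);
}

// 2048^3 = 2^33: low word 0, high word 2.
static_assert(combine_words(0u, 2u) == 8589934592ULL, "2048^3 particles of one type");
```

Counts below 2^32 have a zero high word, so the combined value equals the low word and the smaller cases are unaffected.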
Regards
Robin
Received on 2021-08-17 16:00:40