Gadget 4 Single/double precision performance

From: Tiago Castro <tiagobscastro_at_gmail.com>
Date: Tue, 1 Dec 2020 08:15:56 +0100

Dear list,

   I am searching for the optimal configuration for running Gadget-4. I am
running control DMO (dark-matter-only) simulations with a 500 Mpc box and
512^3 particles, and I am puzzled by the following: compiling the code with
or without USE_SINGLEPRECISION_INTERNALLY (config settings pasted below)
seems to affect neither the execution time nor the memory consumption
(memory.txt pasted below). However, I observe a rather small suppression
(0.05%) of the matter power spectrum at z=0.0 for modes with wavenumber
larger than unity. Is this due to the LEAN configuration? Should the LEAN
configuration affect the code accuracy as well? I would warmly appreciate
any clarification you can provide.
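
For reference, the comparison behind the 0.05% figure can be sketched as below. This is only a minimal illustration: the function names, the `k_min` cut, and the sample numbers are hypothetical placeholders and not output from the runs; it assumes both spectra are tabulated on the same k-bins.

```python
def relative_difference(pk_test, pk_ref):
    """Return (P_test - P_ref) / P_ref for each k-bin."""
    if len(pk_test) != len(pk_ref):
        raise ValueError("spectra must share the same k-bins")
    return [(t - r) / r for t, r in zip(pk_test, pk_ref)]


def max_abs_deviation(k, pk_test, pk_ref, k_min=1.0):
    """Largest |relative difference| over bins with k > k_min."""
    rel = relative_difference(pk_test, pk_ref)
    selected = [abs(d) for ki, d in zip(k, rel) if ki > k_min]
    return max(selected) if selected else 0.0


# Made-up numbers for illustration only (not simulation output).
k = [0.1, 0.5, 1.0, 2.0, 5.0]
pk_double = [1000.0, 400.0, 100.0, 20.0, 2.0]
pk_single = [1000.0, 400.0, 100.0, 19.99, 1.999]

# Roughly 5e-4, i.e. a 0.05% deviation in the bins above k_min.
print(max_abs_deviation(k, pk_single, pk_double))
```
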

Cheers,
---------------------- SINGLE PRECISION --------------------------

Code was compiled with the following settings:
    ASMTH=1.25
    CREATE_GRID
    DOUBLEPRECISION=0
    FMM
    FOF
    FOF_GROUP_MIN_LEN=100
    FOF_LINKLENGTH=0.2
    FOF_PRIMARY_LINK_TYPES=2
    HIERARCHICAL_GRAVITY
    IMPOSE_PINNING
    LEAN
    MERGERTREE
    MULTIPOLE_ORDER=2
    NGENIC=512
    NGENIC_2LPT
    NSOFTCLASSES=1
    NTAB=128
    NTYPES=6
    OUTPUT_TIMESTEP
    PERIODIC
    PMGRID=512
    POWERSPEC_ON_OUTPUT
    RANDOMIZE_DOMAINCENTER
    RCUT=6.0
    SELFGRAVITY
    SUBFIND
    SUBFIND_HBT
    TREE_NUM_BEFORE_NODESPLIT=4
    USE_SINGLEPRECISION_INTERNALLY

MEMORY: Largest Allocation = 1559.32 Mbyte | Largest Allocation Without Generic = 1201.79 Mbyte

-------------------------- Allocated Memory Blocks---- ( Step 0 )------------------
Task  Nr F Variable                         MBytes  Cumulative  Function|File|Linenumber
------------------------------------------------------------------------------------------
  23   0 0 GetGhostRankForSimulCommRank     0.0006      0.0006  mymalloc_init()|src/data/mymalloc.cc|137
  23   1 0 GetShmRankForSimulCommRank       0.0006      0.0012  mymalloc_init()|src/data/mymalloc.cc|138
  23   2 0 GetNodeIDForSimulCommRank        0.0006      0.0018  mymalloc_init()|src/data/mymalloc.cc|139
  23   3 0 SharedMemBaseAddr                0.0003      0.0021  mymalloc_init()|src/data/mymalloc.cc|153
  23   4 1 slab_to_task                     0.0020      0.0041  my_slab_based_fft_init()|src/pm/pm_mpi_fft.cc|45
  23   5 1 slabs_x_per_task                 0.0006      0.0047  my_slab_based_fft_init()|src/pm/pm_mpi_fft.cc|60
  23   6 1 first_slab_x_of_task             0.0006      0.0053  my_slab_based_fft_init()|src/pm/pm_mpi_fft.cc|63
  23   7 1 slabs_y_per_task                 0.0006      0.0059  my_slab_based_fft_init()|src/pm/pm_mpi_fft.cc|66
  23   8 1 first_slab_y_of_task             0.0006      0.0065  my_slab_based_fft_init()|src/pm/pm_mpi_fft.cc|69
  23   9 1 P                              175.0443    175.0508  allocate_memory()|src/ngenic/../main/../data/simparticles|273
  23  10 1 SphP                             0.0001    175.0509  allocate_memory()|src/ngenic/../main/../data/simparticles|274
  23  11 1 FirstTopleafOfTask               0.0006    175.0515  domain_allocate()|src/domain/domain.cc|163
  23  12 1 NumTopleafOfTask                 0.0006    175.0521  domain_allocate()|src/domain/domain.cc|164
  23  13 1 TopNodes                         0.0358    175.0879  domain_allocate()|src/domain/domain.cc|165
  23  14 1 TaskOfLeaf                       0.0156    175.1035  domain_allocate()|src/domain/domain.cc|166
  23  15 1 ListOfTopleaves                  0.0156    175.1191  domain_decomposition()|src/domain/domain.cc|118
  23  16 1 PS                              87.5222    262.6413  create_snapshot_if_desired()|src/main/run.cc|534
  23  17 0 MinID                            3.5000    266.1413  fof_fof()|src/fof/fof.cc|71
  23  18 0 MinIDTask                        3.5000    269.6413  fof_fof()|src/fof/fof.cc|72
  23  19 0 Head                             3.5000    273.1413  fof_fof()|src/fof/fof.cc|73
  23  20 0 Next                             3.5000    276.6413  fof_fof()|src/fof/fof.cc|74
  23  21 0 Tail                             3.5000    280.1413  fof_fof()|src/fof/fof.cc|75
  23  22 0 Len                              3.5000    283.6413  fof_fof()|src/fof/fof.cc|76
  23  23 1 Send_count                       0.0006    283.6419  treeallocate()|src/tree/tree.cc|794
  23  24 1 Send_offset                      0.0006    283.6425  treeallocate()|src/tree/tree.cc|795
  23  25 1 Recv_count                       0.0006    283.6431  treeallocate()|src/tree/tree.cc|796
  23  26 1 Recv_offset                      0.0006    283.6437  treeallocate()|src/tree/tree.cc|797
  23  27 0 TreeNodes_offsets                0.0003    283.6440  treeallocate()|src/tree/tree.cc|824
  23  28 0 TreePoints_offsets               0.0003    283.6443  treeallocate()|src/tree/tree.cc|825
  23  29 0 TreeNextnode_offsets             0.0003    283.6447  treeallocate()|src/tree/tree.cc|826
  23  30 0 TreeForeign_Nodes_offsets        0.0003    283.6450  treeallocate()|src/tree/tree.cc|827
  23  31 0 TreeForeign_Points_offsets       0.0003    283.6453  treeallocate()|src/tree/tree.cc|828
  23  32 0 TreeP_offsets                    0.0003    283.6456  treeallocate()|src/tree/tree.cc|829
  23  33 0 TreeSphP_offsets                 0.0003    283.6459  treeallocate()|src/tree/tree.cc|830
  23  34 0 TreePS_offsets                   0.0003    283.6462  treeallocate()|src/tree/tree.cc|831
  23  35 0 TreeSharedMemBaseAddr            0.0003    283.6465  treeallocate()|src/tree/tree.cc|833
  23  36 1 Nodes                           15.3964    299.0428  treeallocate()|src/tree/tree.cc|882
  23  37 1 Points                           0.0001    299.0429  treebuild_construct()|src/tree/tree.cc|311
  23  38 1 Nextnode                         3.5167    302.5596  treebuild_construct()|src/tree/tree.cc|312
  23  39 1 Father                           3.5010    306.0606  treebuild_construct()|src/tree/tree.cc|313
  23  40 0 Flags                            0.8750    306.9356  fof_find_groups()|src/fof/fof_findgroups.cc|127
  23  41 0 FullyLinkedNodePIndex            0.5178    307.4534  fof_find_groups()|src/fof/fof_findgroups.cc|129
  23  42 0 targetlist                       3.5000    310.9534  fof_find_groups()|src/fof/fof_findgroups.cc|163
  23  43 0 Exportflag                       0.0006    310.9540  generic_allocate_comm_tables()|src/fof/../mpi_utils/generic_comm.h|593
  23  44 0 Exportindex                      0.0006    310.9546  generic_allocate_comm_tables()|src/fof/../mpi_utils/generic_comm.h|594
  23  45 0 Exportnodecount                  0.0006    310.9552  generic_allocate_comm_tables()|src/fof/../mpi_utils/generic_comm.h|595
  23  46 0 Send                             0.0012    310.9564  generic_allocate_comm_tables()|src/fof/../mpi_utils/generic_comm.h|597
  23  47 0 Recv                             0.0012    310.9576  generic_allocate_comm_tables()|src/fof/../mpi_utils/generic_comm.h|598
  23  48 0 Send_count                       0.0006    310.9583  generic_allocate_comm_tables()|src/fof/../mpi_utils/generic_comm.h|600
  23  49 0 Send_offset                      0.0006    310.9589  generic_allocate_comm_tables()|src/fof/../mpi_utils/generic_comm.h|601
  23  50 0 Recv_count                       0.0006    310.9595  generic_allocate_comm_tables()|src/fof/../mpi_utils/generic_comm.h|602
  23  51 0 Recv_offset                      0.0006    310.9601  generic_allocate_comm_tables()|src/fof/../mpi_utils/generic_comm.h|603
  23  52 0 Send_count_nodes                 0.0006    310.9607  generic_allocate_comm_tables()|src/fof/../mpi_utils/generic_comm.h|605
  23  53 0 Send_offset_nodes                0.0006    310.9613  generic_allocate_comm_tables()|src/fof/../mpi_utils/generic_comm.h|606
  23  54 0 Recv_count_nodes                 0.0006    310.9619  generic_allocate_comm_tables()|src/fof/../mpi_utils/generic_comm.h|607
  23  55 0 Recv_offset_nodes                0.0006    310.9625  generic_allocate_comm_tables()|src/fof/../mpi_utils/generic_comm.h|608
  23  56 1 PartList                      1241.0233   1551.9858  src/fof/../mpi_utils/generic_comm.h|198generic_alloc_partlist_nodelist_ngblist()|src/fof/../mpi_utils/generic_comm.h|244
  23  57 1 Ngblist                          3.5000   1555.4858  src/fof/../mpi_utils/generic_comm.h|198generic_alloc_partlist_nodelist_ngblist()|src/fof/../mpi_utils/generic_comm.h|247
  23  58 1 Shmranklist                      3.5000   1558.9858  src/fof/../mpi_utils/generic_comm.h|198generic_alloc_partlist_nodelist_ngblist()|src/fof/../mpi_utils/generic_comm.h|248
  23  59 1 DataIn                           0.0001   1558.9859  src/fof/../mpi_utils/generic_comm.h|198generic_exchange()|src/fof/../mpi_utils/generic_comm.h|556
  23  61 1 DataOut                          0.0001   1558.9860  src/fof/../mpi_utils/generic_comm.h|198generic_exchange()|src/fof/../mpi_utils/generic_comm.h|558
  23  62 0 rel_node_index                   0.0006   1558.9866  src/fof/../mpi_utils/generic_comm.h|198generic_prepare_particle_data_for_expor()|src/fof/../mpi_utils/generic_comm.h|317
------------------------------------------------------------------------------------------

---------------------- DOUBLE PRECISION --------------------------

Code was compiled with the following settings:
    ASMTH=1.25
    CREATE_GRID
    DOUBLEPRECISION=1
    FMM
    FOF
    FOF_GROUP_MIN_LEN=100
    FOF_LINKLENGTH=0.2
    FOF_PRIMARY_LINK_TYPES=2
    GADGET2_HEADER
    HIERARCHICAL_GRAVITY
    IMPOSE_PINNING
    LEAN
    MERGERTREE
    MULTIPOLE_ORDER=2
    NGENIC=512
    NGENIC_2LPT
    NSOFTCLASSES=1
    NTAB=128
    NTYPES=6
    OUTPUT_TIMESTEP
    PERIODIC
    PMGRID=512
    POWERSPEC_ON_OUTPUT
    RANDOMIZE_DOMAINCENTER
    RCUT=6.0
    SELFGRAVITY
    SUBFIND
    SUBFIND_HBT
    TREE_NUM_BEFORE_NODESPLIT=4

MEMORY: Largest Allocation = 1559.32 Mbyte | Largest Allocation Without Generic = 1202.39 Mbyte

-------------------------- Allocated Memory Blocks---- ( Step 0 )------------------
Task  Nr F Variable                         MBytes  Cumulative  Function|File|Linenumber
------------------------------------------------------------------------------------------
   8   0 0 GetGhostRankForSimulCommRank     0.0006      0.0006  mymalloc_init()|src/data/mymalloc.cc|137
   8   1 0 GetShmRankForSimulCommRank       0.0006      0.0012  mymalloc_init()|src/data/mymalloc.cc|138
   8   2 0 GetNodeIDForSimulCommRank        0.0006      0.0018  mymalloc_init()|src/data/mymalloc.cc|139
   8   3 0 SharedMemBaseAddr                0.0003      0.0021  mymalloc_init()|src/data/mymalloc.cc|153
   8   4 1 slab_to_task                     0.0020      0.0041  my_slab_based_fft_init()|src/pm/pm_mpi_fft.cc|45
   8   5 1 slabs_x_per_task                 0.0006      0.0047  my_slab_based_fft_init()|src/pm/pm_mpi_fft.cc|60
   8   6 1 first_slab_x_of_task             0.0006      0.0053  my_slab_based_fft_init()|src/pm/pm_mpi_fft.cc|63
   8   7 1 slabs_y_per_task                 0.0006      0.0059  my_slab_based_fft_init()|src/pm/pm_mpi_fft.cc|66
   8   8 1 first_slab_y_of_task             0.0006      0.0065  my_slab_based_fft_init()|src/pm/pm_mpi_fft.cc|69
   8   9 1 P                              175.0443    175.0508  allocate_memory()|src/ngenic/../main/../data/simparticles|273
   8  10 1 SphP                             0.0001    175.0509  allocate_memory()|src/ngenic/../main/../data/simparticles|274
   8  11 1 FirstTopleafOfTask               0.0006    175.0515  domain_allocate()|src/domain/domain.cc|163
   8  12 1 NumTopleafOfTask                 0.0006    175.0521  domain_allocate()|src/domain/domain.cc|164
   8  13 1 TopNodes                         0.0358    175.0879  domain_allocate()|src/domain/domain.cc|165
   8  14 1 TaskOfLeaf                       0.0156    175.1035  domain_allocate()|src/domain/domain.cc|166
   8  15 1 ListOfTopleaves                  0.0156    175.1191  domain_decomposition()|src/domain/domain.cc|118
   8  16 1 PS                              87.5222    262.6413  create_snapshot_if_desired()|src/main/run.cc|534
   8  17 0 MinID                            3.5000    266.1413  fof_fof()|src/fof/fof.cc|71
   8  18 0 MinIDTask                        3.5000    269.6413  fof_fof()|src/fof/fof.cc|72
   8  19 0 Head                             3.5000    273.1413  fof_fof()|src/fof/fof.cc|73
   8  20 0 Next                             3.5000    276.6413  fof_fof()|src/fof/fof.cc|74
   8  21 0 Tail                             3.5000    280.1413  fof_fof()|src/fof/fof.cc|75
   8  22 0 Len                              3.5000    283.6413  fof_fof()|src/fof/fof.cc|76
   8  23 1 Send_count                       0.0006    283.6419  treeallocate()|src/tree/tree.cc|794
   8  24 1 Send_offset                      0.0006    283.6425  treeallocate()|src/tree/tree.cc|795
   8  25 1 Recv_count                       0.0006    283.6431  treeallocate()|src/tree/tree.cc|796
   8  26 1 Recv_offset                      0.0006    283.6437  treeallocate()|src/tree/tree.cc|797
   8  27 0 TreeNodes_offsets                0.0003    283.6440  treeallocate()|src/tree/tree.cc|824
   8  28 0 TreePoints_offsets               0.0003    283.6443  treeallocate()|src/tree/tree.cc|825
   8  29 0 TreeNextnode_offsets             0.0003    283.6447  treeallocate()|src/tree/tree.cc|826
   8  30 0 TreeForeign_Nodes_offsets        0.0003    283.6450  treeallocate()|src/tree/tree.cc|827
   8  31 0 TreeForeign_Points_offsets       0.0003    283.6453  treeallocate()|src/tree/tree.cc|828
   8  32 0 TreeP_offsets                    0.0003    283.6456  treeallocate()|src/tree/tree.cc|829
   8  33 0 TreeSphP_offsets                 0.0003    283.6459  treeallocate()|src/tree/tree.cc|830
   8  34 0 TreePS_offsets                   0.0003    283.6462  treeallocate()|src/tree/tree.cc|831
   8  35 0 TreeSharedMemBaseAddr            0.0003    283.6465  treeallocate()|src/tree/tree.cc|833
   8  36 1 Nodes                           15.3964    299.0428  treeallocate()|src/tree/tree.cc|882
   8  37 1 Points                           0.0001    299.0429  treebuild_construct()|src/tree/tree.cc|311
   8  38 1 Nextnode                         3.5167    302.5596  treebuild_construct()|src/tree/tree.cc|312
   8  39 1 Father                           3.5010    306.0606  treebuild_construct()|src/tree/tree.cc|313
   8  40 0 Flags                            0.8750    306.9356  fof_find_groups()|src/fof/fof_findgroups.cc|127
   8  41 0 FullyLinkedNodePIndex            0.5178    307.4534  fof_find_groups()|src/fof/fof_findgroups.cc|129
   8  42 0 targetlist                       3.5000    310.9534  fof_find_groups()|src/fof/fof_findgroups.cc|163
   8  43 0 Exportflag                       0.0006    310.9540  generic_allocate_comm_tables()|src/fof/../mpi_utils/generic_comm.h|593
   8  44 0 Exportindex                      0.0006    310.9546  generic_allocate_comm_tables()|src/fof/../mpi_utils/generic_comm.h|594
   8  45 0 Exportnodecount                  0.0006    310.9552  generic_allocate_comm_tables()|src/fof/../mpi_utils/generic_comm.h|595
   8  46 0 Send                             0.0012    310.9564  generic_allocate_comm_tables()|src/fof/../mpi_utils/generic_comm.h|597
   8  47 0 Recv                             0.0012    310.9576  generic_allocate_comm_tables()|src/fof/../mpi_utils/generic_comm.h|598
   8  48 0 Send_count                       0.0006    310.9583  generic_allocate_comm_tables()|src/fof/../mpi_utils/generic_comm.h|600
   8  49 0 Send_offset                      0.0006    310.9589  generic_allocate_comm_tables()|src/fof/../mpi_utils/generic_comm.h|601
   8  50 0 Recv_count                       0.0006    310.9595  generic_allocate_comm_tables()|src/fof/../mpi_utils/generic_comm.h|602
   8  51 0 Recv_offset                      0.0006    310.9601  generic_allocate_comm_tables()|src/fof/../mpi_utils/generic_comm.h|603
   8  52 0 Send_count_nodes                 0.0006    310.9607  generic_allocate_comm_tables()|src/fof/../mpi_utils/generic_comm.h|605
   8  53 0 Send_offset_nodes                0.0006    310.9613  generic_allocate_comm_tables()|src/fof/../mpi_utils/generic_comm.h|606
   8  54 0 Recv_count_nodes                 0.0006    310.9619  generic_allocate_comm_tables()|src/fof/../mpi_utils/generic_comm.h|607
   8  55 0 Recv_offset_nodes                0.0006    310.9625  generic_allocate_comm_tables()|src/fof/../mpi_utils/generic_comm.h|608
   8  56 1 PartList                      1241.0233   1551.9858  src/fof/../mpi_utils/generic_comm.h|198generic_alloc_partlist_nodelist_ngblist()|src/fof/../mpi_utils/generic_comm.h|244
   8  57 1 Ngblist                          3.5000   1555.4858  src/fof/../mpi_utils/generic_comm.h|198generic_alloc_partlist_nodelist_ngblist()|src/fof/../mpi_utils/generic_comm.h|247
   8  58 1 Shmranklist                      3.5000   1558.9858  src/fof/../mpi_utils/generic_comm.h|198generic_alloc_partlist_nodelist_ngblist()|src/fof/../mpi_utils/generic_comm.h|248
   8  59 1 DataIn                           0.0001   1558.9859  src/fof/../mpi_utils/generic_comm.h|198generic_exchange()|src/fof/../mpi_utils/generic_comm.h|556
   8  60 1 NodeInfoIn                       0.0001   1558.9860  src/fof/../mpi_utils/generic_comm.h|198generic_exchange()|src/fof/../mpi_utils/generic_comm.h|557
   8  61 1 DataOut                          0.0001   1558.9860  src/fof/../mpi_utils/generic_comm.h|198generic_exchange()|src/fof/../mpi_utils/generic_comm.h|558
   8  62 0 rel_node_index                   0.0006   1558.9866  src/fof/../mpi_utils/generic_comm.h|198generic_prepare_particle_data_for_expor()|src/fof/../mpi_utils/generic_comm.h|317
------------------------------------------------------------------------------------------
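
One way to check that the two memory.txt tables really list the same allocations is to compare them row by row. The sketch below is only an illustration under stated assumptions: `parse_memory_rows` and `diff_allocations` are hypothetical helper names, and the parser assumes each data row is whitespace-separated in the column order Task, Nr, F, Variable, MBytes, Cumulative, Function|File|Linenumber. The sample rows are copied from the two tables above.

```python
def parse_memory_rows(text):
    """Map (Nr, Variable) -> MBytes for every data row found in `text`."""
    sizes = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 7:
            continue  # header fragment, separator, or blank line
        try:
            nr, mbytes = int(fields[1]), float(fields[4])
        except ValueError:
            continue  # not a data row
        # Variable names repeat (e.g. Send_count), so Nr is part of the key.
        sizes[(nr, fields[3])] = mbytes
    return sizes


def diff_allocations(text_a, text_b, tol=1e-4):
    """Return {(Nr, Variable): (MBytes_a, MBytes_b)} for rows that differ."""
    a, b = parse_memory_rows(text_a), parse_memory_rows(text_b)
    changed = {}
    for key in sorted(set(a) | set(b)):
        size_a, size_b = a.get(key), b.get(key)
        if size_a is None or size_b is None or abs(size_a - size_b) > tol:
            changed[key] = (size_a, size_b)
    return changed


single_rows = """
23  9 1 P 175.0443 175.0508 allocate_memory()|src/ngenic/../main/../data/simparticles|273
23 16 1 PS 87.5222 262.6413 create_snapshot_if_desired()|src/main/run.cc|534
23 36 1 Nodes 15.3964 299.0428 treeallocate()|src/tree/tree.cc|882
"""
double_rows = """
8  9 1 P 175.0443 175.0508 allocate_memory()|src/ngenic/../main/../data/simparticles|273
8 16 1 PS 87.5222 262.6413 create_snapshot_if_desired()|src/main/run.cc|534
8 36 1 Nodes 15.3964 299.0428 treeallocate()|src/tree/tree.cc|882
"""

# An empty dict means no allocation in the sample changed size.
print(diff_allocations(single_rows, double_rows))
```

Keying on the pair (Nr, Variable) rather than on the variable name alone avoids collisions between entries that share a name but come from different call sites.
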

Tiago Castro
Post Doc, Department of Physics / UNITS / OATS
Phone: (+39 040 3199 120)
Mobile: (+39 388 794 1562)
Email: tiagobscastro_at_gmail.com
Website: tiagobscastro.com <http://tiagobscastro.com> <http://sites.if.ufrj.br/castro/en>
Skype: tiagobscastro
Address: Osservatorio Astronomico di Trieste / Villa Bazzoni, Via Bazzoni, 2, 34143 Trieste TS
Received on 2020-12-01 08:16:24
