Re: Hangup when generating initial conditions

From: Ken Osato <ken.osato_at_yukawa.kyoto-u.ac.jp>
Date: Fri, 18 Feb 2022 15:45:19 +0900

Dear Volker,

Thank you for your great help. I've tested with ENABLE_HEALTHTEST, but
the error still persists, without any additional MPI error messages...
As you suggested, I will switch to OpenMPI for running Gadget-4.
I'll keep you updated once the test run succeeds with OpenMPI or with the
latest version of Intel MPI, which will become available on my cluster soon.

Best regards,
Ken


On 2022/02/13 2:36, Volker Springel wrote:
>
> Dear Ken,
>
> This looks like another instability of Intel MPI on your system... I think this is unrelated to Gadget-4, and I cannot reproduce it. (Btw: since you used the ALLOCATE_SHARED_MEMORY_VIA_POSIX option, the code needs to be linked with "-lrt", which was missing in the Makefile. I have added this now, and I suppose you did this too.)
>
> I'd also recommend enabling the option ENABLE_HEALTHTEST, which runs some additional start-up and MPI performance tests at the beginning.
>
> If the problem persists, I'd highly recommend trying another MPI library. OpenMPI is best for this, as it gives more control over what's going on and tends to be pretty robust in my experience. I would also recommend compiling OpenMPI yourself. Here are some detailed instructions for doing this:
>
>
> 1) Download and compile the latest stable version of UCX, and install it in your user space
>
> Get it from https://openucx.org/downloads/
> for example: "wget https://github.com/openucx/ucx/releases/download/v1.12.0/ucx-1.12.0.tar.gz"
>
> Unpack: "tar -zxvf ucx-1.12.0.tar.gz"
> Configure it: "./configure --prefix=/u/vrs/Libs/ucx-1.12.0"
> (replace /u/vrs/Libs with a place in your home directory)
> Build it: "make"
> Install it: "make install"
>
>
> 2) If you have an Omnipath network, it doesn't hurt to get the latest version of the PSM2 library, from
> https://github.com/cornelisnetworks/opa-psm2
> for example "wget https://github.com/cornelisnetworks/opa-psm2/archive/refs/tags/PSM2_11.2.206.tar.gz"
>
> After unpacking: "make"
> "make DESTDIR=/u/vrs/Libs/psm2-11.2.206 install"
>
>
> 3) Download and compile OpenMPI, install in your user space
>
> Get it from https://www.open-mpi.org/
> for example: "wget https://download.open-mpi.org/release/open-mpi/v4.0/openmpi-4.0.7.tar.gz"
>
> Unpack: "tar -zxvf openmpi-4.0.7.tar.gz"
> Configure it: "./configure --prefix=/u/vrs/Libs/openmpi-4.0.7 --without-verbs --with-ucx=/u/vrs/Libs/ucx-1.12.0"
> If you have Omnipath and your own PSM2 library, you would additionally add
> "--with-psm2=/u/vrs/Libs/psm2-11.2.206/usr"
> and, of course, --with-ucx should point to the UCX install directory from step 1 (see the combined example below).
> (I note that on our Omnipath cluster, the OpenMPI-4.1 series tends to have PSM2-related start-up problems
> which are not there for the OpenMPI-4.0 series. It is not clear to me whether this also happens on other systems.
> I therefore prefer 4.0.7 over 4.1.2. For clusters with Mellanox Infiniband, I had no problems with OpenMPI-4.1.)
> Build it: "make"
> Install it: "make install"
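>
> For concreteness, on an Omnipath system the combined configure and build sequence could look like this (only a
> sketch; the install prefixes under /u/vrs/Libs are placeholders for directories in your own home directory, and
> the --with-psm2 line is only needed if you built your own PSM2):
>
>   ./configure --prefix=/u/vrs/Libs/openmpi-4.0.7 \
>               --without-verbs \
>               --with-ucx=/u/vrs/Libs/ucx-1.12.0 \
>               --with-psm2=/u/vrs/Libs/psm2-11.2.206/usr
>   make
>   make install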
>
>
> 4) Make sure that you compile Gadget4 with OpenMPI, and that you launch the code with OpenMPI's
> launch command.
>
> Either define a 'buildsystem/Makefile.comp.openmpi' file in which you set the compiler for Gadget4 as
>
> CPP = /u/vrs/Libs/openmpi-4.0.7/bin/mpicxx -std=c++11 # sets the C++ compiler
>
> and then launch in your job-script the code with
>
> /u/vrs/Libs/openmpi-4.0.7/bin/mpiexec -np $SLURM_NPROCS ./Gadget4 param.txt
>
> or make mpicxx/mpiexec the defaults by adding something like
>
> export PATH=/u/vrs/Libs/openmpi-4.0.7/bin:$PATH
>
> in your ~/.bashrc file. Then you can compile simply with "mpicxx" and launch with "mpiexec".
> Make sure that you log in to a new shell after these changes, and do a "make clean" and recompile Gadget4
> before you try all this.
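>
> Since this OpenMPI lives in your user space, you may also need to make its shared libraries findable at run time,
> depending on your system's defaults. If "ldd ./Gadget4" (see below) cannot resolve the OpenMPI libraries, adding
>
>   export LD_LIBRARY_PATH=/u/vrs/Libs/openmpi-4.0.7/lib:$LD_LIBRARY_PATH
>
> to your ~/.bashrc as well should fix this.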
>
> In a job-script, it is a good idea to include the commands
> which mpiexec
> ldd ./Gadget4
> to verify that you really use the OpenMPI libraries and its mpiexec command, and do not pick up another MPI
> library installed on your system.
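>
> As an illustration, a minimal Slurm job script along these lines could look as follows (the #SBATCH settings are
> placeholders that you need to adapt to your cluster; the paths refer to the example install location used above):
>
>   #!/bin/bash
>   #SBATCH --ntasks=2048          # number of MPI ranks (placeholder)
>   #SBATCH --time=24:00:00        # wall-clock limit (placeholder)
>
>   export PATH=/u/vrs/Libs/openmpi-4.0.7/bin:$PATH
>
>   which mpiexec                  # should point to the OpenMPI install above
>   ldd ./Gadget4                  # should list the OpenMPI libraries, not another MPI
>
>   mpiexec -np $SLURM_NPROCS ./Gadget4 param.txt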
>
>
> Regards,
> Volker
>
>
>
>
>> On 10. Feb 2022, at 15:03, Ken Osato <ken.osato_at_yukawa.kyoto-u.ac.jp> wrote:
>>
>>
>> Dear Volker,
>>
>> Thank you for your prompt help.
>> Following your advice, I reran the simulation with the options MPI_HYPERCUBE_ALLGATHERV, MPI_HYPERCUBE_ALLTOALL, MPI_MESSAGE_SIZELIMIT_IN_MB = 100, and ALLOCATE_SHARED_MEMORY_VIA_POSIX. (I also switched on the PRESERVE_SHMEM_BINARY_INVARIANCE and DEBUG options.)
>>
>> For this run, the Ewald table computation does not raise any errors, and the shared-memory allocation takes less than a second:
>>> MALLOC: Allocation of shared memory took 0.002775 sec
>> However, the code halted during initial-condition generation. The stdout log file reads:
>>> EWALD: reading Ewald tables from file `ewald_table_1-1-1_64-64-64_precision8-order3.dat'
>>> EWALD: Initialization of periodic boundaries finished.
>>> NGENIC: generated grid of size 2048
>>> NGENIC: computing displacement fields...
>>> NGENIC: vel_prefac1= 3557.04 hubble_a=28456.5 fom1=0.999995
>>> NGENIC: vel_prefac2= 7114.08 hubble_a=28456.5 fom2=1.99999
>>> NGENIC: Dplus=50.1933
>> and an MPI error is output in the stderr log file:
>>> Abort(471456271) on node 226 (rank 226 in comm 0): Fatal error in PMPI_Sendrecv: Other MPI error, error stack:
>>> PMPI_Sendrecv(249)...............: MPI_Sendrecv(sbuf=0x14a1b9c480e8, scount=8, MPI_BYTE, dest=1493, stag=1303, rbuf=0x14a1b9c4eee8, rcount=8, MPI_BYTE, src=1493, rtag=1303, comm=0x84000004, status=0x1) failed
>>> MPID_Isend(830)..................:
>>> MPIDI_isend_unsafe(334)..........:
>>> MPIDI_OFI_inject_handler_vci(671): OFI tagged inject failed (ofi_impl.h:671:MPIDI_OFI_inject_handler_vci:No route to host)
>> There seems to be something wrong with the MPI communication (probably in the 2LPT calculation part).
>> I'd appreciate your help. Thank you.
>>
>> Best regards,
>> Ken
>>
>>
>> On 2022/02/09 0:49, Volker Springel wrote:
>>> Dear Ken,
>>>
>>> Sorry, yes, the MPI_HYPERCUBE_ALLGATHERV option created a problem because in the Ewald table module I had at some point introduced the MPI_IN_PLACE option in the call of MPI_Allgatherv(), but my own version of MPI_Allgatherv() wasn't set up for MPI_IN_PLACE yet.
>>>
>>> I have changed this now, i.e. if you update the code, MPI_HYPERCUBE_ALLGATHERV should not create a hang any more. While at it, I have also cleaned up how the native MPI_Allgatherv() is wrapped with a custom version in the code, and likewise introduced such an option for MPI_Alltoall() as well, through the new MPI_HYPERCUBE_ALLTOALL option. Also, I now enforce the message size limit in MPI_Sendrecv more consistently in all relevant places.
>>>
>>> For maximum MPI stability, one should then set MPI_HYPERCUBE_ALLGATHERV and MPI_HYPERCUBE_ALLTOALL, and impose MPI_MESSAGE_SIZELIMIT_IN_MB = 100 or lower. But one should not activate USE_MPIALLTOALLV_IN_DOMAINDECOMP or ISEND_IRECV_IN_DOMAIN.
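>>>
>>> In terms of Config.sh, this corresponds to activating something like the following lines (the value 100 is just
>>> the example limit mentioned above):
>>>
>>>   MPI_HYPERCUBE_ALLGATHERV
>>>   MPI_HYPERCUBE_ALLTOALL
>>>   MPI_MESSAGE_SIZELIMIT_IN_MB=100
>>>
>>> while leaving USE_MPIALLTOALLV_IN_DOMAINDECOMP and ISEND_IRECV_IN_DOMAIN commented out.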
>>>
>>> I hope this helps. I note that in your original log-message, I also noted the suspicious line
>>>
>>> MALLOC: Allocation of shared memory took 187.053 sec
>>>
>>> There is really no reason why the allocation of the shared memory should take that long (it should take a fraction of a second)... I have experienced this phenomenon with some MPICH-based MPI libraries in combination with older Linux kernels. I have no clue why this happens, except that it shouldn't.
>>>
>>> As a work-around for this, you can try to activate the new option ALLOCATE_SHARED_MEMORY_VIA_POSIX.
>>>
>>> Best,
>>> Volker
>>>
>>>> On 7. Feb 2022, at 13:03, Ken Osato <ken.osato_at_yukawa.kyoto-u.ac.jp> wrote:
>>>>
>>>>
>>>> Dear Volker,
>>>>
>>>> Thank you for your help. I tried running with "MPI_HYPERCUBE_ALLGATHERV", but this time the run failed in the Ewald table module. I've attached the error log for this run below.
>>>> I also tried switching on "USE_MPIALLTOALLV_IN_DOMAINDECOMP" or "ISEND_IRECV_IN_DOMAIN", but in both runs the code failed with similar errors.
>>>> In all the runs above, I reduced the MPI message size limit, i.e. set MPI_MESSAGE_SIZELIMIT_IN_MB = 100, to avoid large communications.
>>>>
>>>> Best regards,
>>>> Ken
>>>>
>>>>> ==== backtrace (tid: 459742) ====
>>>>> 0 0x0000000000012b20 .annobin_sigaction.c() sigaction.c:0
>>>>> 1 0x000000000085d219 I_MPI_memcpy_movsb() /localdisk/jenkins/workspace/workspace/ch4-build-linux-2019/impi-ch4-build-linux_build/CONF/impi-ch4-build-linux-release/label/impi-ch4-build-linux-intel64/_buildspace/release/../../src/mpid/ch4/shm/posix/eager/include/i_mpi_memcpy_sse.h:11
>>>>> 2 0x000000000085d219 bdw_memcpy_write() /localdisk/jenkins/workspace/workspace/ch4-build-linux-2019/impi-ch4-build-linux_build/CONF/impi-ch4-build-linux-release/label/impi-ch4-build-linux-intel64/_buildspace/release/../../src/mpid/ch4/shm/posix/eager/include/intel_transport_memcpy.h:162
>>>>> 3 0x000000000085c554 write_to_frame() /localdisk/jenkins/workspace/workspace/ch4-build-linux-2019/impi-ch4-build-linux_build/CONF/impi-ch4-build-linux-release/label/impi-ch4-build-linux-intel64/_buildspace/release/../../src/mpid/ch4/shm/posix/eager/include/intel_transport_memcpy.h:478
>>>>> 4 0x000000000085c554 send_frame() /localdisk/jenkins/workspace/workspace/ch4-build-linux-2019/impi-ch4-build-linux_build/CONF/impi-ch4-build-linux-release/label/impi-ch4-build-linux-intel64/_buildspace/release/../../src/mpid/ch4/shm/posix/eager/include/intel_transport_send.h:1212
>>>>> 5 0x0000000000853833 MPIDI_POSIX_eager_send() /localdisk/jenkins/workspace/workspace/ch4-build-linux-2019/impi-ch4-build-linux_build/CONF/impi-ch4-build-linux-release/label/impi-ch4-build-linux-intel64/_buildspace/release/../../src/mpid/ch4/shm/posix/eager/include/intel_transport_send.h:1543
>>>>> 6 0x0000000000755399 MPIDI_POSIX_eager_send() /localdisk/jenkins/workspace/workspace/ch4-build-linux-2019/impi-ch4-build-linux_build/CONF/impi-ch4-build-linux-release/label/impi-ch4-build-linux-intel64/_buildspace/release/../../src/mpid/ch4/shm/posix/eager/include/posix_eager_impl.h:37
>>>>> 7 0x0000000000755399 MPIDI_POSIX_am_isend() /localdisk/jenkins/workspace/workspace/ch4-build-linux-2019/impi-ch4-build-linux_build/CONF/impi-ch4-build-linux-release/label/impi-ch4-build-linux-intel64/_buildspace/release/../../src/mpid/ch4/shm/src/../src/../posix/posix_am.h:220
>>>>> 8 0x0000000000755399 MPIDI_SHM_am_isend() /localdisk/jenkins/workspace/workspace/ch4-build-linux-2019/impi-ch4-build-linux_build/CONF/impi-ch4-build-linux-release/label/impi-ch4-build-linux-intel64/_buildspace/release/../../src/mpid/ch4/shm/src/../src/shm_am.h:49
>>>>> 9 0x0000000000755399 MPIDIG_isend_impl() /localdisk/jenkins/workspace/workspace/ch4-build-linux-2019/impi-ch4-build-linux_build/CONF/impi-ch4-build-linux-release/label/impi-ch4-build-linux-intel64/_buildspace/release/../../src/mpid/ch4/generic/mpidig_send.h:116
>>>>> 10 0x000000000075870e MPIDIG_am_isend() /localdisk/jenkins/workspace/workspace/ch4-build-linux-2019/impi-ch4-build-linux_build/CONF/impi-ch4-build-linux-release/label/impi-ch4-build-linux-intel64/_buildspace/release/../../src/mpid/ch4/generic/mpidig_send.h:172
>>>>> 11 0x000000000075870e MPIDIG_mpi_isend() /localdisk/jenkins/workspace/workspace/ch4-build-linux-2019/impi-ch4-build-linux_build/CONF/impi-ch4-build-linux-release/label/impi-ch4-build-linux-intel64/_buildspace/release/../../src/mpid/ch4/generic/mpidig_send.h:233
>>>>> 12 0x000000000075870e MPIDI_POSIX_mpi_isend() /localdisk/jenkins/workspace/workspace/ch4-build-linux-2019/impi-ch4-build-linux_build/CONF/impi-ch4-build-linux-release/label/impi-ch4-build-linux-intel64/_buildspace/release/../../src/mpid/ch4/shm/src/../src/../posix/posix_send.h:59
>>>>> 13 0x000000000075870e MPIDI_SHM_mpi_isend() /localdisk/jenkins/workspace/workspace/ch4-build-linux-2019/impi-ch4-build-linux_build/CONF/impi-ch4-build-linux-release/label/impi-ch4-build-linux-intel64/_buildspace/release/../../src/mpid/ch4/shm/src/../src/shm_p2p.h:187
>>>>> 14 0x000000000075870e MPIDI_isend_unsafe() /localdisk/jenkins/workspace/workspace/ch4-build-linux-2019/impi-ch4-build-linux_build/CONF/impi-ch4-build-linux-release/label/impi-ch4-build-linux-intel64/_buildspace/release/../../src/mpid/ch4/src/ch4_send.h:314
>>>>> 15 0x000000000075870e MPIDI_isend_safe() /localdisk/jenkins/workspace/workspace/ch4-build-linux-2019/impi-ch4-build-linux_build/CONF/impi-ch4-build-linux-release/label/impi-ch4-build-linux-intel64/_buildspace/release/../../src/mpid/ch4/src/ch4_send.h:609
>>>>> 16 0x000000000075870e MPID_Isend() /localdisk/jenkins/workspace/workspace/ch4-build-linux-2019/impi-ch4-build-linux_build/CONF/impi-ch4-build-linux-release/label/impi-ch4-build-linux-intel64/_buildspace/release/../../src/mpid/ch4/src/ch4_send.h:828
>>>>> 17 0x000000000075870e PMPI_Sendrecv() /localdisk/jenkins/workspace/workspace/ch4-build-linux-2019/impi-ch4-build-linux_build/CONF/impi-ch4-build-linux-release/label/impi-ch4-build-linux-intel64/_buildspace/release/../../src/mpi/pt2pt/sendrecv.c:181
>>>>> 18 0x000000000044fa28 MPI_hypercube_Allgatherv() /home/uchu/ken.osato/Gadget-4/src/mpi_utils/hypercube_allgatherv.cc:47
>>>>> 19 0x000000000047d12f ewald::ewald_init() /home/uchu/ken.osato/Gadget-4/src/gravity/ewald.cc:208
>>>>> 20 0x0000000000405347 sim::begrun1() /home/uchu/ken.osato/Gadget-4/src/main/begrun.cc:222
>>>>> 21 0x000000000040f0f3 main() /home/uchu/ken.osato/Gadget-4/src/main/main.cc:220
>>>>> 22 0x0000000000023493 __libc_start_main() ???:0
>>>>> 23 0x0000000000404cae _start() ???:0
>>>> On 06/02/2022 19:22, Volker Springel wrote:
>>>>> Dear Ken,
>>>>>
>>>>> This looks very much like another MPI instability, I'm afraid, in a call to Intel MPI's native MPI_Allgatherv(). This is one of the most general and complex collective communication calls... my experience is that many MPI libraries are not always stable for it (depending on the size of the transfer, the network stack, the phase of the moon, etc.), presumably due to their aggressive attempts to optimize execution time.
>>>>>
>>>>> This is also why there is the switch
>>>>>
>>>>> MPI_HYPERCUBE_ALLGATHERV
>>>>>
>>>>> in the code, which replaces the native MPI_Allgatherv() call with my own simple hypercube algorithm based on MPI_Sendrecv(). I would suggest switching this on and trying again.
>>>>>
>>>>> Best regards,
>>>>> Volker
>>>>>
>>>>> ps: The hang you experienced should not be affected by any change since Dec 23, so it is probably not reproducible in detail, which would again be consistent with a flaky implementation of MPI_Allgatherv() in the library.
>>>>>
>>>>>> On 27. Jan 2022, at 13:38, Ken Osato <ken.osato_at_yukawa.kyoto-u.ac.jp> wrote:
>>>>>>
>>>>>> Dear Gadget users,
>>>>>>
>>>>>> Actually, I had a problem similar to the one raised by Julianne, related to the routine shared_memory_handler(), when running gravity-only simulations with Gadget-4.
>>>>>> The error seems to occur for MPICH-based libraries, since I'm also using Intel MPI (v. 2020.4.304) on our cluster.
>>>>>>
>>>>>> Volker has already fixed this issue, and I've run simulations to test the code in my environment.
>>>>>> First, I ran a simulation with 1024^3 particles, and the run was successful without errors.
>>>>>> However, when I increase the number of particles to 2048^3, it hangs up while generating the initial conditions.
>>>>>> This error occurs both for the analytic power-spectrum calculation (PowerSpectrumType=1) and for loading a table (PowerSpectrumType=2).
>>>>>> I attach the log files for this run.
>>>>>> When I ran the simulation with an older version of Gadget-4 (Git commit b4bb065ce3dec478d2a2d7101cefc5f5faade084, Wed Dec 23 17:05:02 2020 +0100), there was no error for the initial conditions.
>>>>>> I think the current error might again be related to implementation differences between OpenMPI and MPICH.
>>>>>>
>>>>>> There is also a quite minor error at finalization. I find the following error message every time a job ends:
>>>>>>> Abort(806969615) on node 185 (rank 185 in comm 0): Fatal error in PMPI_Finalize: Other MPI error, error stack:
>>>>>>> PMPI_Finalize(214)...............: MPI_Finalize failed
>>>>>>> PMPI_Finalize(159)...............:
>>>>>>> MPID_Finalize(1288)..............:
>>>>>>> MPIDI_OFI_mpi_finalize_hook(1892): OFI domain close failed (ofi_init.c:1892:MPIDI_OFI_mpi_finalize_hook:Device or resource busy)
>>>>>> But the job seems to finish successfully: the log file (stdout) ends with "endrun called, calling MPI_Finalize() bye!",
>>>>>> and I found no errors in the output snapshots. This, too, is probably due to the MPI library.
>>>>>>
>>>>>> Best regards,
>>>>>> Ken
>>>>>>
>>>>>> --
>>>>>> Ken Osato
>>>>>> Yukawa Institute for Theoretical Physics, Kyoto University
>>>>>> Kitashirakawa Oiwakecho, Sakyo-ku, Kyoto 606-8502, Japan
>>>>>> Tel: +81-75-753-7000
>>>>>> E-mail: ken.osato_at_yukawa.kyoto-u.ac.jp
>>>>>> <slurm-151171.out><log.txt>
>>>> --
>>>> Ken Osato
>>>> Yukawa Institute for Theoretical Physics, Kyoto University
>>>> Kitashirakawa Oiwakecho, Sakyo-ku, Kyoto 606-8502, Japan
>>>> Tel: +81-75-753-7000
>>>> E-mail: ken.osato_at_yukawa.kyoto-u.ac.jp
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>> --
>> Ken Osato
>> Yukawa Institute for Theoretical Physics, Kyoto University
>> Kitashirakawa Oiwakecho, Sakyo-ku, Kyoto 606-8502, Japan
>> Tel: +81-75-753-7000
>> E-mail: ken.osato_at_yukawa.kyoto-u.ac.jp
>>
>>
>>
>>
>
>
>

-- 
Ken Osato
Yukawa Institute for Theoretical Physics, Kyoto University
Kitashirakawa Oiwakecho, Sakyo-ku, Kyoto 606-8502, Japan
Tel: +81-75-753-7000
E-mail: ken.osato_at_yukawa.kyoto-u.ac.jp