Re: Hangup when generating initial conditions

From: Ken Osato <ken.osato_at_yukawa.kyoto-u.ac.jp>
Date: Thu, 10 Feb 2022 23:03:05 +0900

Dear Volker,

Thank you for your prompt help.
Following your advice, I reran the simulation with the options
MPI_HYPERCUBE_ALLGATHERV, MPI_HYPERCUBE_ALLTOALL,
MPI_MESSAGE_SIZELIMIT_IN_MB = 100, and ALLOCATE_SHARED_MEMORY_VIA_POSIX.
(I also switched on the PRESERVE_SHMEM_BINARY_INVARIANCE and DEBUG options.)

For this run, the Ewald table computation completes without errors, and
the shared-memory allocation now takes less than a second:
> MALLOC: Allocation of shared memory took 0.002775 sec

However, the code halted while generating the initial conditions. The stdout log file reads:
> EWALD: reading Ewald tables from file
> `ewald_table_1-1-1_64-64-64_precision8-order3.dat'
> EWALD: Initialization of periodic boundaries finished.
> NGENIC: generated grid of size 2048
> NGENIC: computing displacement fields...
> NGENIC: vel_prefac1= 3557.04  hubble_a=28456.5   fom1=0.999995
> NGENIC: vel_prefac2= 7114.08  hubble_a=28456.5   fom2=1.99999
> NGENIC: Dplus=50.1933
and the following MPI error appears in the stderr log file:
> Abort(471456271) on node 226 (rank 226 in comm 0): Fatal error in
> PMPI_Sendrecv: Other MPI error, error stack:
> PMPI_Sendrecv(249)...............: MPI_Sendrecv(sbuf=0x14a1b9c480e8,
> scount=8, MPI_BYTE, dest=1493, stag=1303, rbuf=0x14a1b9c4eee8,
> rcount=8, MPI_BYTE, src=1493, rtag=1303, comm=0x84000004, status=0x1)
> failed
> MPID_Isend(830)..................:
> MPIDI_isend_unsafe(334)..........:
> MPIDI_OFI_inject_handler_vci(671): OFI tagged inject failed
> (ofi_impl.h:671:MPIDI_OFI_inject_handler_vci:No route to host)
There seems to be something wrong with the MPI communication (probably in
the 2LPT part of the calculation).
I'd appreciate your help. Thank you.

Best regards,
Ken


On 2022/02/09 0:49, Volker Springel wrote:
> Dear Ken,
>
> Sorry, yes, the MPI_HYPERCUBE_ALLGATHERV option created a problem: in the Ewald table module I had at some point introduced the MPI_IN_PLACE option in the call of MPI_Allgatherv(), but my own version of MPI_Allgatherv() was not set up for MPI_IN_PLACE yet.
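To illustrate what MPI_IN_PLACE asks of an MPI_Allgatherv() replacement, here is a minimal serial sketch (no MPI involved). Under the in-place semantics of allgatherv in the MPI standard, a rank passes no separate send buffer: its own contribution already sits in the receive buffer at displs[rank] with length recvcounts[rank], and sendcount/sendtype are ignored. The sentinel IN_PLACE and the helper resolve_send_span() are hypothetical names standing in for MPI_IN_PLACE and for whatever a custom wrapper does internally; this is not the Gadget-4 code.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for MPI_IN_PLACE: a sentinel address that callers
// pass instead of a real send buffer. (Real MPI defines MPI_IN_PLACE itself;
// this sketch avoids <mpi.h> on purpose.)
static const double kInPlaceSentinel = 0.0;
const double *const IN_PLACE = &kInPlaceSentinel;

// The data a rank actually has to send in an allgatherv-style exchange.
struct SendSpan {
  const double *ptr;
  int count;
};

// With in-place semantics the rank's own block is taken from recvbuf at
// displs[rank], and sendcount is ignored. A wrapper that skips this check
// would treat the sentinel address as if it were a real send buffer.
SendSpan resolve_send_span(const double *sendbuf, int sendcount,
                           const double *recvbuf,
                           const std::vector<int> &recvcounts,
                           const std::vector<int> &displs, int rank) {
  assert(rank >= 0 && rank < static_cast<int>(recvcounts.size()));
  if (sendbuf == IN_PLACE)
    return {recvbuf + displs[rank], recvcounts[rank]};
  return {sendbuf, sendcount};
}
```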
>
> I have now fixed this, so if you update the code, MPI_HYPERCUBE_ALLGATHERV should no longer cause a hang. While at it, I have also cleaned up how the native MPI_Allgatherv() is wrapped with a custom version in the code, and I have introduced an analogous option for MPI_Alltoall() through the new MPI_HYPERCUBE_ALLTOALL switch. I have also enforced the message size limit in MPI_Sendrecv more consistently in all relevant places.
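A size limit such as MPI_MESSAGE_SIZELIMIT_IN_MB is naturally enforced by splitting one large transfer into several smaller ones. The sketch below only computes such a split; plan_chunks() is a hypothetical helper, and treating 1 MB as 1024*1024 bytes is an assumption, not something stated in this thread.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Split a transfer of total_bytes into consecutive (offset, length) pieces of
// at most limit_mb megabytes each. In a size-limited wrapper, each piece would
// correspond to one MPI_Sendrecv() call.
std::vector<std::pair<std::size_t, std::size_t>>
plan_chunks(std::size_t total_bytes, std::size_t limit_mb) {
  assert(limit_mb > 0);
  const std::size_t limit_bytes = limit_mb * 1024 * 1024;  // assumed MB size
  std::vector<std::pair<std::size_t, std::size_t>> chunks;
  for (std::size_t offset = 0; offset < total_bytes; offset += limit_bytes)
    chunks.emplace_back(offset, std::min(limit_bytes, total_bytes - offset));
  return chunks;
}
```

In a real MPI_Sendrecv() wrapper the send and receive sizes generally differ, so both partners would need to loop for as many rounds as the larger of their two plans requires, sending or receiving zero bytes once one side is exhausted.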
>
> For maximum MPI stability, one should then set MPI_HYPERCUBE_ALLGATHERV and MPI_HYPERCUBE_ALLTOALL, and impose MPI_MESSAGE_SIZELIMIT_IN_MB = 100 or lower. However, one should not activate USE_MPIALLTOALLV_IN_DOMAINDECOMP or ISEND_IRECV_IN_DOMAIN.
>
> I hope this helps. In your original log message, I also noticed the suspicious line
>
> MALLOC: Allocation of shared memory took 187.053 sec
>
> There is really no reason why the allocation of the shared memory should take that long (it should take a fraction of a second). I have experienced this phenomenon with some MPICH-based MPI libraries in combination with older Linux kernels. I have no clue why this happens, except that it shouldn't.
>
> As a workaround for this, you can try to activate the new option ALLOCATE_SHARED_MEMORY_VIA_POSIX.
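For reference, allocating node-shared memory through POSIX calls typically means shm_open(), ftruncate() and mmap(). The sketch below times those steps for a single process. It assumes, without having checked the Gadget-4 source, that ALLOCATE_SHARED_MEMORY_VIA_POSIX follows a similar route instead of MPI's own shared-window allocation; the function name allocate_posix_shm() and the segment name are hypothetical. On glibc versions older than 2.34, linking may require -lrt.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstring>
#include <fcntl.h>
#include <string>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Create a POSIX shared-memory segment of `bytes` bytes, map it, and report
// how long that took (in seconds). Returns nullptr on failure. The name is
// unlinked right after mapping, so the memory is released once unmapped.
// Single-process sketch: real node-local sharing needs the other ranks to
// shm_open() the same name before it is unlinked.
void *allocate_posix_shm(const std::string &name, std::size_t bytes,
                         double *seconds) {
  const auto t0 = std::chrono::steady_clock::now();
  int fd = shm_open(name.c_str(), O_CREAT | O_EXCL | O_RDWR, 0600);
  if (fd < 0)
    return nullptr;
  if (ftruncate(fd, static_cast<off_t>(bytes)) != 0) {
    close(fd);
    shm_unlink(name.c_str());
    return nullptr;
  }
  void *p = mmap(nullptr, bytes, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
  close(fd);                 // the mapping stays valid after closing fd
  shm_unlink(name.c_str());  // drop the name; memory lives until munmap()
  if (p == MAP_FAILED)
    return nullptr;
  const auto t1 = std::chrono::steady_clock::now();
  if (seconds)
    *seconds = std::chrono::duration<double>(t1 - t0).count();
  return p;
}
```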
>
> Best,
> Volker
>
>
>> On 7. Feb 2022, at 13:03, Ken Osato <ken.osato_at_yukawa.kyoto-u.ac.jp> wrote:
>>
>>
>> Dear Volker,
>>
>> Thank you for your help. I tried running with "MPI_HYPERCUBE_ALLGATHERV", but this time the run failed in the Ewald table module. I've attached the error log for this run below.
>> I also tried switching on "USE_MPIALLTOALLV_IN_DOMAINDECOMP" or "ISEND_IRECV_IN_DOMAIN", but both runs failed with similar errors.
>> In all the runs above, I reduced the MPI message size limit, i.e., MPI_MESSAGE_SIZELIMIT_IN_MB = 100, to avoid large transfers.
>>
>> Best regards,
>> Ken
>>
>>> ==== backtrace (tid: 459742) ====
>>> 0 0x0000000000012b20 .annobin_sigaction.c() sigaction.c:0
>>> 1 0x000000000085d219 I_MPI_memcpy_movsb() /localdisk/jenkins/workspace/workspace/ch4-build-linux-2019/impi-ch4-build-linux_build/CONF/impi-ch4-build-linux-release/label/impi-ch4-build-linux-intel64/_buildspace/release/../../src/mpid/ch4/shm/posix/eager/include/i_mpi_memcpy_sse.h:11
>>> 2 0x000000000085d219 bdw_memcpy_write() /localdisk/jenkins/workspace/workspace/ch4-build-linux-2019/impi-ch4-build-linux_build/CONF/impi-ch4-build-linux-release/label/impi-ch4-build-linux-intel64/_buildspace/release/../../src/mpid/ch4/shm/posix/eager/include/intel_transport_memcpy.h:162
>>> 3 0x000000000085c554 write_to_frame() /localdisk/jenkins/workspace/workspace/ch4-build-linux-2019/impi-ch4-build-linux_build/CONF/impi-ch4-build-linux-release/label/impi-ch4-build-linux-intel64/_buildspace/release/../../src/mpid/ch4/shm/posix/eager/include/intel_transport_memcpy.h:478
>>> 4 0x000000000085c554 send_frame() /localdisk/jenkins/workspace/workspace/ch4-build-linux-2019/impi-ch4-build-linux_build/CONF/impi-ch4-build-linux-release/label/impi-ch4-build-linux-intel64/_buildspace/release/../../src/mpid/ch4/shm/posix/eager/include/intel_transport_send.h:1212
>>> 5 0x0000000000853833 MPIDI_POSIX_eager_send() /localdisk/jenkins/workspace/workspace/ch4-build-linux-2019/impi-ch4-build-linux_build/CONF/impi-ch4-build-linux-release/label/impi-ch4-build-linux-intel64/_buildspace/release/../../src/mpid/ch4/shm/posix/eager/include/intel_transport_send.h:1543
>>> 6 0x0000000000755399 MPIDI_POSIX_eager_send() /localdisk/jenkins/workspace/workspace/ch4-build-linux-2019/impi-ch4-build-linux_build/CONF/impi-ch4-build-linux-release/label/impi-ch4-build-linux-intel64/_buildspace/release/../../src/mpid/ch4/shm/posix/eager/include/posix_eager_impl.h:37
>>> 7 0x0000000000755399 MPIDI_POSIX_am_isend() /localdisk/jenkins/workspace/workspace/ch4-build-linux-2019/impi-ch4-build-linux_build/CONF/impi-ch4-build-linux-release/label/impi-ch4-build-linux-intel64/_buildspace/release/../../src/mpid/ch4/shm/src/../src/../posix/posix_am.h:220
>>> 8 0x0000000000755399 MPIDI_SHM_am_isend() /localdisk/jenkins/workspace/workspace/ch4-build-linux-2019/impi-ch4-build-linux_build/CONF/impi-ch4-build-linux-release/label/impi-ch4-build-linux-intel64/_buildspace/release/../../src/mpid/ch4/shm/src/../src/shm_am.h:49
>>> 9 0x0000000000755399 MPIDIG_isend_impl() /localdisk/jenkins/workspace/workspace/ch4-build-linux-2019/impi-ch4-build-linux_build/CONF/impi-ch4-build-linux-release/label/impi-ch4-build-linux-intel64/_buildspace/release/../../src/mpid/ch4/generic/mpidig_send.h:116
>>> 10 0x000000000075870e MPIDIG_am_isend() /localdisk/jenkins/workspace/workspace/ch4-build-linux-2019/impi-ch4-build-linux_build/CONF/impi-ch4-build-linux-release/label/impi-ch4-build-linux-intel64/_buildspace/release/../../src/mpid/ch4/generic/mpidig_send.h:172
>>> 11 0x000000000075870e MPIDIG_mpi_isend() /localdisk/jenkins/workspace/workspace/ch4-build-linux-2019/impi-ch4-build-linux_build/CONF/impi-ch4-build-linux-release/label/impi-ch4-build-linux-intel64/_buildspace/release/../../src/mpid/ch4/generic/mpidig_send.h:233
>>> 12 0x000000000075870e MPIDI_POSIX_mpi_isend() /localdisk/jenkins/workspace/workspace/ch4-build-linux-2019/impi-ch4-build-linux_build/CONF/impi-ch4-build-linux-release/label/impi-ch4-build-linux-intel64/_buildspace/release/../../src/mpid/ch4/shm/src/../src/../posix/posix_send.h:59
>>> 13 0x000000000075870e MPIDI_SHM_mpi_isend() /localdisk/jenkins/workspace/workspace/ch4-build-linux-2019/impi-ch4-build-linux_build/CONF/impi-ch4-build-linux-release/label/impi-ch4-build-linux-intel64/_buildspace/release/../../src/mpid/ch4/shm/src/../src/shm_p2p.h:187
>>> 14 0x000000000075870e MPIDI_isend_unsafe() /localdisk/jenkins/workspace/workspace/ch4-build-linux-2019/impi-ch4-build-linux_build/CONF/impi-ch4-build-linux-release/label/impi-ch4-build-linux-intel64/_buildspace/release/../../src/mpid/ch4/src/ch4_send.h:314
>>> 15 0x000000000075870e MPIDI_isend_safe() /localdisk/jenkins/workspace/workspace/ch4-build-linux-2019/impi-ch4-build-linux_build/CONF/impi-ch4-build-linux-release/label/impi-ch4-build-linux-intel64/_buildspace/release/../../src/mpid/ch4/src/ch4_send.h:609
>>> 16 0x000000000075870e MPID_Isend() /localdisk/jenkins/workspace/workspace/ch4-build-linux-2019/impi-ch4-build-linux_build/CONF/impi-ch4-build-linux-release/label/impi-ch4-build-linux-intel64/_buildspace/release/../../src/mpid/ch4/src/ch4_send.h:828
>>> 17 0x000000000075870e PMPI_Sendrecv() /localdisk/jenkins/workspace/workspace/ch4-build-linux-2019/impi-ch4-build-linux_build/CONF/impi-ch4-build-linux-release/label/impi-ch4-build-linux-intel64/_buildspace/release/../../src/mpi/pt2pt/sendrecv.c:181
>>> 18 0x000000000044fa28 MPI_hypercube_Allgatherv() /home/uchu/ken.osato/Gadget-4/src/mpi_utils/hypercube_allgatherv.cc:47
>>> 19 0x000000000047d12f ewald::ewald_init() /home/uchu/ken.osato/Gadget-4/src/gravity/ewald.cc:208
>>> 20 0x0000000000405347 sim::begrun1() /home/uchu/ken.osato/Gadget-4/src/main/begrun.cc:222
>>> 21 0x000000000040f0f3 main() /home/uchu/ken.osato/Gadget-4/src/main/main.cc:220
>>> 22 0x0000000000023493 __libc_start_main() ???:0
>>> 23 0x0000000000404cae _start() ???:0
>>
>> On 06/02/2022 19:22, Volker Springel wrote:
>>> Dear Ken,
>>>
>>> This looks very much like another MPI instability, I'm afraid, in a call of the native version of MPI_Allgatherv() of Intel MPI. This is one of the most general and complex collective communication calls... my experience is that many MPI libraries are not always stable for it (depending on the size of the transfer, the network stack, the phase of the moon, etc.), presumably due to their aggressive attempts to optimize execution time.
>>>
>>> This is also why there is the switch
>>>
>>> MPI_HYPERCUBE_ALLGATHERV
>>>
>>> in the code, which replaces the native MPI_Allgatherv() call with my own simple hypercube algorithm based on MPI_Sendrecv(). I would suggest switching this on and trying again.
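Replacing one collective by a sequence of MPI_Sendrecv() calls can be modeled serially. In the sketch below (hypothetical function xor_allgatherv(), no MPI), each round pairs rank r with r XOR ngrp, so both partners select each other and every rank receives every other rank's block exactly once over 2^p - 1 rounds. This is one common pairwise-exchange schedule; it has not been checked whether hypercube_allgatherv.cc uses exactly this variant or a recursive-doubling one, which needs only p rounds but forwards accumulated data.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Serial model of an allgatherv built only from pairwise exchanges.
// In round ngrp (1 <= ngrp < 2^p, with 2^p the smallest power of two
// >= ntask), rank r pairs with r ^ ngrp. XOR is symmetric, so both partners
// pick each other, which lets every round be a single Sendrecv per rank.
// Partners outside [0, ntask) are skipped, so ntask need not be a power of two.
std::vector<std::vector<int>> xor_allgatherv(
    const std::vector<std::vector<int>> &blocks) {
  const int ntask = static_cast<int>(blocks.size());
  std::vector<int> displs(ntask, 0);
  int total = 0;
  for (int r = 0; r < ntask; r++) {
    displs[r] = total;
    total += static_cast<int>(blocks[r].size());
  }
  // Each rank's receive buffer initially holds only its own block.
  std::vector<std::vector<int>> recv(ntask, std::vector<int>(total, -1));
  for (int r = 0; r < ntask; r++)
    for (std::size_t i = 0; i < blocks[r].size(); i++)
      recv[r][displs[r] + i] = blocks[r][i];

  int ptask = 0;
  while ((1 << ptask) < ntask)
    ptask++;

  for (int ngrp = 1; ngrp < (1 << ptask); ngrp++)
    for (int r = 0; r < ntask; r++) {
      const int partner = r ^ ngrp;
      if (partner >= ntask)
        continue;  // no such rank: r idles this round
      // Model of the Sendrecv: r stores partner's block at displs[partner].
      for (std::size_t i = 0; i < blocks[partner].size(); i++)
        recv[r][displs[partner] + i] = blocks[partner][i];
    }
  return recv;
}
```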
>>>
>>> Best regards,
>>> Volker
>>>
>>> PS: The hang you experienced should not be related to any change made since Dec 23, so it is probably not reproducible in detail, which would again be consistent with a flaky implementation of MPI_Allgatherv() in the library.
>>>
>>>> On 27. Jan 2022, at 13:38, Ken Osato <ken.osato_at_yukawa.kyoto-u.ac.jp> wrote:
>>>>
>>>> Dear Gadget users,
>>>>
>>>> Actually, I have had a problem similar to the one raised by Julianne, related to the routine shared_memory_handler(), when running gravity-only simulations with Gadget-4.
>>>> The error seems to be specific to MPICH-based libraries, and I am also using one, Intel MPI (v. 2020.4.304), on our cluster.
>>>>
>>>> Volker has already fixed this issue, and I have run a simulation to test the code in my environment.
>>>> First, I ran a simulation with 1024^3 particles, and it completed without errors.
>>>> However, when I increase the number of particles to 2048^3, the code hangs while generating the initial conditions.
>>>> This happens both with the analytic calculation (PowerSpectrumType=1) and when loading a table (PowerSpectrumType=2).
>>>> I attach the log files for this run.
>>>> When I ran the simulation with an older version of Gadget-4 (Git commit b4bb065ce3dec478d2a2d7101cefc5f5faade084, Wed Dec 23 17:05:02 2020 +0100), there was no error for the initial conditions.
>>>> I think the current error might again be related to the different implementations in OpenMPI and MPICH.
>>>>
>>>> There is also a rather minor error at finalization: the following error message appears every time a job ends.
>>>>> Abort(806969615) on node 185 (rank 185 in comm 0): Fatal error in PMPI_Finalize: Other MPI error, error stack:
>>>>> PMPI_Finalize(214)...............: MPI_Finalize failed
>>>>> PMPI_Finalize(159)...............:
>>>>> MPID_Finalize(1288)..............:
>>>>> MPIDI_OFI_mpi_finalize_hook(1892): OFI domain close failed (ofi_init.c:1892:MPIDI_OFI_mpi_finalize_hook:Device or resource busy)
>>>> However, the job seems to finish successfully: the log file (stdout) ends with "endrun called, calling MPI_Finalize() bye!",
>>>> and I found no errors in the output snapshots. This might also be due to the MPI library.
>>>>
>>>> Best regards,
>>>> Ken
>>>>
>>>> --
>>>> Ken Osato
>>>> Yukawa Institute for Theoretical Physics, Kyoto University
>>>> Kitashirakawa Oiwakecho, Sakyo-ku, Kyoto 606-8502, Japan
>>>> Tel: +81-75-753-7000
>>>> E-mail: ken.osato_at_yukawa.kyoto-u.ac.jp
>>>> <slurm-151171.out><log.txt>
>>>> -----------------------------------------------------------
>>>>
>>>> If you wish to unsubscribe from this mailing, send mail to
>>>> minimalist_at_MPA-Garching.MPG.de with a subject of: unsubscribe gadget-list
>>>> A web-archive of this mailing list is available here:
>>>> http://www.mpa-garching.mpg.de/gadget/gadget-list
>>>
>>>
>
>
>

-- 
Ken Osato
Yukawa Institute for Theoretical Physics, Kyoto University
Kitashirakawa Oiwakecho, Sakyo-ku, Kyoto 606-8502, Japan
Tel: +81-75-753-7000
E-mail: ken.osato_at_yukawa.kyoto-u.ac.jp
Received on 2022-02-10 15:03:27

This archive was generated by hypermail 2.3.0 : 2023-01-10 10:01:33 CET