Re: Error from function deal_with_sph_node_request

From: Goddard, Julianne <Julianne.Goddard_at_uky.edu>
Date: Fri, 22 Oct 2021 21:00:46 +0000

Hello Leonard,

Thank you for your reply; it is interesting that we are both experiencing the same problem. Yes, mine do seem random; there does not appear to be any pattern to when they occur.

The one thing I will mention is that I have encountered almost the same error when running the same simulation with cooling turned off, except that it came from the function deal_with_gravity_node_request rather than the SPH one. That was before I implemented Grackle in the code. Since implementing Grackle I have had no issues running the simulation with cooling turned off (I don't know why that should be; again, it seems random).

Sincerely,
Julianne

On Oct 22, 2021, at 4:41 PM, Leonard Romano <leonard.romano_at_tum.de> wrote:


Hello Julianne,

I am also using Grackle for cooling, and when I enable star formation I encounter the same error. What bugs me most is that it seems to happen at random, i.e. sometimes after a few stars have spawned and sometimes only after hundreds or thousands have spawned.
Does your error occur at random too?
Unfortunately I have not had time to debug this problem yet, so if you or anyone else has ideas, they would be very welcome.
Needless to say, these kinds of issues are very likely related to our custom implementations of this sub-grid physics (Grackle is not part of the public Gadget code), so most likely we will each have to find the bugs in our own code...

Best,
Leonard


On 22.10.21 22:14, Goddard, Julianne wrote:
Hello Everyone,

I am running a zoom-in cosmological simulation with periodic boundary conditions in Gadget-4. I am using Grackle for cooling, and star formation is enabled. The zoom region of the simulation is about 1.5 Mpc in radius, and the effective resolution there is 1024^3. I have found that the code runs to completion on a single node; however, if I increase to two or more nodes I start to get one of the following errors:

"Code termination on task=91, function deal_with_sph_node_request(), file src/mpi_utils/shared_mem_handler.cc, line 272: p=1564695652 MaxPart=5869 MaxNodes=13117"

or

"Fatal error in PMPI_Recv: Unknown error class, error stack:
PMPI_Recv(171)........................: MPI_Recv(buf=0x7f63546475c0, count=8, MPI_BYTE, src=31, tag=10, MPI_COMM_WORLD, status=0x1) failed
MPIDU_Complete_posted_with_error(1137): Process failed"
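
For context, the first message implies that valid tree-node indices on a task lie in the range [MaxPart, MaxPart + MaxNodes), i.e. [5869, 18986) here, so the requested index p = 1564695652 is wildly out of range, which is presumably why the shared-memory handler aborts. The following is only a rough sketch of that kind of bounds check (not the actual Gadget-4 source; the function name is invented and the values are simply copied from the error message) to illustrate what the printed numbers mean:

/* Illustrative sketch only -- not the actual Gadget-4 code. It mimics the
   kind of range check suggested by the message
   "p=... MaxPart=... MaxNodes=...": a node index requested by another rank
   must fall inside the local tree-node block. */
#include <cstdio>
#include <cstdlib>

static const int MaxPart  = 5869;   /* values copied from the error message */
static const int MaxNodes = 13117;

void deal_with_sph_node_request_sketch(int p)
{
  /* Tree nodes are expected in [MaxPart, MaxPart + MaxNodes). */
  if(p < MaxPart || p >= MaxPart + MaxNodes)
    {
      std::fprintf(stderr, "p=%d MaxPart=%d MaxNodes=%d\n", p, MaxPart, MaxNodes);
      std::exit(1);   /* corresponds to the "Code termination" above */
    }
  /* ...otherwise the requested node would be packed and sent back... */
}

int main()
{
  deal_with_sph_node_request_sketch(1564695652);   /* the out-of-range value from the log */
  return 0;
}

If the real check is of this form, an index that large showing up only on multi-node runs could point to a garbled or stale node reference in the inter-node communication, which would also fit the apparent randomness.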

I once had the code run to completion in parallel without hitting these errors, but I have not been able to reproduce that since. Has anyone else experienced this type of error, or does anyone have advice on how to fix the problem?

Thank You,

Julianne



--
===================================================
Leonard Romano, B.Sc.(レオナルド・ロマノ)
Physics Department
Technical University of Munich (TUM), Germany
Theoretical Astrophysics Group
Department of Earth and Space Science
Graduate School of Science, Osaka University, Japan
he / him / his
===================================================
