Re: Question about shared-memory-handler from Volker Springel on 2020-11-16 (GADGET General Discussion Mailing List)

From: Volker Springel <vspringel_at_MPA-Garching.MPG.DE>
Date: Mon, 16 Nov 2020 10:47:17 +0100

Hi Leonard,

> On 15. Nov 2020, at 20:13, Leonard Romano <leonard.romano_at_tum.de> wrote:
>
> Hello Volker,
>
> Thank you for your very useful advice!
> I have one more unanswered question though, that I am sure you will know the answer to.
> Namely as you say in 2) I will be sending a request to the dedicated listener node, which shall then react to the request.
> In order to do that successfully, the send request needs to be provided with the address of that listener.
>
> MPI_Send(const void *buf, int count, MPI_Datatype datatype, int dest, int tag, MPI_Comm comm)
>
> My guess would be that for "comm" I will be using MPI_COMM_WORLD, but what is the address of the listener "dest", that I will need to store in the "foreign_point_data"? So far my guess was that this might be the attribute "MyShmRankInGlobal" of the shared memory handler dealing with the fetch request, but I don't understand the initial setup of the communication infrastructure enough to be certain, and it might just as well be something like "World_ThisTask".
>

OK, a couple of explanations on this point. For the communication infrastructure, the code splits off all the listener ranks from the global communicator MPI_COMM_WORLD, and you're then left with "Communicator" as your communicator, which contains only the remaining MPI ranks, i.e. the compute ranks. For most of the code, this is really your new "world", and all ordinary communication is restricted to it. Gadget4 will typically use 'ThisTask' for the rank of the current process and 'NTask' for the number of ranks in this world of compute ranks; the size of this communicator is also given by Shmem.Sim_NTask.

If and only if you reach out to one of the listener ranks (which follow a different execution path and loop forever in the function shared_memory_handler()), you need to use MPI_COMM_WORLD.

But right, you still need the correct rank of the listener you want to talk to. Suppose you want to get hold of a rank designated by "origintask" (where 0 <= origintask < Shmem.Sim_NTask), which is one of our MPI compute ranks. Then you can get the listener rank within MPI_COMM_WORLD which has shared-memory access to this rank (and thus lies on the same compute node) through

Shmem.GetGhostRankForSimulCommRank[origintask]

In other words, the table GetGhostRankForSimulCommRank translates from a compute rank in the simulation communicator to the responsible associated listener rank in the MPI_COMM_WORLD communicator.
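
To make the lookup concrete, here is a minimal standalone sketch in C++ of how the destination rank could be obtained. It does not use MPI or the real Gadget-4 classes: "ShmemSketch", "make_toy_layout" and "listener_rank_for" are hypothetical names introduced only for illustration, and the toy layout (each node's listener is the last MPI_COMM_WORLD rank on that node) is an assumption made so the table has some contents, not a statement about how Gadget-4 actually assigns ranks. Only the member names Sim_NTask and GetGhostRankForSimulCommRank are taken from the explanation above.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the part of the shared-memory handler used here.
struct ShmemSketch
{
  int Sim_NTask;                                  // number of compute ranks
  std::vector<int> GetGhostRankForSimulCommRank;  // compute rank -> listener rank in MPI_COMM_WORLD
};

// Assumed toy layout (illustration only): every node hosts 'compute_per_node'
// compute ranks followed by one listener rank, numbered consecutively in
// MPI_COMM_WORLD. The real code builds this table during start-up.
ShmemSketch make_toy_layout(int nodes, int compute_per_node)
{
  ShmemSketch s;
  s.Sim_NTask = nodes * compute_per_node;
  for(int n = 0; n < nodes; n++)
    {
      int listener_world_rank = n * (compute_per_node + 1) + compute_per_node;
      for(int i = 0; i < compute_per_node; i++)
        s.GetGhostRankForSimulCommRank.push_back(listener_world_rank);
    }
  return s;
}

// Returns the value to pass as 'dest' when sending a request about the
// particle data of compute rank 'origintask'. The send itself would then use
// MPI_COMM_WORLD as the communicator, e.g.
//   MPI_Send(buf, count, datatype, dest, tag, MPI_COMM_WORLD);
int listener_rank_for(const ShmemSketch &Shmem, int origintask)
{
  assert(origintask >= 0 && origintask < Shmem.Sim_NTask);
  return Shmem.GetGhostRankForSimulCommRank[origintask];
}
```

With this toy layout of 2 nodes and 3 compute ranks per node, compute ranks 0 to 2 map to listener rank 3 and compute ranks 3 to 5 map to listener rank 7. The point to take away is only that "dest" comes from the table, indexed by the compute rank, and is interpreted in MPI_COMM_WORLD rather than in "Communicator".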

Hope this clarifies it,
Volker

> Thank you very much!
>
> Regards,
> Leonard
>
>
> On 15.11.20 13:18, Volker Springel wrote:
>> Hi Leonard,
>>
>> If you want to asynchronously modify particle data on another MPI rank, you need to distinguish between two situations:
>>
>> 1) If the other MPI rank is on the same shared memory node, you can access the corresponding data directly (via shared memory) and modify it.
>>
>> 2) If the other MPI rank is on a different node, you need to send a request to the dedicated listener on the other node, which can then access and modify the particle data on the target rank via shared memory access. To make this feasible, you need to implement a corresponding handler routine in the function shmem::shared_memory_handler() in file src/mpi_utils/shared_mem_handler.cc (this is the code which the designated listener executes all the time). In the normal tree routines, it is mostly the function tree_fetch_foreign_nodes() that reads remote data via these handler routines.
>>
>> Further comments:
>>
>> -- For both to work, you will need to make sure that the current base-address of all the particle data is known to all MPI ranks on a local shared memory node. This is normally done by calling the function prepare_shared_memory_access() before an access period, and then releasing it again with cleanup_shared_memory_access() when the access period is over.
>>
>> -- It would not be good to try to avoid (1) in favour of using (2) always (i.e. to let the ghost rank also deal with other MPI ranks that are local), because then the code would not work any more if executed on a single node (because in the latter case, no listener ranks are created at all).
>>
>> -- The change of particle data belonging to a different process can in general easily create race conditions. If needed, this therefore needs to be protected by a spin lock on the particle's data to serialize all changes acting on the same particle. Look at the function simparticles::drift_particle() in src/time_integration/predict.cc for an example of how a spin-lock on a particle can be set and released.
>>
>> Hope this helps at least a bit... Especially if you are new to MPI, the changes you are attempting are rather non-trivial.
>>
>> Regards,
>> Volker
>>
>>
>>
>>> On 13. Nov 2020, at 10:55, Leonard Romano <leonard.romano_at_tum.de> wrote:
>>>
>>>
>>> Dear Gadget community,
>>>
>>> I am working with Gadget-4 and am trying to develop a function that sends a request
>>> from a local computing rank to a foreign memory rank telling it to update a value that's stored
>>> in the instance of particle_data on the foreign one.
>>> I am using a data-structure which is analogous to the neighbor tree for SPH particles, which is equipped with a structure
>>> similar to "foreign_sphpoint_data".
>>> I figured that the easiest way to do this would be to simply provide it with all the access data I need to call "get_Pp" in addition to the
>>> shared memory rank of the processor node it resides on.
>>> The problem is, I am not sure where that last bit of information can be found.
>>> Is it by any chance simply "shmem::MyShmRankInGlobal"? In that case I could send my request to the destination "MyShmRankInGlobal" for the communicator "MPI_COMM_WORLD" or am I mistaken?
>>> I am new to MPI and have not fully understood how all of this works, so if someone could help, I would be very grateful!
>>>
>>> All the best,
>>> Leonard Romano
>>>
>>>
>>>
>>>
>>> -----------------------------------------------------------
>>> If you wish to unsubscribe from this mailing, send mail to
>>>
>>> minimalist_at_MPA-Garching.MPG.de
>>> with a subject of: unsubscribe gadget-list
>>> A web-archive of this mailing list is available here:
>>>
>>> http://www.mpa-garching.mpg.de/gadget/gadget-list
Received on 2020-11-16 10:47:18