Re: forcing domain decomposition after deleting particles

From: Mark Baumann <markb_at_physics.utexas.edu>
Date: Mon, 27 Jun 2011 18:59:30 -0500 (CDT)

Hello again,

I believe I may have found the reason why domain decomposition is failing
to occur for all processes. The process that is deleting particles is
crashing before domain decomposition is finished.

In particular, a debugger shows a segmentation fault inside domain.c. It
occurs either in domain_Decomposition() on the line

MPI_Allgather(&N_gas, 1, MPI_INT, list_N_gas, 1, MPI_INT, MPI_COMM_WORLD);

or in domain_decompose() on the line

temp = malloc(NTask * 6 * sizeof(int));

This occurs only for the processor that deleted one or more particles.
Interestingly, the problem doesn't occur for small numbers of particles.
It only happens once the particle count is 100,000 or so. It happens
regardless of how many processors I am using.

This appears to me to be either an MPI issue or a memory issue. Running
with valgrind has not turned up any offending memory leaks.

I am using gcc 4.4.1. I have tried mvapich v1.0.1 and mvapich2 v1.2
(on an InfiniBand machine) and mpich2 (on a dual-core machine), and all
give the same result.

Below I will include my code for deleting particles.

If anyone has any ideas why I am having this problem I would be very
grateful!

Mark

Code for deleting particles, which is called immediately before
domain_Decomposition():

void sink_particle(void)
{

   int i, j, SP, numcoll;
   int root[1], source[1];
   FLOAT distsqrd;
   FLOAT SPpos[3];
   struct particle_data * Ptemp;
   struct sph_particle_data * SPHtemp;
   int * temp;

   numcoll = 0;
   totpart[0] = 0;
   totgas[0] = 0;

   /* First find the star particle, which is the sink particle */
   /* For now, assume there will be just one sink particle */

   SP = -1;
   i = 0;
   while (i < NumPart && SP < 0)
   {
     if (P[i].Type == 4)
       SP = i;
     i++;
   }

   /* Broadcast sink particle information to other processors and receive
      sink particle information from other processors */

   if (SP >= 0) {
         for(i=0; i<3; i++) SPpos[i] = P[SP].Pos[i];
         root[0] = ThisTask;
   }
   else root[0] = 0;

   /* First get the ID of the process which contains the sink particle */
   MPI_Allreduce(root, source, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);

   /* Now send (if we are the source) or receive the position of the sink
      particle */

   MPI_Bcast(SPpos, 3*sizeof(FLOAT), MPI_BYTE, source[0], MPI_COMM_WORLD);

   /* Loop through all local gas particles and check for collisions with
      sink particle */
   /* Flag particles to be deleted by setting particle type to -1 */

   for(i=0; i<N_gas; i++)
   {
     if (P[i].Type == 0)
     {
         distsqrd = (SPpos[0]-P[i].Pos[0])*(SPpos[0]-P[i].Pos[0]) +
                    (SPpos[1]-P[i].Pos[1])*(SPpos[1]-P[i].Pos[1]) +
                    (SPpos[2]-P[i].Pos[2])*(SPpos[2]-P[i].Pos[2]);
         if (distsqrd < SINKRAD * SINKRAD)
         {
             P[i].Type = -1;
             numcoll++;
         }
     }
   }

   /* If we had any collisions, delete the flagged particles */

   if (numcoll > 0)
   {
         Ptemp = malloc(All.MaxPart*sizeof(struct particle_data));
         SPHtemp = malloc(All.MaxPartSph*sizeof(struct particle_data));

         for(i=0, j=0; i<NumPart; i++)
         {
             if (P[i].Type != -1) /* this particle was not accreted */
             {
                 Ptemp[j] = P[i];
                 if (j < N_gas-numcoll) SPHtemp[j] = SphP[i];
                 j++;
             }
         }

         free(P);
         P = Ptemp;
         free(SphP);
         SphP = SPHtemp;

         /* update the particle numbers on this processor */
         NumPart -= numcoll;
         N_gas -= numcoll;

         /* force a domain decomposition */
         All.NumForcesSinceLastDomainDecomp = 1 + All.TreeDomainUpdateFrequency * All.TotNumPart;

   }

   /* synchronize total particle number across all processors
   * This must be done on every proc every time because even if a collision didn't occur
   * on this proc, the total particle numbers must be synchronized with other processors
   * that may have had collisions */
   /*
   * Since All.TotNumPart and All.TotN_gas are of type long long, use Allgather
   * instead of Allreduce
   */

   temp = malloc(NTask * sizeof(int));
   All.TotNumPart = 0;
   All.TotN_gas = 0;

   MPI_Allgather(&NumPart, 1, MPI_INT, temp, 1, MPI_INT, MPI_COMM_WORLD);
   for(i = 0; i< NTask; i++) {
      All.TotNumPart += temp[i];
   }

   for(i=0; i<NTask; i++) temp[i] = 0;
   MPI_Allgather(&N_gas, 1, MPI_INT, temp, 1, MPI_INT, MPI_COMM_WORLD);

   for(i = 0; i< NTask; i++) {
      All.TotN_gas += temp[i];
   }
   free(temp);

}
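
A note on the compaction step in the listing above: SPHtemp is sized with
sizeof(struct particle_data) rather than sizeof(struct sph_particle_data).
If the SPH struct is the larger of the two, the copy into SPHtemp could
write past the end of the buffer, and heap damage of that kind often shows
up later, inside an unrelated malloc() or MPI call. This is a possible
cause, not a confirmed diagnosis. One way to sidestep the temporary buffers
entirely is to compact the arrays in place. The following is a minimal
sketch under stated assumptions: the struct types and the function name
compact_particles are stand-ins invented for illustration, not GADGET
definitions, and gas particles are assumed to occupy the first n_gas slots,
as in the listing.

```c
#include <assert.h>

/* Stand-in types for illustration only (assumption): the real GADGET
   structs, struct particle_data and struct sph_particle_data, have many
   more fields. */
struct particle { double pos[3]; int type; };
struct sph_particle { double entropy; };

/* Hypothetical helper: remove every particle whose type is -1 by shifting
   the survivors down in place, so no temporary buffer is allocated.
   Assumes the first *n_gas entries of p are gas particles and that sph[i]
   belongs to p[i] for i < *n_gas.
   Updates both counters and returns the number of removed particles. */
int compact_particles(struct particle *p, struct sph_particle *sph,
                      int *n_part, int *n_gas)
{
    int i, j = 0, removed_gas = 0, removed;

    assert(*n_gas >= 0 && *n_gas <= *n_part);

    for (i = 0; i < *n_part; i++) {
        if (p[i].type == -1) {
            /* flagged for deletion: skip it, and remember if it was gas */
            if (i < *n_gas)
                removed_gas++;
            continue;
        }
        /* j <= i always holds, so no unread entry is overwritten */
        if (i < *n_gas)
            sph[j] = sph[i];
        p[j] = p[i];
        j++;
    }

    removed = *n_part - j;
    *n_part = j;
    *n_gas -= removed_gas;
    return removed;
}
```

In the routine above this would replace the malloc/copy/free block, called
with the particle arrays and the addresses of NumPart and N_gas once the
stand-in types are swapped for the real ones. If separate buffers are kept
instead, writing the size as count * sizeof *pointer ties the element size
to the pointer's own type.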



On Fri, 24 Jun 2011, Mark Baumann wrote:

>
> Hello,
>
> I am deleting SPH particles when they collide with a sink particle. After
> doing so I force a domain decomposition using:
>
> All.NumForcesSinceLastDomainDecomp = 1 + All.TreeDomainUpdateFrequency *
> All.TotNumPart;
>
> This line only updates the value of NumForcesSinceLastDomainDecomp on the
> local processor, but I've noticed that this usually triggers domain
> decomposition on all processes since they will sometimes exchange particles.
> Therefore, is it sufficient to just set the local value, or should I export
> the new value to all processes?
>
> Currently I am experiencing an unexpected crash on one of the processes
> during the "tree force" step. I've noticed that domain decomposition is only
> occurring for the process on which the particle was deleted and I am
> wondering if that could be part of the reason for the crash.
>
> I have included the following code in the routine gravity_tree() but
> otherwise I have not modified the tree force or domain decomposition code.
>
> for(i=0, NumForceUpdate = 0; i<NumPart; i++)
> {
>   if (P[i].Ti_endstep == All.Ti_Current)
> #ifdef SELECTIVE_NO_GRAVITY
>     if (!((1 << P[i].Type) & (SELECTIVE_NO_GRAVITY)))
> #endif
>       NumForceUpdate++;
> }
>
> I am updating the local lists of particles (P and SphP) as well as the values
> of NumPart and N_gas whenever a particle is deleted. After that I
> synchronize the new particle numbers on the rest of the processors using
> Allgather, which appears to be properly updating the value of All.TotNumPart
> (and All.TotN_gas) on all processes.
>
> Another question I have is, when should I modify the value of
> NumForcesSinceLastDomainDecomp? Presumably after modifying the local
> particle counts, but does it matter whether I do it before or after
> synchronizing All.TotNumPart (and All.TotN_gas)?
>
> Thank you for any feedback!
>
> Mark
Received on 2011-06-28 01:59:36
