Hi Enrique,
I had a very similar problem with our cluster.
In my case, it was a problem with the InfiniBand communication. The
following mpiexec options solved the problem for me:
-mca mpi_leave_pinned 0 -mca btl_openib_eager_rdma 0
It's worth a try.
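
As a rough sketch of how these options might be passed on the command line: the process count, binary name, and parameter file below are hypothetical placeholders, not taken from either cluster. The MCA parameter names are copied as written above. Names can differ between Open MPI versions, so it may be worth checking them with `ompi_info --param btl openib` before relying on them. The script only prints the command unless RUN=1 is set.

```shell
#!/bin/sh
# Dry-run sketch: build the mpiexec command line and print it.
# NP, GADGET_BIN and PARAM_FILE are hypothetical placeholders.
NP="${NP:-32}"
GADGET_BIN="${GADGET_BIN:-./Gadget2}"
PARAM_FILE="${PARAM_FILE:-param.txt}"

# MCA options exactly as quoted in the message above
# (assumption: these names are valid for the installed Open MPI version).
MCA_OPTS="-mca mpi_leave_pinned 0 -mca btl_openib_eager_rdma 0"

CMD="mpiexec -np $NP $MCA_OPTS $GADGET_BIN $PARAM_FILE"
echo "$CMD"

# Set RUN=1 to actually launch the job instead of only printing the command.
if [ "${RUN:-0}" = "1" ]; then
    $CMD
fi
```

Running the same short test case with and without the two options on 32 cores should show fairly quickly whether they make a difference.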
Cheers, Christian
On 09/07/2011 01:34 AM, Enrique Vazquez wrote:
> Hi,
>
> I'm submitting some test runs on my new 180-core cluster, but I'm
> encountering very strange behavior: a run that I had performed on
> my previous cluster (118^3 SPH particles, no DM particles, on 8 CPUs),
> with identical parameters and only the compile options changed
> to fit the new system, runs fine on 8 cores, but when I run it on 32
> cores, it only advances up to a certain time, and then ceases to advance any
> further. It generally stops somewhere near the tree calculation, either
> while doing the domain decomposition, or while computing the tree force,
> or while computing the potential for all particles. Other runs I'm doing
> with larger SPH particle numbers (up to 27 million) on up to 128 cores
> act the same way.
>
> The strange part is that there's no crash. The system behaves as if it
> had entered an infinite loop, with the processors still crunching at full
> speed, but no further advance in simulation time occurs. No error messages
> are displayed, and the last timestep printed is perfectly normal (0.146).
> Because the problem occurs for 32 cores but not for 8, I guess it's some
> problem with MPI.
>
> I'm running Gadget2, with a few modifications by us to include extra ISM physics,
> on AMD 12-core processors, using OpenMPI and Intel compilers. The connectivity
> is InfiniBand. Any suggestions or insight will be greatly appreciated!
>
> Best regards,
> Enrique
>
> -----------------------------------------------------------
>
> If you wish to unsubscribe from this mailing, send mail to
> minimalist_at_MPA-Garching.MPG.de with a subject of: unsubscribe gadget-list
> A web-archive of this mailing list is available here:
> http://www.mpa-garching.mpg.de/gadget/gadget-list
Received on 2011-09-07 02:35:17