work-load imbalance (again, but for SPH calculation)

From: yanbin YANG <yanbin.yang_at_gmail.com>
Date: Wed, 30 Jun 2010 16:30:21 +0800

Dear all,

I'm working on simulations of galaxy-galaxy interactions (Nbody+SPH).
Recently, I noticed a very strong work-load imbalance due to the SPH
calculation when I increase the number of CPUs.

For example,
(I define:
T_sph = the time for the SPH calculation
T_tree = the time for the tree-force calculation)

Using the same IC on a 4-core computer, T_sph is comparable to
T_tree, i.e. a ratio of ~1. But when I run the same IC with more cores,
e.g. 32 CPUs on a cluster, T_sph is about 4 times longer than
T_tree. The excess time is reported in cpu.txt as
work-load imbalance.

I suspect the imbalance is linked to the domain decomposition. During
the merger, the gas may become strongly spatially decoupled from the
collisionless (star/dark-matter) component, while the domain
decomposition is a global process that ignores the type of the
particles (I don't know if I'm correct). Thus when N_CPU increases,
more domains are created, and some CPUs may end up with a very large
number of gas particles while others hold only a few.
One thing I should mention is that at the early stage of the simulation
the work-load looks fine; the imbalance becomes significant after
the interaction begins.
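To show what I mean, here is a toy sketch of the suspected mechanism. It is only an illustration of a type-blind decomposition that balances the total particle count per CPU (a stand-in, not Gadget's actual Peano-Hilbert algorithm), with the gas artificially concentrated in a small region of the box, as it would be after it decouples during the merger:

```python
import random

random.seed(1)

# Toy 1-D "domain decomposition": split the particle list into chunks
# of equal TOTAL size per CPU, ignoring particle type (hypothetical
# stand-in for the real space-filling-curve decomposition).
n_dm, n_gas = 9000, 1000

# Dark matter spread over the whole box; gas concentrated in a small
# region, mimicking the merger stage.
dm  = [(random.uniform(0.0, 1.0), "dm") for _ in range(n_dm)]
gas = [(random.uniform(0.45, 0.55), "gas") for _ in range(n_gas)]

particles = sorted(dm + gas)        # order along the domain axis
n_cpu = 8
chunk = len(particles) // n_cpu     # equal total load per CPU

gas_per_cpu = [
    sum(1 for _, t in particles[i * chunk:(i + 1) * chunk] if t == "gas")
    for i in range(n_cpu)
]
print(gas_per_cpu)  # most of the gas lands on one or two CPUs
```

Even though every CPU holds the same total number of particles (so the tree walk is balanced), nearly all the SPH work piles up on the one or two CPUs whose domains cover the gas-rich region, which is the kind of T_sph imbalance I see.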


Am I right? If so, is it possible to improve this? Any comments are
welcome.

Thanks in advance.
Yanbin.
Received on 2010-06-30 10:30:26

This archive was generated by hypermail 2.3.0 : 2022-09-01 14:03:42 CEST