work-load imbalance (again, but for SPH calculation)
Dear all,
I'm working on simulations of galaxy-galaxy interactions (N-body + SPH).
Recently, I noticed a very strong work-load imbalance in the SPH
calculations when I increase the number of CPUs.
For example,
(I define:
T_sph = the time for SPH calculation
T_tree = the time for tree-force calculation)
Using the same IC on a 4-core computer, T_sph is comparable to
T_tree (a ratio of ~1). But when I run the same IC with more cores,
e.g. 32 CPUs on a cluster, T_sph is about 4 times longer than
T_tree. The excess time is reported in cpu.txt as
work-load imbalance.
I suspect the imbalance is linked to the domain decomposition. During
the merger, the gas component may become strongly decoupled from the
baryons, while the domain decomposition is a global process that ignores
the particle type (I don't know if this is correct). Thus, when N_CPU
increases, more domains are created: some CPUs may end up with a very
large number of gas particles, while others have only a few.
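
To make the suspicion above concrete, here is a minimal toy sketch. It is not taken from the simulation code: the 1D geometry, the particle numbers, and the function names (split_equal_count, gas_imbalance, make_particles) are all assumptions made for illustration. It cuts a particle set into domains of equal total count, ignoring the particle type, and then measures how unevenly the gas ends up distributed when the gas sits in a compact clump.

```python
import random

def make_particles(seed=0):
    """Toy 1D particle set (hypothetical numbers): collisionless particles
    spread uniformly over [0, 1), gas concentrated in a narrow central clump."""
    rng = random.Random(seed)
    others = [(rng.random(), "other") for _ in range(8000)]
    gas = [(rng.uniform(0.45, 0.55), "gas") for _ in range(2000)]
    return others + gas

def split_equal_count(particles, n_domains):
    """Sort by position and cut into n_domains chunks of (nearly) equal
    total particle count, without looking at the particle type."""
    ordered = sorted(particles, key=lambda p: p[0])
    n = len(ordered)
    bounds = [i * n // n_domains for i in range(n_domains + 1)]
    return [ordered[bounds[i]:bounds[i + 1]] for i in range(n_domains)]

def gas_imbalance(domains):
    """Largest per-domain gas count divided by the mean gas count
    (1.0 would be a perfectly balanced gas load)."""
    counts = [sum(1 for _, kind in d if kind == "gas") for d in domains]
    mean = sum(counts) / len(counts)
    return max(counts) / mean

particles = make_particles()
for n_cpu in (4, 32):
    domains = split_equal_count(particles, n_cpu)
    print(n_cpu, "domains: gas imbalance =", round(gas_imbalance(domains), 2))
```

In this toy setup every domain holds the same total number of particles, yet the gas imbalance is larger for 32 domains than for 4, because the domains that fall inside the clump are mostly gas while many others contain none. Whether the real decomposition behaves this way depends on how the code weights the particles, which I have not checked.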
One thing I should mention is that at the early stage of the simulation
the work-load balance looks fine; the imbalance becomes significant only
after the interaction begins.
Am I right? If so, is it possible to improve this? Any comments are
welcome.
Thanks in advance.
Yanbin.
Received on 2010-06-30 10:30:26