trouble starting a large N-body run

From: Robert Thompson <rthompsonj_at_gmail.com>
Date: Tue, 18 Mar 2014 17:26:59 +0200

Hi everyone,

I am attempting a large N-body-only run with 2250^3 particles (the IC file is ~257 GB). Launching the job on anything fewer than 128,000 processors results in an error while reading in and distributing the first IC file. It exits with code 173 and says:

Not enough space on task=36 (space for 0, need at least 90877)

The last number varies with the number of cores I choose: a higher core count yields a lower number in place of 90877. Once I reach 128,000 cores I finally begin to run into memory issues related to the individual nodes instead (endrun 18):

failed to allocate 62500 MB of memory. (presently allocated=4.39453 MB)

I am currently trying this on Blue Waters, where each node has 32 processors and 64 GB of memory, so that failed 62500 MB request amounts to a single task asking for nearly an entire node's worth of RAM. I have tried various PMGRID values (256 through 2048), all with similar results. The particle count is roughly equivalent to the Millennium simulation (2160^3), which used only 512 processing cores according to the website. I have also attempted to use fewer processes per node, but regardless of the configuration I end up with an endrun 173 unless I launch with ~128,000 cores. PartAllocFactor is already at its minimum recommended value of 1.05. Any advice on how to reduce the required core count?
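
For reference, here is the back-of-envelope arithmetic I have been sanity-checking against (just a rough sketch: the ~100 bytes per particle is my own guess for an N-body-only build including tree overhead, and I am assuming the per-task slot limit works out to PartAllocFactor * TotNumPart / NTask, as the users guide describes):

    #include <stdio.h>

    int main(void)
    {
        double tot_num_part   = 2250.0 * 2250.0 * 2250.0; /* ~1.139e10 particles */
        double part_alloc     = 1.05;                     /* PartAllocFactor     */
        double bytes_per_part = 100.0; /* guess: particle_data + tree overhead   */

        int ntasks[] = { 512, 4096, 32768, 128000 };
        for (int i = 0; i < 4; i++) {
            /* per-task particle slots: PartAllocFactor * TotNumPart / NTask */
            double max_part = part_alloc * tot_num_part / ntasks[i];
            double mb       = max_part * bytes_per_part / (1024.0 * 1024.0);
            printf("NTask=%6d  slots/task=%10.0f  ~%8.1f MB/task\n",
                   ntasks[i], max_part, mb);
        }
        return 0;
    }

By that estimate the particle storage is only about 1.1 TB machine-wide, and even 512 tasks would need roughly 2.2 GB each, so raw particle memory alone does not seem to explain why ~128,000 cores should be required.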

-Robert Thompson
