Re: trouble starting a large N-body run

From: Robert Thompson <rthompsonj_at_gmail.com>
Date: Mon, 24 Mar 2014 12:00:26 +0200

Thanks Manodeep! That seems to resolve the 2LPTic issues. For N-GenIC the solution seems to be just a modification of the header and of the same line in save.c that you mentioned. With either IC code I can now produce large initial condition files that Gadget appears to read correctly.

However, I am now running into an odd issue when Gadget goes to write the restart files:

I/O error (fwrite) on task=192 has occured: Success
task 192: endrun called with an error level of 777

This message is printed by every task that attempts to write its restart file. Smaller runs on this machine have completed without trouble, so I suspect the problem is related to the large particle count. Has anyone run into this issue before or come across a solution? Thanks!
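A note on the message itself: on glibc, strerror(0) returns "Success", so the trailing "Success" suggests that errno was still 0 when the error was reported, i.e. fwrite wrote fewer items than requested without the C library recording a cause. The following is a minimal sketch in C, not Gadget's code; checked_fwrite and zero_size_write are hypothetical helpers. It shows one way this can happen that ISO C guarantees: a request with an element size of 0 returns 0 items. Whether that is what happens in this run is not established here.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not Gadget code): write nmemb items of `size` bytes.
 * Returns 0 on success and -1 on a short write. On a short write, `msg`
 * receives strerror(errno) as sampled right after the fwrite call. */
int checked_fwrite(const void *ptr, size_t size, size_t nmemb,
                   FILE *stream, char *msg, size_t msglen)
{
    assert(stream != NULL && msg != NULL);

    errno = 0; /* start from a clean errno so stale values cannot leak in */
    size_t nwritten = fwrite(ptr, size, nmemb, stream);
    if (nwritten != nmemb) {
        /* If the library did not set errno, this text describes errno 0. */
        snprintf(msg, msglen, "%s", strerror(errno));
        return -1;
    }
    return 0;
}

/* Hypothetical helper: ask fwrite for one item of size 0.
 * ISO C specifies that fwrite returns 0 when size or nmemb is 0,
 * so a caller that expects a return value of 1 sees a "short write". */
size_t zero_size_write(FILE *stream)
{
    char dummy = 0;
    return fwrite(&dummy, 0, 1, stream);
}
```

Calling checked_fwrite(buf, 0, 1, fp, msg, sizeof msg) returns -1, and on glibc msg then holds "Success"; the same call with a nonzero size on a writable stream returns 0.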

-Robert


On Mar 18, 2014, at 6:47 PM, Manodeep Sinha <manodeep.sinha_at_Vanderbilt.Edu> wrote:

>
> On 3/18/14 11:38 AM, Robert Thompson wrote:
>> Hi Volker, thanks for your quick reply! I should note that the ICs were generated with N-GenIC and I am running the simulation with Gadget3.
>>
>>> It looks like your initial conditions file contains incorrect entries for the particle count. Note that 2250^3 > 2^32, i.e. your total particle count does not fit into an ordinary 32-bit unsigned int. In Gadget2, the higher-order word is stored in a separate field in the file header (npartTotalHighWord[]).
>>>
>>> Check the calculation of "All.TotNumPart" and of "All.MaxPart" in read_ic.c. For some reason you are getting All.MaxPart = 0, likely because All.TotNumPart is computed incorrectly, which in turn probably originates in a faulty IC file header.
>> I had a sneaking suspicion of this. It seems neither N-GenIC nor 2LPTic contains npartTotalHighWord; apparently the values are stored in npartTotal[1] and npartTotal[2], which interestingly are both 0 in my IC header (probably the source of the problem). In N-GenIC I commented out NO64BITID (and enabled LONGIDS in Gadget). Are there any other tricks to getting it to create such large ICs?
>>
>>
>>> Note: 128000 cores is pretty over the top for this particle count. I doubt that Gadget2 (which is nearly 10 years old) will work well for such a large number of MPI ranks - never tried it myself.
>> I felt that was far too many cores myself; I figured even if I did get it to run the MPI overhead would slow it to a crawl.
>>
>> -Robert
>>
>>
> Hi Robert,
>
> I have run into a similar issue in the past -- the public version of 2LPTic puts the "overflow" particles into npartTotal[2]. Line 115 in save.c reads:
>
> header.npartTotal[2] = (TotNumPart >> 32);
>
> You need to change this to:
>
> header.npartTotalHighWord[1] = (TotNumPart >> 32);
>
> You will also need to get the updated header definition from a working copy of Gadget2. Otherwise, the HighWord field is not defined.
>
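To make the arithmetic behind this change concrete, here is a small C sketch. It is not taken from 2LPTic or Gadget, and the function names are made up for illustration. It splits a 64-bit particle count into the low word that fits a 32-bit field and the high word that `TotNumPart >> 32` yields, then recombines them. For the 2250^3 = 11390625000 particles mentioned earlier in the thread, the high word is 2 and the low word is 2800690408.

```c
#include <assert.h>
#include <stdint.h>

/* Low 32 bits of a 64-bit particle count: the part a 32-bit field can hold. */
uint32_t count_low_word(uint64_t total)
{
    return (uint32_t)(total & 0xFFFFFFFFu);
}

/* High 32 bits of the count, i.e. the value of `total >> 32`. */
uint32_t count_high_word(uint64_t total)
{
    return (uint32_t)(total >> 32);
}

/* Recombine the two 32-bit words into the full 64-bit count. */
uint64_t count_from_words(uint32_t low, uint32_t high)
{
    return ((uint64_t)high << 32) | low;
}

/* Sanity check: splitting and recombining must reproduce the input. */
void check_round_trip(uint64_t total)
{
    assert(count_from_words(count_low_word(total),
                            count_high_word(total)) == total);
}
```

A reader of the header would apply count_from_words to the low-word and high-word entries for a given particle type; if the high word is written to the wrong field, the recombined total comes out too small.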
> In addition, since in your case Nmesh^3 exceeds UINT_MAX (2^32-1 when unsigned int is 32 bits, which is the usual case even on 64-bit systems), you will also need to modify main.c line 101 to declare nmesh3 as a double, change the corresponding calculation of nmesh3 on line 558, and change the (float) cast in the division by nmesh3 on line 625.
>
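The overflow described here can be reproduced in isolation. The sketch below is plain C rather than the 2LPTic source, and the function names are hypothetical. The thread does not state the value of Nmesh, so 2250 is used purely as an assumed example. It contrasts the cube computed in unsigned int arithmetic, which wraps modulo 2^32 when unsigned int is 32 bits, with the same cube computed as a double.

```c
#include <assert.h>

/* Nmesh^3 in unsigned int arithmetic. Unsigned overflow is well defined in C:
 * the result is reduced modulo UINT_MAX + 1, so large cubes silently wrap. */
unsigned int nmesh3_as_uint(unsigned int nmesh)
{
    return nmesh * nmesh * nmesh;
}

/* Nmesh^3 as a double. Converting before multiplying avoids the wrap. */
double nmesh3_as_double(unsigned int nmesh)
{
    double n = (double)nmesh;
    double cube = n * n * n;
    /* Integers below 2^53 are exactly representable in an IEEE 754 double. */
    assert(cube < 9007199254740992.0);
    return cube;
}
```

With a 32-bit unsigned int, nmesh3_as_uint(2250) gives 2800690408 (the true value modulo 2^32), while nmesh3_as_double(2250) gives 11390625000.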
> Presumably, the changes for N-GenIC will be in similar places, so I hope this helps.
>
> Cheers,
> Manodeep
>
>>
>> -----------------------------------------------------------
>>
>> If you wish to unsubscribe from this mailing, send mail to
>> minimalist_at_MPA-Garching.MPG.de with a subject of: unsubscribe gadget-list
>> A web-archive of this mailing list is available here:
>> http://www.mpa-garching.mpg.de/gadget/gadget-list
>>
>
Received on 2014-03-24 11:00:32

This archive was generated by hypermail 2.3.0 : 2022-09-01 14:03:42 CEST