Re: trouble starting a large N-body run

From: Volker Springel <volker_at_MPA-Garching.MPG.DE>
Date: Mon, 24 Mar 2014 16:23:22 +0100

On Mar 24, 2014, at 11:00 AM, Robert Thompson wrote:

> Thanks Manodeep! That seems to resolve the 2LPTic issues. For N-GenIC the solution seems to be just a modification of the header and of the same line in save.c that you mentioned. With either IC code I can now produce large initial condition files that seem to be read properly by Gadget. However, I am now running into an odd issue when Gadget goes to write the restart files:
>
> I/O error (fwrite) on task=192 has occured: Success
> task 192: endrun called with an error level of 777
>
> which is output for every task attempting to write the file. I have run other smaller runs on this machine without hassle so I have a feeling this has something to do with the large particle count. Has anyone run into this issue before or come across a solution? Thanks!
>

Hi Robert,

Hmm, it might be that either the OS version or the particular filesystem you use does not allow files larger than 2 GB, or, more likely, that you cannot write a data set larger than 2 GB with a single call of fwrite(). (Are your restart files that big?) In the latter case, you could modify the wrapper my_fwrite() in io.c so that writes larger than 2 GB are split up into several smaller calls of fwrite().
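
A minimal sketch of such a split, in case it is useful. It is not the actual Gadget code: the names fwrite_in_chunks() and chunked_fwrite() and the 1 GB cap are assumptions made for illustration, and only the argument order and return convention of the standard fwrite() are taken as given.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical cap on the bytes handed to one fwrite() call (assumption). */
#define MAX_BYTES_PER_WRITE ((size_t) 1 << 30)

/* Write nmemb items of the given size, never passing more than max_bytes
   to a single fwrite() call. Returns the number of complete items written,
   which is smaller than nmemb if a write came up short. */
size_t fwrite_in_chunks(const void *ptr, size_t size, size_t nmemb,
                        FILE *stream, size_t max_bytes)
{
  const char *p = (const char *) ptr;
  size_t items_per_call, done = 0;

  if(size == 0 || nmemb == 0)
    return 0;

  items_per_call = max_bytes / size;
  if(items_per_call == 0)
    items_per_call = 1;         /* a single item is larger than the cap */

  while(done < nmemb)
    {
      size_t n = nmemb - done;
      size_t written;

      if(n > items_per_call)
        n = items_per_call;

      written = fwrite(p + done * size, size, n, stream);
      done += written;

      if(written < n)
        break;                  /* short write: the caller reports the error */
    }

  return done;
}

/* Same arguments as fwrite(), using the assumed default cap. */
size_t chunked_fwrite(const void *ptr, size_t size, size_t nmemb, FILE *stream)
{
  return fwrite_in_chunks(ptr, size, nmemb, stream, MAX_BYTES_PER_WRITE);
}
```

Since the return value has the same meaning as that of fwrite(), an existing check of the form "items written != nmemb" in the wrapper should keep working if the single fwrite() call is replaced by something like chunked_fwrite().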

Volker




> -Robert
>
>
> On Mar 18, 2014, at 6:47 PM, Manodeep Sinha <manodeep.sinha_at_Vanderbilt.Edu> wrote:
>
>>
>> On 3/18/14 11:38 AM, Robert Thompson wrote:
>>> Hi Volker, thanks for your quick reply! I should note that the ICs were generated via N-GenIC and I am running the simulation with Gadget3.
>>>
>>>> It looks like your initial conditions file contains incorrect entries for the particle count. Note that 2250^3 > 2^32, i.e. your total particle count does not fit into an ordinary 32-bit unsigned int. In gadget2, the higher-order word is stored in a separate field in the file header (npartTotalHighWord[]).
>>>>
>>>> Check out the calculation of "All.TotNumPart" as well as that of "All.MaxPart" in read_ic.c. For some reason you are getting All.MaxPart = 0, likely due to an incorrectly computed value of All.TotNumPart, which in turn probably originates in a faulty IC file header.
>>> I had a sneaking suspicion of this. It seems neither N-GenIC nor 2LPTic contains npartTotalHighWord; apparently the values are stored in npartTotal[1] & npartTotal[2], which interestingly enough are 0 in my IC header (probably the source of the problem). In N-GenIC I commented out NO64BITID (and enabled LONGIDS in Gadget). Are there any other tricks to getting it to create such large ICs?
>>>
>>>
>>>> Note: 128000 cores is pretty over the top for this particle count. I doubt that Gadget2 (which is nearly 10 years old) will work well for such a large number of MPI ranks - never tried it myself.
>>> I felt that was far too many cores myself; I figured even if I did get it to run the MPI overhead would slow it to a crawl.
>>>
>>> -Robert
>>>
>>>
>> Hi Robert,
>>
>> I have run into a similar issue in the past -- the public version of 2LPTic stores the "overflow" (high-order) part of the particle count in npartTotal[2]. Line 115 in save.c reads as:
>>
>> header.npartTotal[2] = (TotNumPart >> 32);
>>
>> You need to change this to:
>>
>> header.npartTotalHighWord[1] = (TotNumPart >> 32);
>>
>> You will also need to get the updated header definition from a working copy of Gadget2. Otherwise, the HighWord field is not defined.
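
A small sketch of the low-word/high-word split discussed here, for a total of 2250^3 particles. The struct header_counts and the helpers set_total() and get_total() are hypothetical stand-ins that contain only the two fields named in this thread; the real Gadget header has many more members, and the fields are assumed to be 32 bits wide.

```c
#include <stdint.h>

/* Hypothetical stand-in holding only the two header fields of interest. */
struct header_counts
{
  uint32_t npartTotal[6];          /* low 32 bits of each particle count */
  uint32_t npartTotalHighWord[6];  /* high 32 bits of each particle count */
};

/* Store a 64-bit particle count for one particle type as two 32-bit words. */
void set_total(struct header_counts *h, int type, uint64_t total)
{
  h->npartTotal[type] = (uint32_t) (total & 0xFFFFFFFFu);
  h->npartTotalHighWord[type] = (uint32_t) (total >> 32);
}

/* Reassemble the 64-bit particle count from the two 32-bit words. */
uint64_t get_total(const struct header_counts *h, int type)
{
  return ((uint64_t) h->npartTotalHighWord[type] << 32) | h->npartTotal[type];
}
```

For 2250^3 = 11390625000 particles of type 1, set_total() leaves 2800690408 in npartTotal[1] and 2 in npartTotalHighWord[1]. A reader that ignores the high word, or finds it stored in a different field, would therefore see the wrong total.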
>>
>> In addition, since in your case Nmesh^3 exceeds UINT_MAX (2^32-1 when unsigned int is 32 bits wide, as on typical 64-bit systems), you will also need to modify main.c line 101 and declare nmesh3 as a double, and change the corresponding calculation of nmesh3 on line 558 and the (float) cast in the division by nmesh3 on line 625.
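
To illustrate the overflow with Nmesh = 2250, here is a sketch that is independent of the actual 2LPTic sources. The function names cube_uint() and cube_double() are made up for this example, and unsigned int is assumed to be 32 bits wide.

```c
/* Cube computed in unsigned int arithmetic: the result wraps modulo
   UINT_MAX + 1 (2^32 for a 32-bit unsigned int) once it gets too large. */
unsigned int cube_uint(unsigned int n)
{
  return n * n * n;
}

/* Cube computed in double precision: exact as long as the result stays
   below 2^53, which holds for mesh sizes up to roughly 200000. */
double cube_double(int n)
{
  return (double) n * (double) n * (double) n;
}
```

With these, cube_uint(2250) wraps around to 2800690408, while cube_double(2250) gives the intended 11390625000. A float has only a 24-bit significand, so casting a value of this size to float rounds it, which is presumably why the (float) cast in the division needs to change as well.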
>>
>> Presumably, the changes for N-GenIC will be in similar places, so I hope this helps.
>>
>> Cheers,
>> Manodeep
>>
>>>
>>> -----------------------------------------------------------
>>>
>>> If you wish to unsubscribe from this mailing, send mail to
>>> minimalist_at_MPA-Garching.MPG.de with a subject of: unsubscribe gadget-list
>>> A web-archive of this mailing list is available here:
>>> http://www.mpa-garching.mpg.de/gadget/gadget-list
>>>
Received on 2014-03-24 16:22:49

This archive was generated by hypermail 2.3.0 : 2023-01-10 10:01:32 CET