Re: Gadget crashes at IC readin

From: Sylvia Ploeckinger <ploeckinger_at_strw.leidenuniv.nl>
Date: Thu, 26 Nov 2015 14:05:30 +0100

Hi Volker,

    thanks for your reply! I set LONGIDS in the code and now it is
crashing a little bit earlier.

Instead of (without LONGIDS):

task=0 blocknr=0 bytes_per_blockelement=12 npart=288018
task=0 blocknr=2 bytes_per_blockelement=12 npart=288018
task=0 blocknr=3 bytes_per_blockelement=4 npart=288018
task=0 blocknr=4 bytes_per_blockelement=4 npart=0
task=0 blocknr=6 bytes_per_blockelement=4 npart=6445
task=0 blocknr=7 bytes_per_blockelement=4 npart=6445
I/O error (fread) on task=0 has occured: end of file
nread = 0, nmemb = 1
fread = 0
size = 4
task 0: endrun called with an error level of 778

I now get (with LONGIDS):

task=0 blocknr=0 bytes_per_blockelement=12 npart=288018
task=0 blocknr=2 bytes_per_blockelement=12 npart=288018
task=0 blocknr=3 bytes_per_blockelement=8 npart=288018
I/O error (fread) on task=0 has occured: end of file
nread = 140788, nmemb = 262773
fread = 0
size = 8
task 0: endrun called with an error level of 778

Please find attached the complete output file for additional information.

I am trying to set up an isolated galaxy for some tests with new cooling
tables I am working on. To do that, I got the IC generator and the
parameter file that a colleague of mine was using for his isolated
galaxies. In his paper he says:

"The initial conditions that we use are based on the model of Springel
et al. (2005), and were generated using a modified version of a code
that was kindly provided to us by Volker Springel."

I guess that the general structure of the ICs should be alright.
Is there a way to check whether the problem is somewhere in creating the
ICs or with the read-in?

Some more information:
- compiler version: gcc (GCC) 4.9.2 20150212 (Red Hat 4.9.2-6)
- compiler options for the ICs: gcc -m64 -O3
- compiler options for the Code:
CC = mpicc
ifeq ($(SYSTYPE), "paracluster")
OPTIMIZE = -g -O3 -Wall -DH5_USE_16_API
[...]
endif

mpicc -show:
gcc -m64 -O2 -fPIC -Wl,-z,noexecstack -I/usr/include/mpich-x86_64
-L/usr/lib64/mpich/lib -Wl,-rpath -Wl,/usr/lib64/mpich/lib -lmpich -lopa
-lmpl -lrt -lpthread

(I also tried setting all compiler optimization levels to -O2, but that
didn't change anything.)


Sylvia



On 26/11/15 11:17, Volker Springel wrote:
>
> Hi Sylvia,
>
> Presumably you have an IC file with 64-bit IDs but are using Gadget with 32-bit IDs. This could be resolved by activating LONGIDS in the code. Alternatively, you could recreate the ICs with 32-bit IDs.
>
> Another possibility is that the IC file has an incorrect structure, but it’s hard to diagnose this without information about where the file comes from.
>
> Volker
>
>
>
> On Nov 26, 2015, at 10:42 AM, Sylvia Ploeckinger <ploeckinger_at_strw.leidenuniv.nl> wrote:
>
>> Hi Michael,
>>
>> thanks a lot for your input!
>> The compiler on my system should use 64bit as default,
>> but to be sure I included the -m64 flag for both the IC generator
>> as well as the code itself. Unfortunately it still doesn't work.
>>
>> Do you remember which flags you used to get it to work again after this error occurred?
>> Was it also -m64 or did you use something else?
>>
>> Greets,
>>
>> Sylvia
>>
>>
>> On 25/11/15 19:13, Michael Hansen wrote:
>>> Hi Sylvia.
>>>
>>> I had the same problem a while ago. Check that you are compiling the code and the IC generator consistently, either both with or both without 64-bit compatibility.
>>>
>>> Cheers,
>>> Michael
>>>
>>> On 25 Nov 2015, at 17:32, Sylvia Ploeckinger <ploeckinger_at_strw.leidenuniv.nl> wrote:
>>>
>>>> Hi all,
>>>>
>>>> as the subject line already gives away, Gadget crashes when I try to read in an input file.
>>>>
>>>> The error is:
>>>> I/O error (fread) on task=0 has occured: end of file
>>>> task 0: endrun called with an error level of 778
>>>>
>>>> I found a posting in this mailing list from 2006(!) with the same error:
>>>> http://wwwmpa.mpa-garching.mpg.de/gadget/gadget-list/0098.html
>>>>
>>>> but the solution seems to include some magic:
>>>> "So, the good news is that Gadget2 runs with the LCDM gas simulation, but I don't have a clue why it didn't before!"
>>>> (from the above posting)
>>>>
>>>> Has anyone else had the same problem in the last 9 years and knows how to fix it?
>>>> I included the following extra lines in the routine my_fread to get some extra information:
>>>> printf("nread = %zu, nmemb = %zu\n", nread, nmemb);
>>>> printf("fread = %zu\n", fread(ptr, size, nmemb, stream));
>>>> printf("size = %zu\n", size);
>>>>
>>>> and in the file read_ic.c, I included this line (as suggested by Volker in the 2006 discussion)
>>>>
>>>> printf("task=%d blocknr=%d bytes_per_blockelement=%d npart=%d\n",
>>>> ThisTask, blocknr, bytes_per_blockelement, npart);
>>>> fflush(stdout);
>>>>
>>>>
>>>> Please find below the corresponding output. It comes from a one core test run, but it crashes with the same error message when I run it on 4 cores.
>>>> The IC file seems to be okay, but is there an easy way to find out if the file is damaged to exclude this possibility?
>>>>
>>>> Thanks in advance!
>>>>
>>>> Regards,
>>>>
>>>> Sylvia
>>>>
>>>>
>>>> Allocated 129 MByte for FFT kernel(s).
>>>>
>>>>
>>>> Allocated 84.4627 MByte for particle storage.
>>>>
>>>> Allocated 1.47507 MByte for storage of SPH data.
>>>>
>>>>
>>>> reading file `/net/para35/data2/ploeckinger/IC_Gadget/output/m1e11f30.ics' on task=0 (contains 288018 particles.)
>>>> distributing this file to tasks 0-0
>>>> Type 0 (gas): 6445 (tot= 0000006445) masstab=5.28564e-06
>>>> Type 1 (halo): 262773 (tot= 0000262773) masstab=2.6309e-05
>>>> Type 2 (disk): 15040 (tot= 0000015040) masstab=5.28505e-06
>>>> Type 3 (bulge): 3760 (tot= 0000003760) masstab=5.28505e-06
>>>> Type 4 (stars): 0 (tot= 0000000000) masstab=0
>>>> Type 5 (bndry): 0 (tot= 0000000000) masstab=0
>>>>
>>>> task=0 blocknr=0 bytes_per_blockelement=12 npart=288018
>>>> task=0 blocknr=2 bytes_per_blockelement=12 npart=288018
>>>> task=0 blocknr=3 bytes_per_blockelement=4 npart=288018
>>>> task=0 blocknr=4 bytes_per_blockelement=4 npart=0
>>>> task=0 blocknr=6 bytes_per_blockelement=4 npart=6445
>>>> task=0 blocknr=7 bytes_per_blockelement=4 npart=6445
>>>> I/O error (fread) on task=0 has occured: end of file
>>>> nread = 0, nmemb = 1
>>>> fread = 0
>>>> size = 4
>>>> task 0: endrun called with an error level of 778
>>>>
>>>>
>>>> application called MPI_Abort(MPI_COMM_WORLD, 778) - process 0
>>>>
>>>> ===================================================================================
>>>> = BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
>>>> = PID 16587 RUNNING AT para37.strw.leidenuniv.nl
>>>> = EXIT CODE: 10
>>>> = CLEANING UP REMAINING PROCESSES
>>>> = YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
>>>> ===================================================================================
>>>>
>>>> --
>>>> Sylvia Ploeckinger
>>>> Sterrewacht Leiden
>>>> The Netherlands
>>>>
>>>> Room Nr. 438
>>>> email:
>>>> ploeckinger_at_strw.leidenuniv.nl
>>>>
>>>> -----------------------------------------------------------
>>>>
>>>> If you wish to unsubscribe from this mailing, send mail to
>>>> minimalist_at_MPA-Garching.MPG.de with a subject of: unsubscribe gadget-list
>>>> A web-archive of this mailing list is available here:
>>>> http://www.mpa-garching.mpg.de/gadget/gadget-list

-- 
Sylvia Ploeckinger
Sterrewacht Leiden
The Netherlands
Room Nr. 438
email: ploeckinger_at_strw.leidenuniv.nl


Received on 2015-11-26 14:05:35
