Re: LCDM_gas simulation aborts in initialization

From: Volker Springel <volker_at_MPA-Garching.MPG.DE>
Date: Wed, 07 Jun 2006 12:01:50 +0200

ROBERT.J.MORGAN_at_asu.edu wrote:
> Dear Volker,
>
> Thanks for the code. Installing it confirmed that EOF condition occurred. Also,
> checked IC file for lcdm_gas and it was 1966376 bytes long. Downloaded new
> Gadget-2.0.3 code and extracted files. Did "diff" on that lcdm_gas IC file
> and "old" (Gadget-2.0) IC file and were identical. (No differences.) Just to be
> sure, also tried running simulation with "new" IC file with same result as
> before. Complete text of output follows. Also, same result on single or dual
> processors. (Two compute nodes connected by LAN.) Also checked "param" file to
> make sure were the same. (Aside from path changes needed since I was running in
> separate new sub-directory.) I think call to my_fread() for CommBuffer "file"
> is part of loop checking on information for particle types. Although it has
> already found the 65,536 particles in the IC file, the code seems to expect
> more info or data in CommBuffer. Do you have any suggestions for what else I
> could try or check to get this simulation to run or find out why it doesn't ?
>

Hi Bob,

This is quite odd, and a problem that I cannot reproduce on any machine
or architecture I'm presently using. I would expect this problem only if
you had defined the macro LONGIDS (which you apparently haven't). In
read_ic.c, you can add the statements

  printf("task=%d blocknr=%d bytes_per_blockelement=%d npart=%d\n",
         ThisTask, blocknr, bytes_per_blockelement, npart);
  fflush(stdout);

after the line that reads

  npart = get_particles_in_block(blocknr, &typelist[0]);

This will produce additional diagnostic output, which should look like this:


Type 0 (gas): 32768 (tot= 0000032768) masstab=4.23508
Type 1 (halo): 32768 (tot= 0000032768) masstab=27.528
Type 2 (disk): 0 (tot= 0000000000) masstab=0
Type 3 (bulge): 0 (tot= 0000000000) masstab=0
Type 4 (stars): 0 (tot= 0000000000) masstab=0
Type 5 (bndry): 0 (tot= 0000000000) masstab=0

task=0 blocknr=0 bytes_per_blockelement=12 npart=65536
task=0 blocknr=1 bytes_per_blockelement=12 npart=65536
task=0 blocknr=2 bytes_per_blockelement=4 npart=65536
task=0 blocknr=3 bytes_per_blockelement=4 npart=0
task=0 blocknr=4 bytes_per_blockelement=4 npart=32768
reading done.
Total number of particles : 0000065536

This will at least tell you in which "block" of the IC file the problem
occurs.

Volker






> Thank you,
> Bob Morgan
> ASU
>
>
> Output of Gadget2, lcdm_gas simulation run :
>
>
> This is Gadget, version `2.0'.
>
> Running on 2 processors.
>
> found 5 times in output-list.
>
> Allocated 30 MByte communication buffer per processor.
>
> Communication buffer has room for 714938 particles in gravity computation
> Communication buffer has room for 245760 particles in density computation
> Communication buffer has room for 196608 particles in hydro computation
> Communication buffer has room for 182890 particles in domain decomposition
>
>
> Hubble (internal units) = 0.1
> G (internal units) = 43007.1
> UnitMass_in_g = 1.989e+43
> UnitTime_in_s = 3.08568e+16
> UnitVelocity_in_cm_per_s = 100000
> UnitDensity_in_cgs = 6.76991e-22
> UnitEnergy_in_cgs = 1.989e+53
>
> Task=0 FFT-Slabs=64
> Task=1 FFT-Slabs=64
>
> Allocated 3.99994 MByte for particle storage. 80
>
> Allocated 2.09997 MByte for storage of SPH data. 84
>
>
> reading file `../ICs/lcdm_gas_littleendian.dat' on task=0 (contains 65536
> particles.)
> distributing this file to tasks 0-1
> Type 0 (gas): 32768 (tot= 0000032768) masstab=4.23508
> Type 1 (halo): 32768 (tot= 0000032768) masstab=27.528
> Type 2 (disk): 0 (tot= 0000000000) masstab=0
> Type 3 (bulge): 0 (tot= 0000000000) masstab=0
> Type 4 (stars): 0 (tot= 0000000000) masstab=0
> Type 5 (bndry): 0 (tot= 0000000000) masstab=0
>
> I/O error (fread) on task=0 has occurred: end of file
> task 0: endrun called with an error level of 778
>
>
> [0] MPI Abort by user Aborting program !
> [0] Aborting program!
> p0_25749: p4_error: : 778
> Killed by signal 2.
> p0_25749: (4.391097) net_send: could not write to fd=4, errno = 32
> ++ exitstatus=1
> ++ '[' 0 '!=' 1 ']'
> ++ '[' 0 = 1 ']'
> ++ '[' 0 = 1 -a no = yes ']'
> ++ rm /home/bob/Gadget-2.0/COSMO/PI25665
> ++ '[' '' = yes ']'
> ++ '[' '' '!=' no -a '' = shared ']'
> ++ exit 1
>
> . end of output ..
>
>
> Quoting Volker Springel <volker_at_MPA-Garching.MPG.DE>:
>
>> ROBERT.J.MORGAN_at_asu.edu wrote:
>>> Am trying to run the lcdm_gas simulation. Run aborts in initialization with
>>> messages:
>>> "reading file '../ICs/lcdm_gas_littleendian.dat on task=0 (contains 65536
>>> particles.)
>>> distributing this file to tasks 0-0
>>> ... (then follows listing of types 0-5 particles) ...
>>>
>>> I/O error (fread) on task=0 has occured: no such file or directory
>>> task 0: endrun called with an error level of 778
>>>
>>> [0] MPI Abort by user Aborting program !
>>> [0] Aborting program!
>>> p0_11987: p4_error: :778 "
>>>
>>>
>>> Using gdb, problem seems to occur when read_file() in read_ic.c issues
>>> my_fread(CommBuffer, ...) at line 485 and my_fread() in io.c issues an fread()
>>> and gets nread=1 instead of nmemb=32768 and treats this as I/O error and calls
>>> endrun.
>> fread doesn't distinguish between EOF and other errors when reading. From the
>> viewpoint of gadget, both are equally bad and the run needs to be
>> terminated. To see whether you are dealing with an end-of-file or another
>> error you can replace the line
>>
>>   printf("I/O error (fread) on task=%d has occured: end of file\n", ThisTask);
>>
>> with
>>
>>   if(feof(stream))
>>     printf("I/O error (fread) on task=%d has occured: end of file\n", ThisTask);
>>   else
>>     printf("I/O error (fread) on task=%d has occured: %s\n", ThisTask,
>>            strerror(errno));
>>
>> I think you will likely see an end-of-file error now, i.e. your IC file is
>> corrupt somehow. (It should have a length of 1966376 bytes.)
>>
>>> This seems questionable since an EOF can result in a short read (unless nmemb
>>> data had previously been written to file.)
>>>
>>> Am running on a single processor (for this run) on an AMD Athlon and Centos 4.1
>>> (Linux) OS. Not using mpirun though mpich is configured and loaded. Using
>>> lcdm_gas_littleendian.dat and lcdm_gas.param files (supplied w/download.)
>>> Have previously run the collisionless galaxy simulations on both single and
>>> multiple processors. But that doesn't use SPH which uses the CommBuffer.
>>> Should I just skip my_fread() and use fread() in initialization code for
>>> CommBuffer? Info fread doesn't seem to distinguish between IO error and EOF
>>> conditions, so is there some way to tell if actual I/O error?
>> Yes, see above.
>>
>> Replacing my_fread with fread will make the code ignore the I/O error. I
>> wouldn't recommend that since then some of the initial data is undefined
>> with unpredictable outcome.
>>
>> Volker
>>
>>
>>
>>> Thanks,
>>>
>>> Bob Morgan
>>>
>>> Arizona State University
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> -----------------------------------------------------------
>>>
>>> If you wish to unsubscribe from this mailing, send mail to
>>> minimalist_at_MPA-Garching.MPG.de with a subject of: unsubscribe gadget-list
>>> A web-archive of this mailing list is available here:
>>> http://www.mpa-garching.mpg.de/gadget/gadget-list
>>
>>
>>
>>
>
>
>
>
Received on 2006-06-07 12:01:45

This archive was generated by hypermail 2.3.0 : 2022-09-01 14:03:41 CEST