Re: LCDM_gas simulation aborts in initialization

From: Yves Revaz <yves.revaz_at_obspm.fr>
Date: Mon, 19 Jun 2006 11:56:07 +0200

Dear Gadget list,

I'm following up on my earlier comment about the strange behavior
between 32-bit and 64-bit computers (a difference in the size of the header).

This alignment problem may be solved, when using gcc, simply by adding
the option -fpack-struct.
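
As a minimal sketch of the effect, the C fragment below compares a
naturally aligned struct with a packed one. The struct and function names
are made up for illustration and are not the real io_header from allvars.h.
It uses the per-struct gcc/clang attribute __attribute__((packed)) instead
of the global -fpack-struct option, which would change the layout of every
struct in the translation unit.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical header, natural alignment: the compiler may insert padding
   after 'flag' so that 'time' starts on an aligned address. */
struct demo_natural
{
  int flag;
  double time;
  char fill[4];
};

/* Same members, packed with the gcc/clang attribute: no padding. */
struct demo_packed
{
  int flag;
  double time;
  char fill[4];
} __attribute__((packed));

/* Number of bytes the members occupy on their own. */
#define DEMO_PAYLOAD (sizeof(int) + sizeof(double) + 4)

/* Compile-time check in the spirit of "sizeof(io_header) == 256":
   the array size becomes -1, a compile error, if the sizes differ. */
typedef char demo_size_check[(sizeof(struct demo_packed) == DEMO_PAYLOAD) ? 1 : -1];

size_t demo_natural_size(void) { return sizeof(struct demo_natural); }
size_t demo_packed_size(void) { return sizeof(struct demo_packed); }

/* Padding bytes the compiler added to the unpacked struct. */
size_t demo_padding(void)
{
  assert(demo_natural_size() >= demo_packed_size());
  return demo_natural_size() - demo_packed_size();
}
```

With gcc on x86-64 Linux one would typically expect 24 bytes for the
natural layout and 16 for the packed one. On 32-bit x86, where double is
usually only 4-byte aligned inside structs, both typically come out as 16,
which is the kind of 32/64-bit difference described here. Note that packed
members can be misaligned, which may be slow or unsupported on some
architectures.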

Regards.



Yves Revaz wrote:

>
> Dear Robert and Volker,
>
> This reminds me of a strange behavior between 32-bit and 64-bit computers.
> The same initial condition file would not run on the 64-bit machine, while
> it ran on the 32-bit one without any problem.
>
> In fact, the size of the structure "io_header" (sizeof(io_header)) was
> not the same when compiled with gcc on a 32-bit and on a 64-bit machine!
> To patch this, I simply changed the length of "char fill[60];" in
> allvars.h, depending on the computer, to ensure that
> "sizeof(io_header) == 256".
>
> So, maybe check that every block you read has the right size.
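
A minimal sketch of such a check, assuming the default unformatted file
layout in which each block is preceded and followed by a 4-byte integer
holding the block length in bytes, stored in the machine's native byte
order. The helpers read_checked_block and demo_roundtrip are hypothetical
names and are not part of Gadget.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Read one block framed by 4-byte length markers and verify both markers
   against the size the caller expects. Returns 0 on success, or a negative
   code telling which step failed. Hypothetical helper, not Gadget code. */
int read_checked_block(FILE *fp, void *buf, size_t expected)
{
  int32_t head, tail;

  assert(fp != NULL && buf != NULL);

  if(fread(&head, sizeof head, 1, fp) != 1)
    return -1;                  /* could not read the leading marker */
  if(head < 0 || (size_t) head != expected)
    return -2;                  /* block is not the size we expect */
  if(expected > 0 && fread(buf, 1, expected, fp) != expected)
    return -3;                  /* short read: file is truncated */
  if(fread(&tail, sizeof tail, 1, fp) != 1)
    return -4;                  /* could not read the trailing marker */
  if(tail != head)
    return -5;                  /* leading and trailing markers disagree */
  return 0;
}

/* Usage sketch: write one framed block of 'written' zero bytes to a
   temporary file, then read it back expecting 'expected' bytes. */
int demo_roundtrip(int32_t written, size_t expected)
{
  char payload[64] = { 0 }, out[64];
  FILE *fp;
  int rc;

  if(written < 0 || written > 64 || expected > 64)
    return -100;                /* outside the range of this demo */
  if((fp = tmpfile()) == NULL)
    return -101;

  fwrite(&written, sizeof written, 1, fp);
  fwrite(payload, 1, (size_t) written, fp);
  fwrite(&written, sizeof written, 1, fp);
  rewind(fp);

  rc = read_checked_block(fp, out, expected);
  fclose(fp);
  return rc;
}
```

If the reader was compiled with a header struct that is larger than the
256 bytes on disk, a check like this would report the mismatch at the
first block instead of failing later with an end-of-file error.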
>
> I hope it will help... (not sure...)
>
>
>
>
>
> ROBERT.J.MORGAN_at_asu.edu wrote:
>
>> Dear Volker,
>>
>> Thanks for the code. Installing it confirmed that EOF condition
>> occurred. Also, checked IC file for lcdm_gas and it was 1966376 bytes
>> long. Downloaded new Gadget-2.0.3 code and extracted files. Did
>> "diff" on that lcdm_gas IC file and "old" (Gadget-2.0) IC file and
>> they were identical. (No differences.) Just to be sure, also tried running
>> simulation with "new" IC file with same result as before. Complete
>> text of output follows. Also, same result on single or dual
>> processors. (Two compute nodes connected by LAN.) Also checked
>> "param" file to make sure were the same. (Aside from path changes
>> needed since I was running in separate new sub-directory.) I think
>> call to my_fread() for CommBuffer "file" is part of loop checking on
>> information for particle types. Although it has already found the
>> 65,536 particles in the IC file, the code seems to expect more info
>> or data in CommBuffer. Do you have any suggestions for what else I
>> could try or check to get this simulation to run or find out why it
>> doesn't?
>>
>> Thank you,
>> Bob Morgan
>> ASU
>>
>> Output of Gadget2, lcdm_gas simulation run :
>>
>>
>> This is Gadget, version `2.0'.
>>
>> Running on 2 processors.
>>
>> found 5 times in output-list.
>>
>> Allocated 30 MByte communication buffer per processor.
>>
>> Communication buffer has room for 714938 particles in gravity computation
>> Communication buffer has room for 245760 particles in density computation
>> Communication buffer has room for 196608 particles in hydro computation
>> Communication buffer has room for 182890 particles in domain decomposition
>>
>>
>> Hubble (internal units) = 0.1
>> G (internal units) = 43007.1
>> UnitMass_in_g = 1.989e+43
>> UnitTime_in_s = 3.08568e+16
>> UnitVelocity_in_cm_per_s = 100000
>> UnitDensity_in_cgs = 6.76991e-22
>> UnitEnergy_in_cgs = 1.989e+53
>>
>> Task=0 FFT-Slabs=64
>> Task=1 FFT-Slabs=64
>>
>> Allocated 3.99994 MByte for particle storage. 80
>>
>> Allocated 2.09997 MByte for storage of SPH data. 84
>>
>>
>> reading file `../ICs/lcdm_gas_littleendian.dat' on task=0 (contains 65536 particles.)
>> distributing this file to tasks 0-1
>> Type 0 (gas): 32768 (tot= 0000032768) masstab=4.23508
>> Type 1 (halo): 32768 (tot= 0000032768) masstab=27.528
>> Type 2 (disk): 0 (tot= 0000000000) masstab=0
>> Type 3 (bulge): 0 (tot= 0000000000) masstab=0
>> Type 4 (stars): 0 (tot= 0000000000) masstab=0
>> Type 5 (bndry): 0 (tot= 0000000000) masstab=0
>>
>> I/O error (fread) on task=0 has occurred: end of file
>> task 0: endrun called with an error level of 778
>>
>>
>> [0] MPI Abort by user Aborting program !
>> [0] Aborting program!
>> p0_25749: p4_error: : 778
>> Killed by signal 2.
>> p0_25749: (4.391097) net_send: could not write to fd=4, errno = 32
>> ++ exitstatus=1
>> ++ '[' 0 '!=' 1 ']'
>> ++ '[' 0 = 1 ']'
>> ++ '[' 0 = 1 -a no = yes ']'
>> ++ rm /home/bob/Gadget-2.0/COSMO/PI25665
>> ++ '[' '' = yes ']'
>> ++ '[' '' '!=' no -a '' = shared ']'
>> ++ exit 1
>>
>> . end of output ..
>>
>>
>> Quoting Volker Springel <volker_at_MPA-Garching.MPG.DE>:
>>
>>
>>
>>> ROBERT.J.MORGAN_at_asu.edu wrote:
>>>
>>>
>>>> Am trying to run the lcdm_gas simulation. Run aborts in initialization with
>>>> messages:
>>>> "reading file '../ICs/lcdm_gas_littleendian.dat' on task=0 (contains 65536
>>>> particles.)
>>>> distributing this file to tasks 0-0
>>>> ... (then follows listing of types 0-5 particles) ...
>>>>
>>>> I/O error (fread) on task=0 has occured: no such file or directory
>>>> task 0: endrun called with an error level of 778
>>>>
>>>> [0] MPI Abort by user Aborting program !
>>>> [0] Aborting program!
>>>> p0_11987: p4_error: :778 "
>>>>
>>>>
>>>> Using gdb, problem seems to occur when read_file() in read_ic.c issues
>>>> my_fread(CommBuffer, ...) at line 485 and my_fread() in io.c issues fread(),
>>>> gets nread=1 instead of nmemb=32768, treats this as an I/O error, and calls
>>>> endrun.
>>>>
>>>
>>> fread doesn't distinguish between EOF and other errors when reading. From
>>> the viewpoint of gadget, both are equally bad and the run needs to
>>> be terminated. To see whether you are dealing with an end-of-file or
>>> another error you can replace the line
>>>
>>>   printf("I/O error (fread) on task=%d has occured: end of file\n", ThisTask);
>>>
>>> with
>>>
>>>   if(feof(stream))
>>>     printf("I/O error (fread) on task=%d has occured: end of file\n", ThisTask);
>>>   else
>>>     printf("I/O error (fread) on task=%d has occured: %s\n", ThisTask, strerror(errno));
>>>
>>> I think you will likely see an end-of-file error now, i.e. your IC file
>>> is corrupt somehow. (It should have a length of 1966376.)
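
The expected length can be reproduced with a short calculation. This is a
sketch under stated assumptions, not code from Gadget: a 256-byte header,
single-precision positions and velocities, 4-byte particle IDs, internal
energies for the gas particles only, no per-particle mass block (both
particle types have a nonzero masstab entry in the run output), and a
4-byte length marker before and after every block. The function names are
hypothetical.

```c
#include <assert.h>

/* Bytes of one block on disk: the payload plus a 4-byte length marker
   before and after it. */
static long framed(long payload)
{
  return payload + 2 * 4;
}

/* Expected size of an IC file holding a header, positions, velocities,
   IDs and gas internal energies, under the assumptions listed above. */
long expected_ic_size(long ntot, long ngas)
{
  long header, pos, vel, ids, u;

  assert(ngas >= 0 && ntot >= ngas);

  header = framed(256);          /* fixed-size header block */
  pos = framed(ntot * 3 * 4);    /* 3 floats per particle */
  vel = framed(ntot * 3 * 4);    /* 3 floats per particle */
  ids = framed(ntot * 4);        /* one 4-byte ID per particle */
  u = framed(ngas * 4);          /* one float per gas particle */

  return header + pos + vel + ids + u;
}
```

For the 65536 particles of this run, 32768 of them gas,
expected_ic_size(65536, 32768) gives 264 + 786440 + 786440 + 262152 +
131080 = 1966376 bytes, which matches the quoted length. A file of exactly
this size that still hits end-of-file therefore points at the reader
consuming more bytes than the file layout provides, rather than at a
truncated file.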
>>>
>>>
>>>
>>>> This seems questionable since an EOF can result in a short read (unless nmemb
>>>> data had previously been written to file.)
>>>>
>>>> Am running on a single processor (for this run) on an AMD Athlon and Centos 4.1
>>>> (Linux) OS. Not using mpirun though mpich is configured and loaded. Using
>>>> lcdm_gas_littleendian.dat and lcdm_gas.param files (supplied w/download.)
>>>> Have previously run the collisionless galaxy simulations on both single and
>>>> multiple processors. But that doesn't use SPH which uses the CommBuffer.
>>>
>>>
>>>> Should I just skip my_fread() and use fread() in initialization code for
>>>> CommBuffer? Info fread doesn't seem to distinguish between IO error and EOF
>>>> conditions, so is there some way to tell if actual I/O error?
>>>>
>>>
>>> Yes, see above.
>>>
>>> Replacing my_fread with fread will make the code ignore the I/O error. I
>>> wouldn't recommend that, since some of the initial data would then be
>>> undefined, with unpredictable outcome.
>>>
>>> Volker
>>>
>>>
>>>
>>>
>>>
>>>> Thanks,
>>>>
>>>> Bob Morgan
>>>>
>>>> Arizona State University
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> -----------------------------------------------------------
>>>> If you wish to unsubscribe from this mailing, send mail to
>>>> minimalist_at_MPA-Garching.MPG.de with a subject of: unsubscribe
>>>> gadget-list
>>>> A web-archive of this mailing list is available here:
>>>> http://www.mpa-garching.mpg.de/gadget/gadget-list
>>>>
>>>
>>>
>>>
>>>
>>>
>>
>>
>>
>>
>>
>>
>>
>
>


-- 
                                                (o o)
--------------------------------------------oOO--(_)--OOo-------
  Yves Revaz
  Lerma Batiment A           Tel : ++ 33 (0) 1 40 51 20 79
  Observatoire de Paris      Fax : ++ 33 (0) 1 40 51 20 02 
  77 av Denfert-Rochereau    e-mail : yves.revaz_at_obspm.fr
  F-75014 Paris              Web : http://obswww.unige.ch/~revaz/
  FRANCE             
----------------------------------------------------------------
Received on 2006-06-19 11:54:26
