Re: LCDM_gas simulation aborts in initialization

From: <ROBERT.J.MORGAN_at_asu.edu>
Date: Sat, 01 Jul 2006 00:06:46 -0700 (MST)

Dear Volker et al.,

Very strange! I inserted the diagnostic code in read_ic.c and the simulation
ran fine! Note that I inserted it in the 2.0.3 version, since, for some
reason, the 2.0 source was missing from my system. I then compiled Gadget2
from the unmodified 2.0.3 source (thinking there might be some changes that
affected this) and also from the 2.0 source (from a CD copy of 2.0), and both
versions ran fine! I cannot account for this. I do not think there were any
library or compiler changes between the original problem and now, but I will
check. I believe the last changes were with the original compile that had the
problem, when I had to recompile the fftw code with mpich. Also, I do not
believe there were any system changes; in any case, the original Gadget2
executable still exhibits the error (I keep earlier builds around under
different names).

So the good news is that Gadget2 runs the LCDM gas simulation, but I don't
have a clue why it didn't before! I will let you know if I find out why, but
my main priority now is to do the simulations and create my own IC files. BTW,
if you know of any programs to create IC files (I believe there was a
reference to such in one of the gadget-list mailings), I would appreciate it.
Thank you for your help.

Regards,

Bob Morgan

Quoting Volker Springel <volker_at_MPA-Garching.MPG.DE>:

>
> ROBERT.J.MORGAN_at_asu.edu wrote:
> > Dear Volker,
> >
> > Thanks for the code. Installing it confirmed that the EOF condition
> > occurred. Also, I checked the IC file for lcdm_gas and it was 1966376
> > bytes long. I downloaded the new Gadget-2.0.3 code and extracted the
> > files. I did "diff" on that lcdm_gas IC file and the "old" (Gadget-2.0)
> > IC file and they were identical. (No differences.) Just to be sure, I
> > also tried running the simulation with the "new" IC file, with the same
> > result as before. The complete text of the output follows. Also, same
> > result on single or dual processors. (Two compute nodes connected by
> > LAN.) I also checked the "param" files to make sure they were the same.
> > (Aside from path changes needed since I was running in a separate new
> > sub-directory.) I think the call to my_fread() for the CommBuffer
> > "file" is part of a loop checking on information for particle types.
> > Although it has already found the 65,536 particles in the IC file, the
> > code seems to expect more info or data in CommBuffer. Do you have any
> > suggestions for what else I could try or check to get this simulation
> > to run, or to find out why it doesn't?
> >
>
> Hi Bob,
>
> This is quite odd, and a problem that I cannot reproduce on any
> machine or architecture I'm presently using. I would only expect this
> problem if you had defined the macro LONGIDS (but apparently you
> haven't). In read_ic.c, you can add the statements
>
>    printf("task=%d blocknr=%d bytes_per_blockelement=%d npart=%d\n",
>           ThisTask, blocknr, bytes_per_blockelement, npart);
>    fflush(stdout);
>
> after the line that reads
>
>    npart = get_particles_in_block(blocknr, &typelist[0]);
>
> This will produce additional diagnostic output, which should look like
> this:
>
>
> Type 0 (gas): 32768 (tot= 0000032768) masstab=4.23508
> Type 1 (halo): 32768 (tot= 0000032768) masstab=27.528
> Type 2 (disk): 0 (tot= 0000000000) masstab=0
> Type 3 (bulge): 0 (tot= 0000000000) masstab=0
> Type 4 (stars): 0 (tot= 0000000000) masstab=0
> Type 5 (bndry): 0 (tot= 0000000000) masstab=0
>
> task=0 blocknr=0 bytes_per_blockelement=12 npart=65536
> task=0 blocknr=1 bytes_per_blockelement=12 npart=65536
> task=0 blocknr=2 bytes_per_blockelement=4 npart=65536
> task=0 blocknr=3 bytes_per_blockelement=4 npart=0
> task=0 blocknr=4 bytes_per_blockelement=4 npart=32768
> reading done.
> Total number of particles : 0000065536
>
> This will at least tell you in which "block" of the IC file the
> problem occurs.
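>
> If you want the block's name in the output too, you could extend the
> diagnostic along the lines of the sketch below. The labels are only
> inferred from the block order in the sample output above (positions,
> velocities, IDs, masses, gas internal energy); they are hypothetical,
> not identifiers from the code.
>
>    /* hypothetical labels, inferred from the block order above */
>    static const char *blocklabel[] = { "pos", "vel", "id", "mass", "u" };
>
>    if(blocknr >= 0 && blocknr <= 4)
>      printf("task=%d reading block '%s'\n", ThisTask, blocklabel[blocknr]);
>    fflush(stdout);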
>
> Volker
>
> > Thank you,
> > Bob Morgan
> > ASU
> >
> >
> > Output of Gadget2, lcdm_gas simulation run:
> >
> >
> > This is Gadget, version `2.0'.
> >
> > Running on 2 processors.
> >
> > found 5 times in output-list.
> >
> > Allocated 30 MByte communication buffer per processor.
> >
> > Communication buffer has room for 714938 particles in gravity computation
> > Communication buffer has room for 245760 particles in density computation
> > Communication buffer has room for 196608 particles in hydro computation
> > Communication buffer has room for 182890 particles in domain decomposition
> >
> >
> > Hubble (internal units) = 0.1
> > G (internal units) = 43007.1
> > UnitMass_in_g = 1.989e+43
> > UnitTime_in_s = 3.08568e+16
> > UnitVelocity_in_cm_per_s = 100000
> > UnitDensity_in_cgs = 6.76991e-22
> > UnitEnergy_in_cgs = 1.989e+53
> >
> > Task=0 FFT-Slabs=64
> > Task=1 FFT-Slabs=64
> >
> > Allocated 3.99994 MByte for particle storage. 80
> >
> > Allocated 2.09997 MByte for storage of SPH data. 84
> >
> >
> > reading file `../ICs/lcdm_gas_littleendian.dat' on task=0 (contains 65536 particles.)
> > distributing this file to tasks 0-1
> > Type 0 (gas): 32768 (tot= 0000032768) masstab=4.23508
> > Type 1 (halo): 32768 (tot= 0000032768) masstab=27.528
> > Type 2 (disk): 0 (tot= 0000000000) masstab=0
> > Type 3 (bulge): 0 (tot= 0000000000) masstab=0
> > Type 4 (stars): 0 (tot= 0000000000) masstab=0
> > Type 5 (bndry): 0 (tot= 0000000000) masstab=0
> >
> > I/O error (fread) on task=0 has occurred: end of file
> > task 0: endrun called with an error level of 778
> >
> >
> > [0] MPI Abort by user Aborting program !
> > [0] Aborting program!
> > p0_25749: p4_error: : 778
> > Killed by signal 2.
> > p0_25749: (4.391097) net_send: could not write to fd=4, errno = 32
> > ++ exitstatus=1
> > ++ '[' 0 '!=' 1 ']'
> > ++ '[' 0 = 1 ']'
> > ++ '[' 0 = 1 -a no = yes ']'
> > ++ rm /home/bob/Gadget-2.0/COSMO/PI25665
> > ++ '[' '' = yes ']'
> > ++ '[' '' '!=' no -a '' = shared ']'
> > ++ exit 1
> >
> > ... end of output ...
> >
> >
> > Quoting Volker Springel <volker_at_MPA-Garching.MPG.DE>:
> >
> >> ROBERT.J.MORGAN_at_asu.edu wrote:
> >>> Am trying to run the lcdm_gas simulation. The run aborts in
> >>> initialization with these messages:
> >>>
> >>> "reading file `../ICs/lcdm_gas_littleendian.dat' on task=0
> >>> (contains 65536 particles.)
> >>> distributing this file to tasks 0-0
> >>> ... (then follows the listing of type 0-5 particles) ...
> >>>
> >>> I/O error (fread) on task=0 has occurred: no such file or directory
> >>> task 0: endrun called with an error level of 778
> >>>
> >>> [0] MPI Abort by user Aborting program !
> >>> [0] Aborting program!
> >>> p0_11987: p4_error: :778 "
> >>>
> >>>
> >>> Using gdb, the problem seems to occur when read_file() in
> >>> read_ic.c issues my_fread(CommBuffer, ...) at line 485, and
> >>> my_fread() in io.c issues an fread() and gets nread=1 instead of
> >>> nmemb=32768, treats this as an I/O error, and calls endrun.
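> >>>
> >>> For reference, my_fread()'s logic is roughly the following (a
> >>> sketch reconstructed from this behaviour, not the verbatim io.c
> >>> source):
> >>>
> >>>   size_t my_fread(void *ptr, size_t size, size_t nmemb, FILE *stream)
> >>>   {
> >>>     size_t nread;
> >>>
> >>>     /* any short read, whether EOF or a real error, is fatal */
> >>>     if((nread = fread(ptr, size, nmemb, stream)) != nmemb)
> >>>       {
> >>>         printf("I/O error (fread) on task=%d has occurred: end of file\n",
> >>>                ThisTask);
> >>>         fflush(stdout);
> >>>         endrun(778);   /* the error level seen in the log */
> >>>       }
> >>>     return nread;
> >>>   }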
> >> fread doesn't distinguish between EOF and other errors when
> >> reading. From the viewpoint of gadget, both are equally bad and the
> >> run needs to be terminated. To see whether you are dealing with an
> >> end-of-file or another error, you can replace the line
> >>
> >>   printf("I/O error (fread) on task=%d has occurred: end of file\n",
> >>          ThisTask);
> >>
> >> with
> >>
> >>   if(feof(stream))
> >>     printf("I/O error (fread) on task=%d has occurred: end of file\n",
> >>            ThisTask);
> >>   else
> >>     printf("I/O error (fread) on task=%d has occurred: %s\n",
> >>            ThisTask, strerror(errno));
> >>
> >> I think you will likely see an end-of-file error now, i.e. your IC
> >> file is corrupt somehow. (It should have a length of 1966376.)
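> >>
> >> (Note: strerror() and errno require two standard headers; if they
> >> are not already included near the top of io.c, add
> >>
> >>   #include <errno.h>     /* for errno */
> >>   #include <string.h>    /* for strerror() */
> >>
> >> there.)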
> >>
> >>> This seems questionable, since an EOF can result in a short read
> >>> (unless nmemb data items had previously been written to the file).
> >>>
> >>> Am running on a single processor (for this run) on an AMD Athlon
> >>> and CentOS 4.1 (Linux). Not using mpirun, though mpich is
> >>> configured and loaded. Using the lcdm_gas_littleendian.dat and
> >>> lcdm_gas.param files (supplied w/download). Have previously run
> >>> the collisionless galaxy simulations on both single and multiple
> >>> processors, but those don't use SPH, which uses the CommBuffer.
> >>> Should I just skip my_fread() and use fread() in the
> >>> initialization code for CommBuffer? The fread documentation
> >>> doesn't seem to distinguish between I/O-error and EOF conditions,
> >>> so is there some way to tell if an actual I/O error occurred?
> >> Yes, see above.
> >>
> >> Replacing my_fread with a plain fread will make the code ignore the
> >> I/O error. I wouldn't recommend that, since some of the initial
> >> data would then be undefined, with unpredictable results.
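> >>
> >> (Schematically: a bare
> >>
> >>   fread(CommBuffer, size, nmemb, file);   /* return value ignored */
> >>
> >> returns a short item count at EOF, so the unread tail of CommBuffer
> >> keeps whatever happened to be in memory, and the run starts from
> >> garbage values. The argument names here are placeholders, not the
> >> actual read_ic.c call.)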
> >>
> >> Volker
> >>
> >>
> >>
> >>> Thanks,
> >>>
> >>> Bob Morgan
> >>>
> >>> Arizona State University
> >>>
> >>>
>
> -----------------------------------------------------------
>
> If you wish to unsubscribe from this mailing, send mail to
> minimalist_at_MPA-Garching.MPG.de with a subject of: unsubscribe
> gadget-list
> A web-archive of this mailing list is available here:
> http://www.mpa-garching.mpg.de/gadget/gadget-list
>
Received on 2006-07-01 09:39:38
