Hugo,
In my experience with Gadget, I would proceed as follows:
1) If you have modified parts of the code, check that those parts are not
the source of the NaN or Inf problem.
2) Check that the snapshot from which you restart the simulation does not
contain NaN or Inf values, isolated particles at a very large distance from
the center of mass, or particles with very large velocities.
3) The fact that the problem is tied to node 14 is probably because the
problematic particles are handled by that node. However, to test the
hardware of node 14, simply remove it from the list of nodes used and see
whether the problem still occurs.
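
For point 2, here is a minimal sketch of such a check. It assumes the
particle IDs, positions and velocities have already been read from the
snapshot into plain Python sequences (the reading itself is not shown).
The function name, the threshold `factor` and the use of the component-wise
median as a robust stand-in for the center of mass are my own illustrative
choices, not part of Gadget.

```python
import math
from statistics import median


def find_suspect_particles(ids, pos, vel, factor=100.0):
    """Return {particle_id: reason} for particles that look problematic.

    ids : sequence of particle IDs
    pos : sequence of (x, y, z) positions
    vel : sequence of (vx, vy, vz) velocities
    factor : hypothetical threshold; a particle is flagged when its
             distance or speed exceeds factor times the median value.
    """
    suspects = {}
    finite = []

    # Pass 1: flag any particle with a NaN or Inf coordinate or velocity.
    for i, (p, v) in enumerate(zip(pos, vel)):
        if all(math.isfinite(x) for x in (*p, *v)):
            finite.append(i)
        else:
            suspects[ids[i]] = "NaN or Inf in position or velocity"
    if not finite:
        return suspects

    # Assumption: the component-wise median is used as the reference
    # center, because a single far outlier would drag the mass-weighted mean.
    centre = [median(pos[i][k] for i in finite) for k in range(3)]
    radii = {i: math.dist(pos[i], centre) for i in finite}
    speeds = {i: math.hypot(*vel[i]) for i in finite}
    r_med = median(radii.values())
    v_med = median(speeds.values())

    # Pass 2: flag isolated or very fast particles among the finite ones.
    for i in finite:
        if r_med > 0 and radii[i] > factor * r_med:
            suspects[ids[i]] = "far from the center"
        elif v_med > 0 and speeds[i] > factor * v_med:
            suspects[ids[i]] = "very large velocity"
    return suspects
```

The speeds here are not corrected for any bulk motion of the system, so a
fast-moving system as a whole would need that subtracted first. Any ID
returned by this check is worth inspecting before restarting.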
I hope this helps.
Regards.
Hugo Vicente Capelato wrote:
>Dear Yves
>
>Thank you for your answer. The problem is that this error
>is coming intermittently and, seemingly, more and more frequently.
>
>Restarting Gadget from the point where it stopped makes it go somewhat
>further in time, but after a while it stops again with the same error,
>though not for the same particle. For instance:
>
>=====================
>Begin Step 52398, Time: 0.190878, Systemstep: 4.76837e-06
>domain decomposition...
>NTopleaves= 2360
>work-load balance=1.3816 memory-balance=1.754
>exchange of 0000017839 particles
>domain decomposition done.
>begin Peano-Hilbert order...
>Peano-Hilbert done.
>Start force computation...
>
>Error: A timestep of size zero was assigned on the integer timeline!
>We better stop.
>Task=14 Part-ID=372839 dt=nan tibase=3.72529e-08 ti_step=-2147483648 ac=nan
>xyz=(-194.79|219.205|427.333) tree=(nan|nannan)
>
>Tree construction.
>Tree construction done.
>Begin tree force.
>tree is done.
>=======================
>
>However, it always seems to be the same "Task=14", which, I suppose,
>corresponds to a certain node of the cluster. I suspect a
>hardware problem... What do you think?
>
>Cheers, Hugo
>
>Quoting Yves Revaz <yves.revaz_at_obs.unige.ch>:
>
>>Dear Hugo,
>>
>>This means that the timestep computed by get_timestep() for particle 346505
>>is zero or not a number (NaN), which should not occur.
>>In that case the program is, of course, stopped.
>>
>>In your case, the log also says "ac=nan", which is not a good sign.
>>This is probably due to a division by zero or an overflow during the
>>force computation.
>>
>>Regards.
>>
>>>Error: A timestep of size zero was assigned on the integer timeline!
>>>We better stop.
>>>Task=14 Part-ID=346505 dt=nan tibase=3.72529e-08 ti_step=-2147483648 ac=nan
>>>xyz=(-199.269|226.042|432.335) tree=(nan|nannan)
>>>
>>--
>> (o o)
>>--------------------------------------------oOO--(_)--OOo-------
>> Yves Revaz
>> Lerma Batiment A Tel : ++ 33 (0) 1 40 51 20 79
>> Observatoire de Paris Fax : ++ 33 (0) 1 40 51 20 02
>> 77 av Denfert-Rochereau e-mail : yves.revaz_at_obspm.fr
>> F-75014 Paris Web : http://obswww.unige.ch/~revaz/
>> FRANCE
>>----------------------------------------------------------------
Received on 2006-06-10 13:19:35