Ice restart problem with payu/1.1.6

Hi,
I have three simulations failing simultaneously with payu errors. All three runs fail at the payu setup stage, where I get an error that says:

FileNotFoundError: CICE restart file not found in /scratch/y99/dkh157/access-esm/archive/esm_b21_c1/restart853/ice. Expected iced.08550101 to exist. Is 'dumpfreq' set in cice_in.nml consistently with the run-length?

And the same for the other two runs. These simulations were working fine up until Christmas using payu/1.1.5.
I’m wondering if some changes have been made to how payu handles the ice restart files in the latest payu/1.1.6?
My run dir is:

/home/157/dkh157/ACCESS/esm_b21_c1

if anyone wants to check it out. Please help.
Regards,
David

UPDATE: If I revert back to payu/1.1.5 the simulations continue to work, so I’m reverting to that for now to get the jobs running over the weekend.

Hi David

We put some additional checks into payu to stop a case where CICE can continually use the same restart file, rather than progressing with the model.

Does iced.08550101 exist in the
/scratch/y99/dkh157/access-esm/archive/esm_b21_c1/restart853/ice folder?

What is the run length, and what is dumpfreq set to in cice_in.nml?

Hi Anton,

No, the file it’s looking for (iced.08550101) doesn’t exist.
I have instead the following iced files in my last restart folder:

iced.09531201
iced.09540101
iced.09540201
iced.09540301
iced.09540401
iced.09540501
iced.09540601
iced.09540701
iced.09540801
iced.09540901
iced.09541001
iced.09541101

Somehow the expected years are not matching up… I don’t know why.

I have dumpfreq = 'm'
And the run length is 1 year as per normal ACCESS-ESM1.5 configurations.

Since I set the run going again with the older payu, I copied the error state to a new run directory to test in payu/1.1.6. It is here:
/home/157/dkh157/ACCESS/test_c1

Ah right - this rings bells. Deferring to @spencerwong who will know the answer!

Hi @dkhutch, I’ve taken a look at the restart directories with @anton and I think we have an idea of what’s happened.

CICE’s calendar can get a bit complicated in ESM1.5 due to the multiple sources of information it uses. The binary iced.YYYYMMDD contains a time used to set CICE’s internal date, in turn controlling when it writes restart and history files. Meanwhile, Payu uses the dates specified in the restart_date.nml file when calculating the run duration.

The changes in payu/1.1.6 check that the time in the binary iced file matches the dates in the restart_date.nml file, since a mismatch is often a sign that something has gone wrong. This check is raising an error in your simulation because the times in the iced restart files appear to be ~100 years ahead of the times in the restart_date.nml file, and of the atmosphere and ocean components.

Taking a look at the restart directories from earlier in the experiment, it looks like this offset may have been present from the beginning of the experiment. This causes the ice history files to have different dates to the atmosphere and ocean, but the offset will also cause problems in the leap years that occur on years 400, 800, etc:

When the atmosphere, ocean, and restart_date.nml file are at year 300, the iced file is at year 400, and there’s a discrepancy between some calculations that think it’s a leap year, and others that don’t. I.e. payu will set the run duration as 365 days, while CICE’s internal calendar will think it’s a 366 day year. Due to this discrepancy, CICE’s internal date will be 30/12/400 at the end of the 365 days of simulation, and it won’t write a history or restart file – the last restart file produced will have been `iced.04001201`.

As payu/1.1.5 didn’t have any safeguards around this, it then picked up the December restart for the next simulation while the other components were in January. The same issue occurred at year 700/800, where another ice restart would be skipped, and it would happen again every 400 years.
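
The skipped restart at model year 300 can be reproduced with a few lines of Python. This is only a sketch: it assumes a proleptic Gregorian calendar and uses the standard library's `datetime` and `calendar` modules rather than payu's or CICE's own calendar code, and the helper name `cice_date_after_run` is made up for illustration.

```python
import calendar
from datetime import date, timedelta


def cice_date_after_run(model_year, ice_offset_years=100):
    """Return (run_days, CICE date at the end of a one-year run).

    Hypothetical helper: the run length is taken from the model year
    (the date in restart_date.nml), while CICE advances its own clock,
    which is assumed to start ice_offset_years ahead on 1 January.
    """
    run_days = 366 if calendar.isleap(model_year) else 365
    ice_start = date(model_year + ice_offset_years, 1, 1)
    return run_days, ice_start + timedelta(days=run_days)


run_days, ice_end = cice_date_after_run(300)
print(run_days, ice_end)
```

For model year 300 the run is 365 days long, but CICE's clock stops at the start of 31 December 400 (so 30 December is the last day simulated) and never reaches 1 January 401, which is when the next monthly restart would have been written.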

To continue the simulation, I think the best option might be to modify the time in the latest iced restart file so that it matches the restart_date.nml file and the other components. The cicedumpdatemodify.py script from the warm start scripts looks like it should be able to do this. I’ll test this out and get back to you with more information.

Note that this will mean two Decembers are missing from the cice model run. e.g. it jumps straight from November to January. I would guess the sea ice would equalise by the following winter, but be careful if this is important. This is just a guess, it might take longer to correct itself.

Also the timestamps on cice output won’t match the rest of the experiment … they will be offset by 100 years at the start, then 99 years 11 months (after 300 model years) and 99 years 10 months (after 700 model years)
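
As a rough cross-check of the current offset, the date in the newest iced file name can be compared with the date payu expected. The sketch below assumes names of the form iced.YYYYMMDD and looks only at the file name, not at the time stored in the binary header; `iced_offset` is a hypothetical helper, not part of payu.

```python
def iced_offset(iced_name, expected_yyyymmdd):
    """Return (years, months) by which an iced file name leads a date.

    Only the YYYYMM part of each stamp is used, so the result is in
    whole months.
    """
    def months(stamp):
        # Count months since year 0 for a YYYYMMDD string.
        return int(stamp[:4]) * 12 + int(stamp[4:6])

    stamp = iced_name.split(".")[-1]
    return divmod(months(stamp) - months(expected_yyyymmdd), 12)


# Newest file in restart853/ice versus the file payu expected.
print(iced_offset("iced.09541101", "08550101"))  # (99, 10)
```

The result of 99 years 10 months agrees with the offset expected after two skipped restarts.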

Hi @anton and @spencerwong
Thanks so much for looking into this. I agree that I should reset the ice restart date to be in sync with the other components. And I will use the dump date modification script that you mentioned. Will report back with the outcome.
Regards,
David

Hi @dkhutch, I’ve looked a bit more into the cicedumpdatemodify.py script and the usage example here. From what I understand, the usage would be:

scripts/cicedumpdatemodify.py -i <input_file> -o <output_file> --istep0=<istep0> --time=<seconds>. --time_forc=0.

<seconds> should be the seconds between the experiment initialisation (init_date in restart_date.nml) and the restart date (assuming a proleptic Gregorian calendar). E.g. to set a restart date of 08550101, while init_date=10101, you would use:

--time=26949628800.

<istep0> refers to the number of timesteps simulated since the initialisation date. I suspect that the value supplied probably isn’t too crucial, but you can set it to a value consistent with the time setting by dividing <seconds> by the CICE timestep of 3600 seconds, e.g.:

--istep0=7486008

<time_forc> doesn’t seem to be used, and I think it’s safe to leave this one at 0.
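
For other restart dates, the two values can be worked out with Python's standard `datetime` module, which also uses a proleptic Gregorian calendar. This is a minimal sketch rather than part of the warm start scripts: `cice_time_args` is a hypothetical helper, and the 3600 second timestep is the value quoted above.

```python
from datetime import date

CICE_TIMESTEP = 3600  # seconds, the CICE timestep quoted above


def cice_time_args(init_date, restart_date, dt=CICE_TIMESTEP):
    """Return (seconds, istep0) between init_date and restart_date."""
    seconds = (restart_date - init_date).days * 86400
    istep0 = seconds // dt
    return seconds, istep0


# init_date=10101 (1 January of year 1), restart date 08550101.
seconds, istep0 = cice_time_args(date(1, 1, 1), date(855, 1, 1))
print(f"--istep0={istep0} --time={seconds}. --time_forc=0.")
```

This reproduces the values given above: --time=26949628800. and --istep0=7486008.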

The final step would be to edit the restart pointer ice.restart_file to point to the new restart file.

Cheers,
Spencer

Assigning to you Spencer to clear from triage, hope that’s ok, cheers