Hi everyone, just adding in an update for this.
Working through some of the simulations with @HIMADRI_SAINI, it looks like we have a good idea of how this crash is occurring. The short version is that payu’s calendar calculations for the cice model are currently a bit opaque and convoluted, which can cause cice to be out of sync with the other model components in specific situations when a restart directory is used across different configurations.
For anyone who is interested, I’ll run through some details below, but will note that this is now on the agenda as something to improve in a future update of payu, so I’m hoping the following details will soon become outdated.
Currently payu sets the cice model’s start date by reading an initialisation date from the `init_date` parameter in the `<control-directory>/input_ice.nml` namelist file. Payu then adds a “number of seconds previously simulated” to this initialisation date, calculated from the `runtime0` and `runtime` parameters in the identically named `<restart-directory>/input_ice.nml` file. The resulting date is then used as cice’s start date.
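To make the calculation concrete, here is a minimal sketch of it in Python. The function name `cice_start_date` is hypothetical and this is not payu’s actual code; it also assumes a proleptic Gregorian calendar (which is what Python’s `datetime` uses) and that the two runtime values are simply summed.

```python
from datetime import datetime, timedelta


def cice_start_date(init_date: str, runtime0: int, runtime: int) -> str:
    """Hypothetical sketch of the cice start-date calculation described above.

    init_date comes from <control-directory>/input_ice.nml (YYYYMMDD);
    runtime0 and runtime come from <restart-directory>/input_ice.nml (seconds).
    Assumes a proleptic Gregorian calendar, as used by Python's datetime.
    """
    start = datetime(int(init_date[:4]), int(init_date[4:6]), int(init_date[6:8]))
    start += timedelta(seconds=runtime0 + runtime)
    # Format by hand: strftime("%Y") does not zero-pad years < 1000 on every platform.
    return f"{start.year:04d}{start.month:02d}{start.day:02d}"


# Pre-industrial values from this post.
print(cice_start_date("00010101", 3155673600, 0))  # 01010101
```

The key point is that the two inputs live in different directories, so changing either one alone shifts the result.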
For example, in the current pre-industrial configuration, `<control-directory>/input_ice.nml` has `init_date = 00010101` (YYYYMMDD), and `<restart-directory>/input_ice.nml` has `runtime0=3155673600` (seconds) and `runtime=0`. Adding these all together gives a start date of `01010101`.
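As a quick check on that arithmetic (assuming a proleptic Gregorian calendar), the elapsed time converts to exactly 100 years. Years 1 to 100 contain 24 leap years (4, 8, ..., 96; year 100 is not a leap year under the Gregorian rules), so 36524 days after 0001-01-01 lands on 0101-01-01:

```latex
\frac{3155673600\ \mathrm{s}}{86400\ \mathrm{s\,day^{-1}}}
  = 36524\ \mathrm{days}
  = \underbrace{100 \times 365}_{\text{common days}} + \underbrace{24}_{\text{leap days}}
```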
The ocean and atmosphere read their start dates from text files in the restart directory if they exist, and otherwise fall back to settings in the control directory namelists, but don’t combine information between the two.
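By contrast, the ocean and atmosphere logic can be sketched as a simple fallback with no mixing of sources. The helper name and the `start_date.txt` filename below are hypothetical placeholders, not the actual files payu reads for each component.

```python
from pathlib import Path


def component_start_date(restart_dir: Path, control_start_date: str,
                         date_file: str = "start_date.txt") -> str:
    """Hypothetical sketch of how the ocean and atmosphere choose a start date.

    If the restart directory contains a date file, its contents win outright;
    otherwise the control-directory setting is used. The two are never combined.
    """
    path = restart_dir / date_file  # placeholder filename, not a real payu file
    if path.exists():
        return path.read_text().strip()
    return control_start_date
```

Because only one source is ever used, a restart directory carries its own date with it when copied between experiments.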
Where payu’s mixing of information in the restart and control directories can go wrong is when a single restart is used across two different experiments that happen to have different `init_date` settings in their ice control directories. The calculated start dates will differ, and cice will be out of sync with the other components in one of the experiments.
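A small illustration of this failure mode, using the same hypothetical start-date sketch as a self-contained helper. The two experiments, their `init_date` values and the restart date are illustrative assumptions: one shared restart is read by both, and only the ice control setting differs.

```python
from datetime import datetime, timedelta


def cice_start_date(init_date: str, runtime0: int, runtime: int) -> str:
    # Hypothetical sketch: init_date (control dir) + runtime0 + runtime seconds (restart dir),
    # assuming a proleptic Gregorian calendar.
    start = datetime(int(init_date[:4]), int(init_date[4:6]), int(init_date[6:8]))
    start += timedelta(seconds=runtime0 + runtime)
    return f"{start.year:04d}{start.month:02d}{start.day:02d}"


# One shared restart directory (illustrative values).
restart = {"runtime0": 3155673600, "runtime": 0}
ocean_atm_date = "01010101"  # ocean/atmosphere read this directly from the restart

# Two experiments whose ice control namelists differ only in init_date.
results = {}
for name, init_date in [("experiment A", "00010101"), ("experiment B", "01010101")]:
    ice = cice_start_date(init_date, restart["runtime0"], restart["runtime"])
    results[name] = ice
    status = "in sync" if ice == ocean_atm_date else "OUT OF SYNC"
    print(f"{name}: cice {ice}, ocean/atmosphere {ocean_atm_date} -> {status}")
```

Experiment B ends up with cice starting 100 years after the ocean and atmosphere, even though the restart files are identical.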
One way this can happen is when using the available `warm-start.sh` scripts to branch off from a CSIRO simulation. These control the ice start date by setting the `init_date` parameter in the control directory to the desired start date, and setting the `runtime0` and `runtime` parameters to `0` in the restart directory. If the resulting restart directory is copied over to another experiment which doesn’t contain the corresponding `init_date` change, then payu will calculate the wrong start date for cice.
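The warm-start case can be sketched with plain dictionaries standing in for the two `input_ice.nml` files. The helper names and the branch date are hypothetical; the sketch only models the ice edits described above. Once `runtime0` and `runtime` are zeroed, cice’s start date is simply whatever `init_date` the control directory holds, so the restart alone no longer determines it.

```python
def warm_start(control_ice: dict, restart_ice: dict, branch_date: str) -> None:
    """Hypothetical sketch of the ice edits made by the warm-start scripts."""
    control_ice["init_date"] = branch_date  # desired start date goes in the control directory
    restart_ice["runtime0"] = 0             # elapsed time in the restart is zeroed
    restart_ice["runtime"] = 0


def cice_start_date_zero_elapsed(control_ice: dict, restart_ice: dict) -> str:
    # With runtime0 + runtime == 0, init_date plus zero seconds is just init_date.
    if restart_ice["runtime0"] + restart_ice["runtime"] != 0:
        raise ValueError("this sketch only handles a zeroed restart")
    return control_ice["init_date"]


branch_date = "01010101"                  # hypothetical branch point
exp1_control = {"init_date": "00010101"}  # experiment where warm-start.sh was run
exp2_control = {"init_date": "00010101"}  # experiment without the init_date change
restart = {"runtime0": 3155673600, "runtime": 0}

warm_start(exp1_control, restart, branch_date)

# The same modified restart directory is then copied into the second experiment.
print(cice_start_date_zero_elapsed(exp1_control, restart))  # 01010101
print(cice_start_date_zero_elapsed(exp2_control, restart))  # 00010101
```

In the second experiment the ocean and atmosphere would still start at the branch date recorded in the restart, while cice falls back to `00010101`.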
The above behaviour is quite confusing, and intuitively I think it would make sense for a restart directory to contain all the required timing information. As noted, there’s currently a discussion here about cleaning up these calculations in payu to make them more transparent and to prevent these crashes. Feel free to bring up any ideas or suggestions!