Tips, tricks, and troubleshooting for a new ACCESS-ESM1.5 user

Hi all, new ACCESS-ESM user here, so sorry for any silly questions.

I want to run a series of experiments branching from the pre-industrial control where I perturb/warm the Southern Ocean CDW layer. At the start, I’m going to try doing this by just modifying the ocean restart file.

I understand ACCESS NRI will be releasing a supported version soon, but to get started in the meantime, I’ve been following the Quickstart guidelines here. In doing so, I’ve run into a few problems, which thanks to @HIMADRI_SAINI and @dkhutch are now solved (:crossed_fingers:).

  1. You need to be a member of the access software project on Gadi to access the atmosphere files. Maybe it’s worth mentioning this in the quickstart guide(?), though perhaps this will be redundant with the ACCESS NRI supported version.

  2. I had trouble with the atmosphere restart file as mentioned in this forum post and I think relevant here too. This was solved by using the backup restart file in the same directory, but would it be better practice to use the restart here: /g/data/vk83/experiments/inputs/access-esm1p5/pre-industrial/restart/?

I have a few other queries too:

Changing the restart file
At the moment, the model is running and I’m happy to play around with tests using the existing restart (yr 101). However, I would like to change this when I run my actual experiments by starting from a later year, e.g. yr 700. Does anyone know where these restarts for the pre-industrial control are stored?

If I change the restart year, do I need to be aware of any other intricacies? @dkhutch mentioned I also need to change the datestamp for the atmosphere. Is there anything else I should be modifying?

Git branches vs new directories
I’m a bit confused as to best-practice workflow for running new experiments (at least until the ACCESS NRI version is released). Should I be cloning the Github repo to a new folder for each new experiment, or, if using payu from vk83, should I be using new git branches for each experiment? If I stuff something up with the latter method, could I accidentally overwrite data?


The intention of the new branching features, and automated experiment versioning, is that it should be harder to accidentally overwrite data.

With the way payu currently works, if you accidentally make an experiment control directory with the same name as one that already exists, it will find that laboratory in /scratch and use it. Typically this doesn’t result in data loss, but it can add to an existing run rather than start a new experiment, which can be a pain to unravel. It can lead to data loss if payu sweep --clean is used, which will delete the whole laboratory directory containing outputs, restarts and logs.

To ensure the new branching approach works best, make sure to use payu clone and payu branch rather than git branch. payu wraps the native git commands, but it also checks the state of the control directory, creates a new experiment ID, and updates the links to archive when it decides this is required.

If you’re unsure exactly how the new features work, play around a bit: create a test experiment and use payu checkout -b <expname> to create a few experiments, then see what payu does when a new experiment is created and how to switch between them.


I would ask @tiloz about this, so tagging him here.


Thanks @Aidan, I’ll have a play around. Just to clarify, to use the updated payu version, should I be running module use /g/data/vk83/modules and module load payu/1.1.3 (rather than module load conda/analysis3 which I currently do) every time I log onto gadi to run more years of an experiment, or start a new one?

Yes. But there are ways to make this easier. It is pretty safe to add something like this to your ~/.bash_profile

if in_interactive_shell && in_login_shell
then
  if command -v module > /dev/null 2>&1
  then
    module use /g/data/vk83/modules 
    module load payu
  fi
fi

Which will mean payu is automatically available every time you log onto gadi.

If you don’t want to always have it loaded you can define an alias and then it is a single command to have payu available:

$ alias loadpayu='module use /g/data/vk83/modules && module load payu'
$ module list
Currently Loaded Modulefiles:
 1) pbs   2) dot   3) ncview/2.1.7   4) netcdf/4.7.3(default)   5) git/2.37.3  
$ loadpayu 
$ module list
Currently Loaded Modulefiles:
 1) pbs   2) dot   3) ncview/2.1.7   4) netcdf/4.7.3(default)   5) git/2.37.3   6) payu/1.1.3  

You can add that alias command to your ~/.bashrc and then loadpayu will always be available on gadi.


Hi Hannah, great questions!

Some of the restart files in /g/data/access/payu/access-esm/restart/pre-industrial have unclear origins and they don’t appear to match the restart files from the original pre-industrial simulations. For reproducibility, it might be best to use the restarts in /g/data/vk83, which are copied from the original PI-02 simulation.

They’re a bit hard to find at the moment, but restart and history files for the CMIP6 experiments along with several non-CMIP experiments are stored in /g/data/p73/archive. There are two different ESM1.5 pre-industrial experiments stored there: PI-01 which was originally run on raijin, and PI-02 which was run on gadi. PI-01 isn’t reproducible on gadi, and so it’s probably worth using the restarts from PI-02 stored in /g/data/p73/archive/CMIP6/ACCESS-ESM1-5/PI-02.

There are a few intricacies! I’ll run through what worked for me in changing the restart year to 700. A few points aren’t completely clear to me still – I’ll update them if I find out anything more. If you or anyone notices anything that’s not quite right, let me know!

  1. Copy the relevant restart files from /g/data/p73/archive/CMIP6/ACCESS-ESM1-5/PI-02/restart (in this case, those ending with -07000101 or -06993112) to a new directory, e.g. <new-restart-dir>, separating them into atmosphere, ocean, ice, and coupler subdirectories.

  2. In the experiment’s config.yaml file, edit the restart option to point to <new-restart-dir>.

  3. Create symlinks: Each model will look for restart files which don’t have datestamps at the end of their names. E.g. in <control-directory>/ocean/field_table, the ocean model will try to use:

    restart_file  = csiro_bgc.res.nc
    

    rather than csiro_bgc.res.nc-06993112. This means we need to create a symlink to each ocean restart file, omitting the datestamp from the symlink’s name. A quick way to do this is to navigate to <new-restart-dir>/ocean and run:

    $ for res_file in *
    > do
    > ln -s "${res_file}" "${res_file%-*}"
    > done
    

    The same can be done for the coupler restarts.

    For the ice restarts, symlinks only need to be created for ice.restart_file and mice.nc.

    For the atmosphere, a symlink should be made to the copied restart file PI-02.astart-07000101, though this time the symlink should be named restart_dump.astart.
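For anyone who prefers Python to the shell loop, the ocean and coupler symlinks could be created with something like the sketch below. The function name link_without_datestamp is made up for illustration, and it assumes every restart file in the directory ends in "-" followed by a purely numeric datestamp. It is not meant for the ice or atmosphere directories, which need the special handling described above.

```python
from pathlib import Path


def link_without_datestamp(restart_dir):
    """Create '<name>' -> '<name>-<datestamp>' symlinks inside restart_dir.

    Hypothetical helper: assumes names look like 'csiro_bgc.res.nc-06993112'.
    Returns the names of the symlinks that were created.
    """
    created = []
    # sorted() takes a snapshot, so new links are not revisited in this loop.
    for path in sorted(Path(restart_dir).iterdir()):
        base, sep, stamp = path.name.rpartition("-")
        # Skip existing links, directories, and names without a numeric stamp.
        if path.is_symlink() or not path.is_file():
            continue
        if not sep or not base or not stamp.isdigit():
            continue
        link = path.with_name(base)
        if link.exists() or link.is_symlink():
            continue  # never overwrite something that is already there
        link.symlink_to(path.name)  # relative link, equivalent to `ln -s`
        created.append(link.name)
    return created
```

Calling link_without_datestamp("<new-restart-dir>/ocean") (with the real path substituted) should leave the original files untouched and only add links next to them.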

  4. Next, we need to update each model’s namelist files in <control-directory>/{atmosphere,ocean,ice} to have the correct starting dates:

    Atmosphere: In <control-directory>/atmosphere/namelists, set

    MODEL_BASIS_TIME=  0700 , 01 , 01 , 0 , 0 , 0 ,
    ANCIL_REFTIME=  0700 , 01 , 01 , 0 , 0 , 0 ,
    

    Make the same change in <control-directory>/atmosphere/CNTLALL. This second file might be superfluous and we’re looking at removing it for the new release, but for now it’s probably safest to edit it alongside the namelists file.

    My understanding is that MODEL_BASIS_TIME sets the atmosphere’s starting date for the new simulation. The impact of ANCIL_REFTIME is a bit more intricate – it can shift forwards or back the days at which ancillary fields are updated, which can influence reproducibility. It might not be strictly necessary, but it works out here to change the ANCIL_REFTIME to match the new MODEL_BASIS_TIME.
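If you end up doing this for several branch years, the two atmosphere date entries could be rewritten with a small script rather than by hand. This is only a sketch: set_start_date is a made-up name, and it assumes each of MODEL_BASIS_TIME and ANCIL_REFTIME sits on a single line in the layout shown above. It works on the text of a file, so it could be applied to both namelists and CNTLALL.

```python
import re

# Matches the start of a MODEL_BASIS_TIME or ANCIL_REFTIME line, up to "=".
_DATE_ENTRY = re.compile(
    r"^([ \t]*(?:MODEL_BASIS_TIME|ANCIL_REFTIME)[ \t]*=).*$",
    re.MULTILINE,
)


def set_start_date(namelist_text, year, month=1, day=1):
    """Return namelist_text with both date entries set to the given date.

    Hypothetical helper: assumes one entry per line, with hour, minute and
    second all left at zero as in the example above.
    """
    new_value = f"  {year:04d} , {month:02d} , {day:02d} , 0 , 0 , 0 ,"
    # Keep the variable name and "=" as found, and replace only the value.
    return _DATE_ENTRY.sub(lambda m: m.group(1) + new_value, namelist_text)
```

It would be worth diffing the edited files against the originals afterwards to confirm that only those two lines changed.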

    Ocean: The ocean’s start times appear to be read in from the restart file ocean_solo.res:

    700     1     1     0     0     0        Current model time: year, month, day, hour, minute, second
    

    and so no changes need to be made to the ocean model’s namelists.

    Ice: The best way to handle the cice timing controls is still a bit unclear to me, as it is fairly messy.

    First, <control-directory>/ice contains two configuration files cice_in.nml and input_ice.nml, which influence different parts of the simulation timing. The relevant variables in input_ice.nml are:

    • inidate - the start date for the new simulation.
    • init_date - the original starting date for the whole simulation (e.g. 0001/01/01).
    • runtime0 - the total amount of time in seconds already simulated since the init_date.
    • runtime - the amount of time in seconds to be run in the next simulation.

    while the relevant variables in cice_in.nml are:

    • year_init- the original starting year for the simulation (should match the year in init_date)
    • istep0 - the total number of timesteps simulated so far.
    • npt - the number of timesteps to be run in the next simulation.

    In practice, payu will only pay attention to init_date and year_init, which should be left at 00010101 and 0001 respectively. For the other variables, payu replaces their values (see the next step), and it should be safe to leave them as is. It could be worthwhile to set inidate to the new starting date of 07000101 just for record keeping.

    5. Each time payu runs a simulation, it places updated versions of cice_in.nml and input_ice.nml in the latest restart directory, and fills them with timing information about the run that just finished.

    E.g. in restart003/ice/input_ice.nml, runtime0 will be the total amount of time, in seconds, between the simulation initialisation (init_date) and the start of run 003. Meanwhile runtime will be the number of seconds simulated during run 003. Likewise in restart003/ice/cice_in.nml, istep0 and npt will be the total number of timesteps simulated before and during run 003 respectively.

    When run 004 starts, payu takes the information from the restart003/ice namelist files, notes how much time has passed (runtime0 + runtime) since the initialisation date, and uses the result to specify the start date of the next run, run 004. (This is why the values in <control-directory>/ice are ignored.)

    For starting a new run at year 700, this means <new-restart-dir>/ice must contain a copy of input_ice.nml which states that 699 years of simulation have already passed. /g/data/vk83/experiments/inputs/access-esm1p5/pre-industrial/restart/ice contains a version which does this for year 101 – copy this over to <new-restart-dir>/ice and set its runtime0 value to the total number of seconds between the experiment initialisation date 0001-01-01 and the new start date 0700-01-01, using the proleptic Gregorian calendar, i.e.:

    <new-restart-dir>/input_ice.nml
    ------------------------------------------------------------
    &coupling
    runtime0=22058265600
    runtime=0
    /
    

    runtime should be left at 0 to ensure that runtime0 + runtime equals the correct value.
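As a sanity check on those two values, the arithmetic described above (initialisation date plus runtime0 + runtime) can be reproduced with Python's standard datetime module, which also uses the proleptic Gregorian calendar. This is not payu's actual code, just an illustration of the calculation, and next_start_date is a made-up name.

```python
from datetime import datetime, timedelta

# Experiment initialisation date, matching init_date (0001-01-01).
INIT_DATE = datetime(1, 1, 1)


def next_start_date(runtime0, runtime, init=INIT_DATE):
    """Date implied by init + (runtime0 + runtime) seconds.

    Illustrative only: mirrors the arithmetic described in the text,
    not payu's implementation.
    """
    return init + timedelta(seconds=runtime0 + runtime)


if __name__ == "__main__":
    # Values from the input_ice.nml example above.
    print(next_start_date(22058265600, 0))  # 0700-01-01 00:00:00
```

If runtime were left non-zero, the implied start date would be shifted later by that many seconds, which is why it is set to 0 here.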

    In /g/data/vk83/experiments/inputs/access-esm1p5/pre-industrial/restart/ice, you’ll also notice a copy of cice_in.nml. Copy this over to <new-restart-dir>/ice too. Its values will look inconsistent with the timing variables from the new version of input_ice.nml, e.g. in

    <new-restart-dir>/cice_in.nml
     ------------------------------------------------------------
    &setup_nml
    istep0=0,
    npt=0,
    dt=3600,
    /
    

    istep0 = 0 timesteps doesn’t match runtime0 = 22058265600. Due to quirks with payu’s handling of the ice calendar, it currently seems safest to leave this as is even though it’s inconsistent. Trying to make the two match can cause some immediate timing issues.

    6. <new-restart-dir> should now contain all the files required for the new run. Update the paths and filenames under <control-directory>/manifests/restart.yaml to point to the new files and symlinks.
    Edit: the payu setup command will automatically fill in the paths in the restart.yaml based on the files it finds in the specified restart directory, and so it shouldn’t be necessary to manually update this file.

    7. Everything should now be ready to run. Following the above and running for a few months, the simulation appears to match the original PI-02 simulation:
      [Image: Screenshot 2024-06-18 at 4.07.24 PM]

    I haven’t trialed a long simulation yet, so I am hoping there aren’t any timing problems down the road. Hopefully this works for getting started though.

Let me know if you have any questions about those steps, or if anything doesn’t work when trying them out.


Thanks @spencerwong for these details and instructions. It’s super helpful, I really appreciate it! I’ve been caught up with other things, but trying this out is on my to-do list this week. I’ll let you know how I go.

No worries! Step 6 turns out to be unnecessary as long as you run payu setup, and so I’ve just updated the notes.


@spencerwong these instructions are legendary, thank you!!! I finally got around to changing the restart year last week and this worked perfectly for me, no hiccups at all! :raised_hands:

Also, @Aidan I’ve been using the access nri payu version and it’s working a charm. I like the branching feature for keeping track of different simulations, without having multiple run directories.


Awesome! Thanks for the feedback @hrsdawson. It is really good to know.

Your hard work is being used and valued @jo-basevi!

@spencerwong what did you use to calculate the total number of seconds between the restart date and the initialisation date using the proleptic Gregorian calendar? I want to branch from other years of the piControl but am unsure I’ve got the correct runtime0 values.

Hi Hannah, I’m glad that the instructions worked!

I’ve been using the cftime library in python which I’ve found useful for calculations with different calendars:

>>> import cftime
>>> init_date = cftime.datetime(1,1,1,0,0,0, calendar = "proleptic_gregorian")
>>> end_date = cftime.datetime(101,1,1,0,0,0, calendar = "proleptic_gregorian")
>>> (end_date - init_date).total_seconds()
3155673600.0

I think this library should be available on gadi through either the vk83/payu module or otherwise the hh5 conda/analysis3 module.
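If cftime isn't to hand, the same numbers can be obtained from the standard library, since Python's built-in datetime also uses the proleptic Gregorian calendar. Below is a small sketch (runtime0_seconds is just an illustrative name) that assumes the branch point is 1 January of the chosen year and that the experiment was initialised at 0001-01-01.

```python
from datetime import datetime


def runtime0_seconds(start_year, init_year=1):
    """Seconds from 1 Jan init_year to 1 Jan start_year (proleptic Gregorian).

    Illustrative helper: assumes both dates fall at midnight on 1 January.
    """
    delta = datetime(start_year, 1, 1) - datetime(init_year, 1, 1)
    # Use integer arithmetic so the result can be pasted into the namelist.
    return delta.days * 86400 + delta.seconds


if __name__ == "__main__":
    for year in (101, 700):
        print(year, runtime0_seconds(year))
    # 101 3155673600
    # 700 22058265600
```

The two printed values agree with the cftime result above and with the runtime0 used for year 700 in the earlier instructions, so other branch years can be checked the same way.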

Also as a side note – we’re hoping to simplify cice’s restart date handling for ESM1.5, as the current fairly convoluted form is quite difficult to work with and has previously led to bugs. Further down the line we’re hoping to make the whole process of setting up the model with a different restart a bit simpler. I’ll update the notes when those changes happen!


Thank you @spencerwong, that’s great!!