Porting CSIRO/UMUI ACCESS-ESM1.5 ksh run script to payu

Many of the CMIP6 experiments for ACCESS-ESM1.5 were not run with Rose/Cylc or payu, but with a series of heavily modified Korn shell (ksh) scripts that were originally generated by the old UMUI configuration program. Payu looks like the preferred way to run ACCESS-ESM1.5 in future, because the integration with git, the input file manifests and the run history all help make experiments easy to run, well documented and reproducible. It would therefore be very useful to have a way to port existing ACCESS configurations that use the ksh scripts to payu. There are some existing payu configurations of CMIP6 experiments (pre-industrial, historical and ssp585), but as far as I know none of them exactly reproduces the output from the ksh scripts.

I have attempted to do this for one of my own experiments simply because I’m familiar with them:

payu

ksh

If I’m successful at reproducing my results then I hope the procedure could be easily done for other experiments. So far, the configuration runs, but does not exactly reproduce my results. Reasons for this might be:

  • I missed something/made some mistake
  • There are configuration capabilities in payu that do not exist or are not explicit in the ksh scripts
  • There are configuration capabilities in the ksh scripts that do not exist in payu

Procedure

  1. I ran the ksh scripts for my chosen experiment (without submitting to the queue). Most of the interesting configuration is done in this file. It produces a run directory with all of the required namelists for the experiment.
  2. Copy over the atmosphere configuration in Running.dir/ATM_RUNDIR/tmp_ctrl to the payu config. Payu lists the namelist and other configuration files it needs here. Some of these have slightly different names from those used by the ksh scripts, so I copied the contents of each file into its payu counterpart as listed below. The file namelists contains the namelists from all of the other files (so I’m not sure what the point of having all the other namelist files is…). I just made sure they were all consistent.
CNTLALL => CNTLALL
CNTLATM => prefix.CNTLATM
CNTLGEN => prefix.CNTLGEN
CONTCNTL => CONTCNTL
INITHIS => INITHIS
PRESM_A => prefix.PRESM_A
SIZES => SIZES
STASHC => STASHC
UAFILES_A => UAFILES_A
UAFLDS_A => UAFLDS_A
../cable.nml => cable.nml
../input_atm.nml => input_atm.nml
  3. I did the same as in the previous step for the ocean and sea ice files and namcouple.
  4. I set the model executables in config.yaml using the ksh executables listed here for the atmosphere, ocean and ice models. Is there a different coupler executable? I couldn’t find this info in the ksh scripts.
  5. Set other config.yaml variables like jobname and calendar: start: year:
  6. This experiment warm-starts from an existing experiment, so I set the details of that in warm-start.sh (year 500 of PI-GWL-t6).
  7. This particular experiment had land-use change enabled. The ksh scripts inject the new land-use map into the restart file at the end of the year, ready for the next year when the job is resubmitted. payu, however, does this before the start of the year in scripts/pre.sh. I had already injected the new land-use map into the restart file, so I needed to modify the script to not do this in the first year. For example see here.
This step was specific to my experiment
  8. Point the warm-start-csrio.sh script to my custom restart file.
  9. Remove references to HISTORY. Some of the config files referenced paths to HISTORY, which is not used by payu. I searched for these with grep and removed them.
  10. Add the ancil files to atmosphere/um_env.py. The ksh scripts conditionally set these based on the time period (I think…). In my case, most of these were the pre-industrial ancil files. There are some ancil files in um_env.py that I couldn’t find in the ksh scripts, such as ARCLBIOG. I ignored them.
  11. Run a test and compare the log files to the original version. I compared work/atmosphere/atm.fort6.pe0 with the equivalent Running.dir/ATM_RUNDIR/um_out/<EXPNAME>.fort6.pe0.
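
The file mapping in step 2 could be scripted rather than copied by hand. A minimal sketch, assuming the ksh run directory and the payu atmosphere directory are laid out as described above; the function name `copy_um_control_files` and both directory arguments are hypothetical and are not something payu or the ksh scripts provide:

```python
import shutil
from pathlib import Path

# ksh name (relative to Running.dir/ATM_RUNDIR/tmp_ctrl) -> payu name
# (in the atmosphere directory of the payu config), as listed in step 2.
UM_FILE_MAP = {
    "CNTLALL": "CNTLALL",
    "CNTLATM": "prefix.CNTLATM",
    "CNTLGEN": "prefix.CNTLGEN",
    "CONTCNTL": "CONTCNTL",
    "INITHIS": "INITHIS",
    "PRESM_A": "prefix.PRESM_A",
    "SIZES": "SIZES",
    "STASHC": "STASHC",
    "UAFILES_A": "UAFILES_A",
    "UAFLDS_A": "UAFLDS_A",
    "../cable.nml": "cable.nml",
    "../input_atm.nml": "input_atm.nml",
}


def copy_um_control_files(tmp_ctrl, payu_atmosphere):
    """Copy each mapped file; return the ksh names that were not found."""
    tmp_ctrl = Path(tmp_ctrl)
    payu_atmosphere = Path(payu_atmosphere)
    missing = []
    for src_name, dst_name in UM_FILE_MAP.items():
        # resolve() collapses the "../" entries to the parent directory.
        src = (tmp_ctrl / src_name).resolve()
        if not src.is_file():
            missing.append(src_name)
            continue
        shutil.copyfile(src, payu_atmosphere / dst_name)
    return missing
```

Returning the list of missing names, rather than failing on the first one, makes it easier to spot files that exist in one setup but not the other.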

To compare results, I converted the output to netCDF with ACCESS-Archiver and did a cdo sub to see where the differences are. This takes a while to do. Is there any way to compare the UM binary output files directly?

Thanks for documenting this process @tammasloughran.

Just so others know and find the correct repo, all the CLEX CMS ACCESS-ESM1.5 experiments are in branches in this repo:

My recollection was that the payu versions did reproduce the original experiments, but my memory is terrible and I wasn’t directly involved in most of the verification. @holger and/or @Scott may be better placed to say for sure.

They would also be well placed to comment on the UM configuration.

Checking bitwise reproducibility is possible just by comparing the checksums in the UM logs. There is a comment here from @MartinDix about this

which also mentions mule-cumf, which may answer your other question about comparing fields?
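
One way to automate that kind of log comparison is to pull the lines of interest out of both logs and compare them in order. This is only a sketch: it assumes the relevant lines can be picked out by a keyword, and the default keyword `checksum` and the function names are placeholders rather than the actual UM log format.

```python
import re


def extract_lines(log_text, pattern):
    """Return the stripped lines of log_text that match pattern (case-insensitive)."""
    regex = re.compile(pattern, re.IGNORECASE)
    return [line.strip() for line in log_text.splitlines() if regex.search(line)]


def compare_log_lines(log_a, log_b, pattern="checksum"):
    """Compare the matching lines of two logs in order.

    Returns (identical, mismatches), where mismatches is a list of
    (index, line_from_a, line_from_b) tuples. The default pattern is an
    assumption; adjust it to whatever the real log lines look like.
    """
    lines_a = extract_lines(log_a, pattern)
    lines_b = extract_lines(log_b, pattern)
    if not lines_a or not lines_b:
        # An empty selection would otherwise look like a perfect match.
        raise ValueError(f"no lines matching {pattern!r} in at least one log")
    mismatches = [
        (i, a, b) for i, (a, b) in enumerate(zip(lines_a, lines_b)) if a != b
    ]
    identical = len(lines_a) == len(lines_b) and not mismatches
    return identical, mismatches
```

The two log texts would be read from the files being compared, for example the payu and ksh versions of the fort6.pe0 output.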

If you want ACCESS-NRI assistance with this add a help tag to the post and it’ll be picked up by the triage team.

Thanks Aidan,

I’m probably wrong on the other payu versions of standard simulations…

However, there’s no need for help anymore: I have reproduced the results with a payu configuration of my own, though I had quite a bit of confusion along the way. The namelist files are messy. Many of them are duplicates, so I think they should be cleaned up one day. As far as I could tell, only the atmosphere/namelists file was used (and the cable one).

The ancillary files were also confusing. It seems payu spams symbolic links in the run directory, which made it hard to know what was being used. Sometimes it would use the ancil file I needed, other times not.

I had trouble using mule-cumf. It said the files from payu and the scripts were not comparable (maybe some difference in header info?), despite there being no numerical difference.

This wasn’t as straightforward as I had hoped, but I will try doing it again in the near future.

Cheers,
Tam

Great! (that you managed to reproduce results)

Is this just for your case, or do the standard configs also contain spurious namelists?

I have more than a passing interest, as ACCESS-ESM1.5 is the next model for which ACCESS-NRI will be releasing supported configurations. So if you had any specific suggestions we’d love to hear them.

By design payu does not copy input files (ancils) or restarts; instead it makes symbolic links to them in the run directory. There are a number of reasons for this approach. A few that come to mind:

  • avoid wasteful (and potentially time-consuming) duplication
  • easily identify (and clean-up post run) inputs that do not need to be archived
  • separate configuration and “input”: good from a logical/design point of view, but also facilitates storing configuration, usually text files, in a version control system (git)
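
Because the run directory holds links rather than copies, one way to see which ancillary file a run will actually read is to resolve each link to its target. A minimal sketch using only the Python standard library; the function name `list_symlink_targets` is illustrative and not part of payu:

```python
import os
from pathlib import Path


def list_symlink_targets(run_dir):
    """Map each symbolic link under run_dir to the real path it points at."""
    targets = {}
    # os.walk does not follow links by default, so linked directories
    # show up in dirnames and are reported rather than descended into.
    for dirpath, dirnames, filenames in os.walk(run_dir):
        for name in dirnames + filenames:
            path = Path(dirpath) / name
            if path.is_symlink():
                key = str(path.relative_to(run_dir))
                targets[key] = os.path.realpath(path)
    return targets


# Example usage (the directory name is taken from the log path mentioned
# earlier in the thread; adjust to your own run directory):
# for link, target in sorted(list_symlink_targets("work/atmosphere").items()):
#     print(link, "->", target)
```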

This is also very pertinent to us, as we’re going to need to do something similar. Do you have any more information about this? e.g. example files and/or command output?

Is this just for your case, or do the standard configs also contain spurious namelists?

I started with a standard payu configuration and I didn’t add any new namelist files in the atmosphere subdirectory. So I guess some of them are redundant. I didn’t test which ones.

This is also very pertinent to us, as we’re going to need to do something similar. Do you have any more information about this? e.g. example files and/or command output?

Sorry, I must have made some sort of mistake the first time I tried this. mule-cumf works well.

module load conda
mule-cumf /scratch/p66/tfl561/access-esm/archive/GWL-NoCrops-B2030/output001/atmosphere/aiihca.pae0dec /scratch/p66/tfl561/archive/GWL-NoCrops-B2030/history/atm/GWL-NoCrops-B2030.pa-0500012001
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
* (CUMF-II) Module Information *
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

mule       : /g/data/hh5/public/apps/miniconda3/envs/analysis3-23.07/lib/python3.10/site-packages/mule/__init__.py (version 2022.07.1)
um_utils   : /g/data/hh5/public/apps/miniconda3/envs/analysis3-23.07/lib/python3.10/site-packages/um_utils/__init__.py (version 2022.07.1)
um_packing : /g/data/hh5/public/apps/miniconda3/envs/analysis3-23.07/lib/python3.10/site-packages/um_packing/__init__.py (version 2022.07.1) (packing lib from SHUMlib: 2023061)


/g/data/hh5/public/apps/miniconda3/envs/analysis3-23.07/lib/python3.10/site-packages/mule/stashmaster.py:259: UserWarning: 
Unable to load STASHmaster from version string, path does not exist
Path: $UMDIR/vn7.3/ctldata/STASHmaster/STASHmaster_A
Please check that the value of mule.stashmaster.STASHMASTER_PATH_PATTERN is correct for your site/configuration
  warnings.warn(msg)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
* CUMF-II Comparison Report *
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

File 1: /scratch/p66/tfl561/access-esm/archive/GWL-NoCrops-B2030/output001/atmosphere/aiihca.pae0dec
File 2: /scratch/p66/tfl561/archive/GWL-NoCrops-B2030/history/atm/GWL-NoCrops-B2030.pa-0500012001
Files compare
  * 0 differences in fixed_length_header (with 7 ignored indices)
  * 0 field differences, of which 0 are in data

Compared 3345/3345 fields, with 3345 matches
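
For scripted checks, the summary line at the end of the report can be parsed instead of read by eye. A minimal sketch, assuming the report always contains a `Compared N/M fields, with K matches` line as in the output above; the function name is hypothetical and this is not part of mule:

```python
import re

# Matches the final summary line of the comparison report shown above.
SUMMARY_RE = re.compile(r"Compared (\d+)/(\d+) fields, with (\d+) matches")


def cumf_all_fields_match(report_text):
    """Return True if every field was compared and every comparison matched."""
    found = SUMMARY_RE.search(report_text)
    if found is None:
        raise ValueError("no summary line found in report")
    compared, total, matches = (int(group) for group in found.groups())
    return compared == total == matches
```

The report text would be whatever `mule-cumf file1 file2` printed, captured to a file or a string.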

The way I originally compared them:

import numpy as np
import netCDF4 as nc

# Open the netCDF conversions of the payu run and the original ksh-script run.
ncfile_a = nc.Dataset('/g/data/p66/tfl561/ACCESS-ESM/GWL-NoCrops-B2030_payu_test/history/atm/netCDF/GWL-NoCrops-B2030.pa-050012_mon.nc', 'r')
ncfile_b = nc.Dataset('/g/data/p66/tfl561/ACCESS-ESM/GWL-NoCrops-B2030_csiro_test/history/atm/netCDF/GWL-NoCrops-B2030.pa-050012_mon.nc', 'r')
# For each variable, print whether any element differs between the two files.
for var in ncfile_a.variables.keys():
    difference = ncfile_a.variables[var][:] - ncfile_b.variables[var][:]
    print(var, np.any(difference))
fld_s00i004 False
latitude_longitude --
time False
time_bnds False
model_theta_level_number False
lat False
lat_bnds False
lon False
lon_bnds False
theta_level_height False
theta_level_height_bnds False
sigma_theta False
sigma_theta_bnds False
surface_altitude False
fld_s00i010 False
fld_s00i023 False
fld_s00i024 False
fld_s00i030 False
fld_s00i031 False
fld_s00i032 False
fld_s00i033 False
fld_s00i103 False
fld_s00i104 False
fld_s00i105 False
fld_s00i108 False
fld_s00i109 False
fld_s00i110 False
fld_s00i111 False
fld_s00i112 False
fld_s00i113 False
fld_s00i114 False
fld_s00i115 False
fld_s00i116 False
fld_s00i250 False
fld_s00i251 False
fld_s00i252 False
fld_s00i407 False
model_rho_level_number False
rho_level_height False
rho_level_height_bnds False
sigma_rho False
sigma_rho_bnds False
fld_s00i408 False
fld_s00i409 False
fld_s00i431 False
fld_s00i432 False
fld_s00i433 False
fld_s00i434 False
fld_s00i435 False
fld_s00i436 False
fld_s00i507 False
fld_s00i508 False
fld_s00i509 False
fld_s01i201 False
fld_s01i207 False
fld_s01i208 False
fld_s01i209 False
fld_s01i210 False
fld_s01i211 False
fld_s01i223 False
fld_s01i235 False
fld_s01i241 False
fld_s01i247 False
fld_s01i248 False
fld_s02i201 False
fld_s02i203 False
fld_s02i204 False
fld_s02i205 False
fld_s02i206 False
fld_s02i207 False
fld_s02i208 False
fld_s02i261 False
fld_s02i284 False
pseudo_level False
fld_s02i285 False
fld_s02i286 False
fld_s02i287 False
fld_s02i288 False
fld_s02i289 False
fld_s02i295 False
fld_s02i308 False
fld_s02i309 False
fld_s02i310 False
fld_s02i311 False
fld_s03i049 False
fld_s03i100 False
fld_s03i101 False
fld_s03i173 False
pseudo_level_0 False
fld_s03i201 False
fld_s03i209 False
lon_u False
lon_u_bnds False
height False
fld_s03i210 False
lat_v False
lat_v_bnds False
fld_s03i217 False
fld_s03i223 False
fld_s03i225 False
fld_s03i226 False
fld_s03i227 False
fld_s03i229 False
fld_s03i230 False
fld_s03i234 False
fld_s03i236 False
height_0 False
fld_s03i237 False
fld_s03i245 False
fld_s03i256 False
pseudo_level_1 False
fld_s03i257 False
fld_s03i258 False
fld_s03i261 False
fld_s03i262 False
fld_s03i263 False
fld_s03i287 False
fld_s03i288 False
fld_s03i289 False
pseudo_level_2 False
fld_s03i291 False
fld_s03i292 False
fld_s03i293 False
fld_s03i296 False
fld_s03i297 False
fld_s03i298 False
fld_s03i314 False
fld_s03i317 False
fld_s03i318 False
fld_s03i319 False
fld_s03i321 False
fld_s03i326 False
fld_s03i327 False
fld_s03i331 False
fld_s03i332 False
fld_s03i353 False
fld_s03i395 False
fld_s03i460 False
fld_s03i461 False
fld_s03i801 False
fld_s03i802 False
fld_s03i803 False
fld_s03i804 False
fld_s03i805 False
fld_s03i806 False
fld_s03i807 False
fld_s03i808 False
fld_s03i809 False
fld_s03i810 False
fld_s03i811 False
fld_s03i812 False
fld_s03i813 False
fld_s03i814 False
fld_s03i815 False
fld_s03i816 False
fld_s03i817 False
fld_s03i818 False
fld_s03i819 False
fld_s03i820 False
fld_s03i821 False
fld_s03i822 False
fld_s03i823 False
fld_s03i824 False
fld_s03i825 False
fld_s03i826 False
fld_s03i827 False
fld_s03i828 False
fld_s03i829 False
fld_s03i830 False
fld_s03i831 False
fld_s03i832 False
fld_s03i851 False
time_0 False
fld_s03i852 False
fld_s03i853 False
fld_s03i854 False
fld_s03i855 False
fld_s03i856 False
fld_s03i857 False
fld_s03i858 False
fld_s03i859 False
fld_s03i860 False
fld_s03i861 False
fld_s03i862 False
fld_s03i863 False
fld_s03i864 False
fld_s03i865 False
fld_s03i866 False
fld_s03i867 False
fld_s03i868 False
fld_s03i869 False
fld_s03i870 False
fld_s03i871 False
fld_s03i872 False
fld_s03i873 False
fld_s03i874 False
fld_s03i875 False
fld_s03i876 False
fld_s03i877 False
fld_s03i878 False
fld_s03i879 False
fld_s03i880 False
fld_s03i881 False
fld_s03i882 False
fld_s03i884 False
fld_s03i885 False
fld_s03i893 False
fld_s03i917 False
fld_s03i918 False
fld_s03i919 False
fld_s03i920 False
fld_s04i203 False
fld_s04i204 False
fld_s05i205 False
fld_s05i206 False
fld_s05i208 False
fld_s05i214 False
fld_s05i215 False
fld_s05i216 False
fld_s05i222 False
fld_s05i250 False
fld_s05i251 False
fld_s05i270 False
fld_s08i023 False
fld_s08i202 False
fld_s08i208 False
fld_s08i223 False
soil_model_level_number False
fld_s08i225 False
fld_s08i229 False
fld_s08i230 False
fld_s08i231 False
fld_s08i233 False
fld_s08i234 False
fld_s08i235 False
fld_s08i236 False
fld_s08i237 False
fld_s08i248 False
fld_s15i101 False
fld_s16i222 False
fld_s17i205 False
fld_s17i220 False
fld_s17i221 False
fld_s17i232 False
fld_s17i233 False
fld_s30i201 False
pressure False
fld_s30i202 False
fld_s30i203 False
fld_s30i204 False
fld_s30i205 False
fld_s30i206 False
fld_s30i207 False
fld_s30i208 False
fld_s30i215 False
fld_s30i225 False
fld_s30i301 False
fld_s30i403 False
fld_s30i404 False
fld_s30i405 False
fld_s30i406 False
fld_s30i428 False
fld_s30i429 False
fld_s30i453 False
fld_s33i001 False
fld_s33i002 False

latitude_longitude is a masked value in both files.
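
The `--` printed for latitude_longitude is what np.any gives back when every element is masked. A variant of the check that handles masks explicitly might look like the following. It is a sketch only: `fields_differ` is a hypothetical helper name, and it treats two fields as equal when their masks agree and their unmasked values agree.

```python
import numpy as np


def fields_differ(a, b):
    """Return True if two (possibly masked) arrays differ in shape, mask or unmasked data."""
    a = np.ma.asarray(a)
    b = np.ma.asarray(b)
    if a.shape != b.shape:
        return True
    # Flatten so that 0-d (scalar) variables are handled like any other.
    mask_a = np.ma.getmaskarray(a).ravel()
    mask_b = np.ma.getmaskarray(b).ravel()
    if not np.array_equal(mask_a, mask_b):
        return True
    valid = ~mask_a
    data_a = np.ma.getdata(a).ravel()[valid]
    data_b = np.ma.getdata(b).ravel()[valid]
    return not np.array_equal(data_a, data_b)
```

In the loop above, `fields_differ(ncfile_a.variables[var][:], ncfile_b.variables[var][:])` could replace the subtraction and `np.any` call, so a fully masked variable reports False rather than `--`.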

Thanks @tammasloughran

Good to hear mule-cumf is working as expected.