ACCESS-OM2 payu tutorial

Introduction

This tutorial introduces new users to payu and explains new features of payu to experienced users, particularly how those changes might affect their existing workflow. payu is a scientific workflow management tool written in Python and developed by Marshall Ward for use at NCI.

Specifically this tutorial covers using ACCESS-OM2 experiment configurations supported by ACCESS-NRI.

The tutorial begins with some general information about how payu works, highlighting some of the new features. Then there is a section with a number of workflows to help users identify what information is most appropriate to them.

ACCESS-NRI provides a version of payu installed on Gadi that supports the features described in this tutorial. To access it you need to be a member of vk83 and then run the following commands:

module use /g/data/vk83/modules
module load payu

Experiments

Configurations

ACCESS-OM2 configurations are all available from the ACCESS-OM2 Configs repository.

All released configurations are available as branches in the configs repo. They are organised like this for administrative convenience: they are easier to manage and keep grouped for testing. Anyone using a configuration is advised to just clone a single branch and not attempt to keep this structure.

Branches

git branches are explicitly supported by payu>=1.1, and form a crucial part of the updated workflow. This allows a single control directory (which is a git repository) to contain multiple independent experiments, and it is possible to switch between experiments, though only one experiment can be active and running at any one time.

payu combines the directory name, the branch name and the first 8 digits of the experiment UUID to generate unique directory names for the archive and work directories of an experiment.

It is also necessary to update the symbolic links to archive in the control directory when switching between experiment branches. For this reason, and others, there is now a payu checkout command, which wraps the git checkout command, but also takes care of other housekeeping to make sure the control directory is correct. Using git checkout directly is not recommended as it could lead to confusing or incorrect experiment configuration.

Experiment UUIDs

payu automatically generates unique experiment UUIDs. These are in UUIDv4 format: a 128-bit number that is typically written as 32 hexadecimal digits in five hyphen-separated groups, e.g.

550e8400-e29b-41d4-a716-446655440000

For all practical purposes they are guaranteed to be unique, and so can be confidently used to identify and track experiments. UUIDs are not human friendly, and are designed to be used by software and in databases, but the first 8 digits of the experiment UUID are used to uniquely name the experiment's archive and work directories in the laboratory, as described above.
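
As a rough illustration of the format (not payu's own code), the sketch below uses Python's standard uuid module to generate a version 4 UUID and take its first 8 hexadecimal digits. The variable names are chosen here for illustration only.

```python
import uuid

# Generate a random (version 4) UUID, the format described above.
experiment_uuid = uuid.uuid4()

# Canonical text form: 32 hexadecimal digits in five hyphen-separated groups.
uuid_text = str(experiment_uuid)

# The short form used in directory names is the first 8 hexadecimal digits.
short_uuid = uuid_text[:8]

print(uuid_text)
print(short_uuid)
```

The value differs on every run, which is the point: two experiments created independently will, in practice, never share a UUID.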

Experiment naming

An experiment name is used to identify the experiment inside the work and archive sub-directories in the laboratory.

The experiment name historically would default to the name of the control directory. This is still supported for experiments with pre-existing archived outputs. To support git branches and ensure uniqueness in shared archives, the new default behaviour is to add the branch name and a short version of the experiment UUID to the name of the control directory when creating experiment names.

For example, given a control directory named my_expt and a UUID of 416af8c6-d299-4ee6-9d77-4aefa8a9ebcb, the experiment name would be:

  • my_expt-perturb-416af8c6 - if running an experiment on a branch named perturb.
  • my_expt-416af8c6 - if the control directory was not a git repository or the experiment was run from the main or master git branch.

To preserve backwards compatibility, if there’s a pre-existing archive under the control directory name, this will remain the experiment name (e.g. my_expt in the above example). Similarly, if the experiment value is configured (see Configuring your experiment), this will be used for the experiment name.
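
The naming rules above can be summarised in a few lines of Python. This is only a sketch of the rules as described in this tutorial, not payu's implementation: the function experiment_name and its parameters are hypothetical, and the case where the experiment value is set in the configuration is deliberately left out.

```python
def experiment_name(control_dir, branch, experiment_uuid, legacy_archive_exists=False):
    """Illustrative only: build an experiment name from the rules in the text.

    control_dir: name of the control directory (hypothetical parameter)
    branch: git branch name, or None if the directory is not a git repository
    experiment_uuid: full UUID string
    legacy_archive_exists: True if an archive named after control_dir already exists
    """
    # Backwards compatibility: a pre-existing archive keeps the old name.
    if legacy_archive_exists:
        return control_dir

    short_uuid = experiment_uuid[:8]

    # No branch name is added outside git, or on the main/master branches.
    if branch is None or branch in ("main", "master"):
        return f"{control_dir}-{short_uuid}"

    return f"{control_dir}-{branch}-{short_uuid}"


example_uuid = "416af8c6-d299-4ee6-9d77-4aefa8a9ebcb"
print(experiment_name("my_expt", "perturb", example_uuid))
print(experiment_name("my_expt", "main", example_uuid))
print(experiment_name("my_expt", "perturb", example_uuid, legacy_archive_exists=True))
```

The three calls correspond to the three cases in the text: a named branch, the main branch, and a control directory with a pre-existing archive.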

Cloning an experiment configuration

Choose the experiment you need and clone the corresponding branch from the repository to a directory name that reflects the experiment chosen.

A new branch should be created for the specific experiment being run, and its name should relate to the experiment, e.g. ctrl if the run is a control experiment.

The best way to clone an experiment configuration is to use the payu clone command. This wraps the git clone command and provides support for cloning a branch to a new branch name:

payu clone -B <branch> -b <new_branch> git@github.com:ACCESS-NRI/access-om2-configs.git <experiment_name>

See workflows below for some examples.

Syncing

payu now supports syncing of an experiment archive to another filesystem, either local or remote. There are a number of configuration options to customise what is synced and when, but a simple example might be:

sync:
  enable: True
  path: /g/data/xx00/aa9999/experiments/

Note: care must be taken with the path as syncing has the potential to overwrite files.

With sync enabled payu will automatically copy the experiment outputs, restarts and the git control repository to the defined path.

With sync configured you can manually sync using the payu sync command. Using payu sync --sync-restarts will sync all output and restart directories. This is particularly useful when a run is finished, or temporarily halted for some time, to make sure all outputs and restarts have been copied to non-ephemeral storage.

Restart pruning

payu now supports specifying which restarts to retain using date-based frequencies. This allows restarts to be pruned based on time units. The supported time units are:

  • YS - year-start
  • MS - month-start
  • W - week
  • D - day
  • H - hour
  • T - minute
  • S - second

For example setting

restart_freq: 5YS

will only save the first restart of every fifth year, with the rest deleted.

The sync functionality knows which restarts will be pruned if restart_freq is set, and does not sync restarts that will later be deleted.

Date-based restart_freq is currently only supported for ACCESS-OM2, MOM5, and MOM6 models.
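
To make the effect of a year-start frequency such as 5YS concrete, here is a minimal Python sketch. It rests on assumptions that are not confirmed by this tutorial: that the first restart is always kept, and that the next restart kept is the first one dated on or after the start of the year n years later. The function restarts_to_keep and the sample dates are hypothetical, and the standard datetime module stands in for whatever calendar the model uses.

```python
from datetime import datetime


def restarts_to_keep(restart_dates, n_years):
    """Illustrative only: pick restarts to retain for an 'nYS' style frequency."""
    kept = []
    next_keep = None  # earliest date at which the next restart is retained
    for date in sorted(restart_dates):
        if next_keep is None or date >= next_keep:
            kept.append(date)
            # Assumed rule: next checkpoint is 1 January, n_years after this year.
            next_keep = datetime(date.year + n_years, 1, 1)
    return kept


# Twelve yearly restarts dated 1 January 1901 to 1 January 1912 (made-up dates).
restarts = [datetime(1900 + i, 1, 1) for i in range(1, 13)]
kept = restarts_to_keep(restarts, 5)
pruned = [d for d in restarts if d not in kept]

print("kept:  ", [d.year for d in kept])
print("pruned:", [d.year for d in pruned])
```

Under these assumptions only the restarts for 1901, 1906 and 1911 survive, and the other nine would be candidates for deletion.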

Common Workflows

A. Starting from scratch, running a single experiment

Select Experiment

Select an experiment from the release branches of the access-om2-configs repo, e.g. release-1deg_jra55_ryf

Clone Experiment

Clone the experiment branch into a directory named for the experiment. Optionally (but recommended), create a new branch with a more appropriate name in the same step, e.g.

payu clone -B release-1deg_jra55_ryf -b ctrl git@github.com:ACCESS-NRI/access-om2-configs.git 1deg_jra55_ryf

This creates a directory 1deg_jra55_ryf with the current branch named ctrl

$ cd 1deg_jra55_ryf
$ ls -g
total 56
-rw-r----- 1 tm70  861 Jan 24 22:52 accessom2.nml
lrwxrwxrwx 1 tm70   68 Jan 30 23:38 archive -> /scratch/tm70/xx9999/access-om2/archive/1deg_jra55_ryf-ctrl-6d94da1f
drwxr-s--- 2 tm70 4096 Jan 24 22:52 atmosphere
-rw-r----- 1 tm70 4848 Jan 24 22:52 config.yaml
drwxr-s--- 2 tm70 4096 Jan 24 22:52 doc
drwxr-s--- 2 tm70 4096 Jan 24 22:52 ice
drwxr-s--- 2 tm70 4096 Jan 24 22:52 manifests
-rw-r----- 1 tm70 2570 Jan 30 23:38 metadata.yaml
-rw-r----- 1 tm70 7904 Jan 24 22:52 namcouple
drwxr-s--- 2 tm70 4096 Jan 24 22:52 ocean
-rw-r----- 1 tm70 1367 Jan 24 22:52 README.md
-rwxr-x--- 1 tm70 1389 Jan 24 22:52 resub.sh
$ payu branch
* Current Branch: ctrl
    experiment_uuid: 6d94da1f-cba3-4dd9-9b80-f521cb96197a
Branch: release-1deg_jra55_ryf
    No UUID in metadata file

payu has generated an experiment_uuid (6d94da1f-cba3-4dd9-9b80-f521cb96197a) and saved it into the metadata.yaml file.

Note that the archive link points to a directory named 1deg_jra55_ryf-ctrl-6d94da1f. This is the automatically generated experiment name, which joins the base experiment name (the directory name) with the branch name and the first 8 digits of the experiment UUID.

Run Experiment

In this example the experiment is run (payu run) with an unchanged configuration.

Things to avoid

Don’t clone the access-om2-configs repo without renaming it.
Don’t then checkout an experiment branch and payu run.

This will result in an experiment name that would look something like access-om2-configs-1deg_jra55_ryf-3af3cd6e9, and no-one wants that.

B. Linked experiments

payu now supports running independent experiments from the same repository (directory), which is a good fit for closely related experiments.

This workflow will go through the steps to make a new related experiment from the previous example.

Checkout new experiment

Use payu checkout to create a new branch named for the experiment and switch to it, e.g.
for a positive wind perturbation branching from a specific commit (3f4cd6e1)

payu checkout -b wind_plus 3f4cd6e1
Created and checked out new branch: wind_plus
laboratory path:  /scratch/tm70/xx9999/access-om2
binary path:  /scratch/tm70/xx9999/access-om2/bin
input path:  /scratch/tm70/xx9999/access-om2/input
work path:  /scratch/tm70/xx9999/access-om2/work
archive path:  /scratch/tm70/xx9999/access-om2/archive
Updated metadata. Experiment UUID: d44a14b5-2d2e-4b3f-916c-ccc5783878f7
Removed archive symlink to /scratch/tm70/xx9999/access-om2/archive/1deg_jra55_ryf-ctrl-6d94da1f
Added archive symlink to /scratch/tm70/xx9999/access-om2/archive/1deg_jra55_ryf-wind_plus-d44a14b5

Note the UUID has changed and the path to the archive has too:

$ ls -g archive
lrwxrwxrwx 1 tm70 73 Jan 31 17:29 archive -> /scratch/tm70/xx9999/access-om2/archive/1deg_jra55_ryf-wind_plus-d44a14b5

The experiment name has changed to include the branch name and a portion of the experiment UUID. The metadata.yaml file has been changed accordingly:

$ grep ^experiment_uuid metadata.yaml 
experiment_uuid: d44a14b5-2d2e-4b3f-916c-ccc5783878f7
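
The same consistency check can be scripted. The sketch below is an illustration, not part of payu: the helper names read_experiment_uuid and archive_matches_uuid are hypothetical, metadata.yaml is read with simple line matching rather than a YAML parser, and the check only applies to experiments using the new naming scheme (a legacy archive name has no UUID suffix). The demonstration builds a throwaway control directory instead of touching a real one.

```python
import os
import tempfile


def read_experiment_uuid(metadata_path):
    """Return the experiment_uuid value from a metadata file, or None."""
    with open(metadata_path) as f:
        for line in f:
            if line.startswith("experiment_uuid:"):
                return line.split(":", 1)[1].strip()
    return None


def archive_matches_uuid(control_dir):
    """Check that the archive symlink target ends with the short UUID."""
    uuid_text = read_experiment_uuid(os.path.join(control_dir, "metadata.yaml"))
    target = os.readlink(os.path.join(control_dir, "archive"))
    return uuid_text is not None and target.endswith("-" + uuid_text[:8])


# Demonstration in a temporary directory that mimics the example above.
with tempfile.TemporaryDirectory() as control_dir:
    with open(os.path.join(control_dir, "metadata.yaml"), "w") as f:
        f.write("experiment_uuid: d44a14b5-2d2e-4b3f-916c-ccc5783878f7\n")
    os.symlink(
        "/scratch/tm70/xx9999/access-om2/archive/1deg_jra55_ryf-wind_plus-d44a14b5",
        os.path.join(control_dir, "archive"),
    )
    demo_result = archive_matches_uuid(control_dir)

print(demo_result)
```

A mismatch would suggest the branch was switched with git checkout rather than payu checkout, leaving the archive link pointing at another experiment.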

Modify experiment configuration

Modify the configuration to achieve the positive wind perturbation (details beyond the scope of this tutorial)

Run experiment

With the configuration altered it is always a good idea to run payu setup to check the changes have been made correctly.

Once satisfied that all is correct, run payu sweep to remove the work directory and then payu run (or payu run -f to do the sweep automatically).

To view the available experiments, run payu branch. It lists the current branch and all other branches, with information about the experiment each one contains, if any.

$ payu branch
* Current Branch: wind_plus
    experiment_uuid: d44a14b5-2d2e-4b3f-916c-ccc5783878f7
Branch: ctrl
    experiment_uuid: 6d94da1f-cba3-4dd9-9b80-f521cb96197a
Branch: release-1deg_jra55_ryf
    No UUID in metadata file

Create another new experiment

To create another related experiment branching from the same git commit as above, follow the same steps but use a different branch name, e.g.

payu checkout -b wind_minus 3f4cd6e1
Created and checked out new branch: wind_minus
laboratory path:  /scratch/tm70/xx9999/access-om2
binary path:  /scratch/tm70/xx9999/access-om2/bin
input path:  /scratch/tm70/xx9999/access-om2/input
work path:  /scratch/tm70/xx9999/access-om2/work
archive path:  /scratch/tm70/xx9999/access-om2/archive
Updated metadata. Experiment UUID: 99b6b92e-3b41-440d-928f-056da8da7777
Removed archive symlink to /scratch/tm70/xx9999/access-om2/archive/1deg_jra55_ryf-wind_plus-d44a14b5
Added archive symlink to /scratch/tm70/xx9999/access-om2/archive/1deg_jra55_ryf-wind_minus-99b6b92e

The archive symbolic link is modified to point to the new experiment directory (1deg_jra55_ryf-wind_minus-99b6b92e).

Modify experiment configuration

As before, modify the configuration for the negative wind perturbation, then run payu setup and, when ready, payu run -f.

Now payu branch shows the 3 branches with experiment UUIDs that were created above:

$ payu branch
* Current Branch: wind_minus
    experiment_uuid: 99b6b92e-3b41-440d-928f-056da8da7777
Branch: ctrl
    experiment_uuid: 6d94da1f-cba3-4dd9-9b80-f521cb96197a
Branch: release-1deg_jra55_ryf
    No UUID in metadata file
Branch: wind_plus
    experiment_uuid: d44a14b5-2d2e-4b3f-916c-ccc5783878f7

Any of those experiments can be accessed, analysed or continued by using payu checkout <branchname>. Note that only one experiment can be checked out and run at a time. This is a limitation of the branching model.

C. Simultaneous linked experiments

If you need to run experiments simultaneously they will have to be in separate control directories. It is still possible to have all the experiments in a single git repo, but the repo would need to be cloned to separate directories to run the experiments simultaneously.

Using the example from above, you can set up independent experiments in a number of ways. A couple of examples are shown below. In both examples the goal is to keep the experiment name the same as it would be if the experiments were run as branches in the same repository.

Subdirectories

One option is to clone the same experiment into subdirectories, e.g.

$ payu clone -B release-1deg_jra55_ryf -b plus git@github.com:ACCESS-NRI/access-om2-configs.git plus/1deg_jra55_ryf
$ payu clone -B release-1deg_jra55_ryf -b minus git@github.com:ACCESS-NRI/access-om2-configs.git minus/1deg_jra55_ryf

It doesn’t matter how the subdirectories are named; only the name of the control directory itself matters, as this determines the experiment name.

With versions of payu < 1.1 it is not possible to have experiments with the same control directory name, but the latest version of payu also adds the branch name and experiment UUID, so the control directory names can be identical.

Experiment name

Another option is to clone the experiment into uniquely named directories, but set the experiment name in config.yaml, e.g.

$ payu clone -B release-1deg_jra55_ryf -b plus git@github.com:ACCESS-NRI/access-om2-configs.git 1deg_jra55_ryf_plus
$ payu clone -B release-1deg_jra55_ryf -b minus git@github.com:ACCESS-NRI/access-om2-configs.git 1deg_jra55_ryf_minus

then add the following to config.yaml in both experiments:

experiment: 1deg_jra55_ryf

This overrides the default root of the experiment name, which is the control directory name, and results in the same naming scheme as the subdirectories above.

Why?

Why use a single repo with branches in this case if we’re making separate control directories? There is still value in having a single repository with related experiments in named branches. At any point the branches in the separate control directories can be pushed back to a single repository. This reduces the number of repositories, which can become a serious issue with many experiments. Having related experiments in the same repository also allows for easy comparisons between the configurations, and also makes it easier to apply common changes to a number of related experiments.

D. Existing experiments

Legacy mode (no changes to layout)

The new branching features of payu are completely backwards compatible. If an existing experiment has experiment output in archive, payu will make no change to the experiment name or the archive link. The first time any payu command is run it will create metadata.yaml if necessary, generate an experiment UUID and add it to that file.

payu will tell you what it is doing. This is part of the output when payu was run in an existing experiment:

/g/data/vk83/apps/payu/1.1/lib/python3.9/site-packages/payu/metadata.py:130: MetadataWarning: No experiment uuid found in metadata. Generating a new uuid
  warnings.warn("No experiment uuid found in metadata. "
Pre-existing archive found at: /scratch/tm70/xx9999/access-om2/archive/1deg_jra55_ryf. Experiment name will remain: 1deg_jra55_ryf
Updated metadata. Experiment UUID: 05e1648d-9997-4eec-b2e1-ebb3881c95b4

It says that a pre-existing archive was found, and that the experiment name is unchanged, but an experiment UUID has been generated and the metadata updated:

experiment_uuid: 05e1648d-9997-4eec-b2e1-ebb3881c95b4

No branching

For an experiment that does not have existing experiment output and does not use a special branch name, so the branch is either main or master, the experiment name does not include the branch name. It will only be the control directory name and the shortened experiment UUID, e.g. cloning the previous version of an experiment and running a payu command

$ gh repo clone COSIMA/1deg_jra55_ryf 1deg_jra55_ryf_existing
Cloning into '1deg_jra55_ryf_existing'...
remote: Enumerating objects: 1132, done.
remote: Counting objects: 100% (360/360), done.
remote: Compressing objects: 100% (64/64), done.
remote: Total 1132 (delta 312), reused 319 (delta 296), pack-reused 772
Receiving objects: 100% (1132/1132), 366.44 KiB | 887.00 KiB/s, done.
Resolving deltas: 100% (694/694), done.
$ cd 1deg_jra55_ryf_existing
$ payu sweep
laboratory path:  /scratch/tm70/xx9999/access-om2
binary path:  /scratch/tm70/xx9999/access-om2/bin
input path:  /scratch/tm70/xx9999/access-om2/input
work path:  /scratch/tm70/xx9999/access-om2/work
archive path:  /scratch/tm70/xx9999/access-om2/archive
/g/data/vk83/apps/payu/1.1/lib/python3.9/site-packages/payu/metadata.py:130: MetadataWarning: No experiment uuid found in metadata. Generating a new uuid
  warnings.warn("No experiment uuid found in metadata. "
/g/data/vk83/apps/payu/1.1/lib/python3.9/site-packages/payu/metadata.py:192: MetadataWarning: No pre-existing archive found. Generating a new uuid
  warnings.warn(
Updated metadata. Experiment UUID: 40a219d7-6011-4004-9aeb-d3f93a0fb81e

This results in the following changes/additions to metadata.yaml:

experiment_uuid: 40a219d7-6011-4004-9aeb-d3f93a0fb81e
name: 1deg_jra55_ryf_existing-40a219d7

The rationale for this approach is to have as few surprises as possible, compared with the way payu has worked in the past, when running an existing experiment or one that appears to be in “legacy mode”.