Introduction
This tutorial introduces new users to payu and explains new features of payu to experienced users, particularly how those changes might affect their existing workflows. payu is a scientific workflow management tool written in Python and developed by Marshall Ward for use at NCI.
Specifically this tutorial covers using ACCESS-OM2 experiment configurations supported by ACCESS-NRI.
The tutorial begins with some general information about how payu works, highlighting some of the new features. It then presents a number of common workflows to help users identify the information most relevant to them.
ACCESS-NRI provides a version of payu installed on gadi that supports the features described in this tutorial. To access it you need to be a member of vk83 and then run the following commands:
module use /g/data/vk83/modules
module load payu
Experiments
Configurations
ACCESS-OM2 configurations are all available from the ACCESS-OM2 Configs repository.
All released configurations are available as branches in the configs repo. They are organised like this for administrative convenience: they are easier to manage and keep grouped for testing. Anyone using a configuration is advised to just clone a single branch and not attempt to keep this structure.
Branches
git branches are explicitly supported by payu>=1.1, and form a crucial part of the updated workflow. This allows a single control directory (which is a git repository) to contain multiple independent experiments. It is possible to switch between experiments, though only one experiment can be active and running at any one time.
payu combines the directory name, the branch name and the first 8 digits of the experiment UUID to generate unique directory names for the archive and work directories of an experiment.
It is also necessary to update the symbolic link to archive in the control directory when switching between experiment branches. For this reason, and others, there is now a payu checkout command, which wraps the git checkout command but also takes care of other housekeeping to make sure the control directory is correct. Using git checkout directly is not recommended as it could lead to a confusing or incorrect experiment configuration.
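The symbolic link housekeeping described above can be pictured with a small standard-library sketch. This is not payu's implementation: the helper name update_archive_link and all paths are hypothetical, and the real payu checkout command does more than this (for example, it updates metadata).

```python
import os
import tempfile
from pathlib import Path


def update_archive_link(control_dir: Path, archive_dir: Path) -> None:
    """Point <control_dir>/archive at archive_dir, replacing any old link.

    Hypothetical helper for illustration only; not part of payu.
    """
    link = control_dir / "archive"
    # is_symlink() is also true for dangling links, which exists() would miss
    if link.is_symlink():
        link.unlink()
    link.symlink_to(archive_dir)


# Demonstration in a throwaway directory (all names are made up)
with tempfile.TemporaryDirectory() as tmp:
    lab_archive = Path(tmp) / "lab" / "archive"
    control = Path(tmp) / "my_expt"
    control.mkdir()
    # Simulate switching from one experiment branch to another
    for name in ("my_expt-ctrl-6d94da1f", "my_expt-wind_plus-d44a14b5"):
        (lab_archive / name).mkdir(parents=True)
        update_archive_link(control, lab_archive / name)
    final_target = os.readlink(control / "archive")
    print(final_target)
```

After the second call the link points at the archive of the most recently selected experiment, which is the state that using git checkout directly would leave out of date.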
Experiment UUIDs
payu automatically generates unique experiment UUIDs. These are in UUIDv4 format: a 128-bit number, typically represented as 32 hexadecimal digits in five hyphen-separated groups, e.g.
550e8400-e29b-41d4-a716-446655440000
Collisions are so unlikely that UUIDs can be treated as unique in practice, and so can be confidently used to identify and track experiments. UUIDs are not human friendly, and are designed to be used by software and in databases, but the first 8 digits of the experiment UUID are used to uniquely name the experiment's laboratory archive and work directories, as described above.
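For readers who want to see what a version 4 UUID and its 8-digit prefix look like, here is a minimal sketch using Python's standard uuid module. It only illustrates the format; how payu itself generates and stores the UUID is not shown here.

```python
import uuid

experiment_uuid = uuid.uuid4()   # random (version 4) UUID
full = str(experiment_uuid)      # 36 characters: 32 hex digits plus 4 hyphens
short = full[:8]                 # first 8 hex digits, as used in directory names

print(full)
print(short)
```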
Experiment naming
An experiment name is used to identify the experiment inside the work and archive sub-directories in the laboratory.
The experiment name historically would default to the name of the control directory. This is still supported for experiments with pre-existing archived outputs. To support git branches and ensure uniqueness in shared archives, the new default behaviour is to add the branch name and a short version of the experiment UUID to the name of the control directory when creating experiment names.
For example, given a control directory named my_expt and a UUID of 416af8c6-d299-4ee6-9d77-4aefa8a9ebcb, the experiment name would be:
my_expt-perturb-416af8c6 - if running an experiment on a branch named perturb.
my_expt-416af8c6 - if the control directory was not a git repository or the experiment was run from the main or master git branch.
To preserve backwards compatibility, if there’s a pre-existing archive under the control directory name, this will remain the experiment name (e.g. my_expt in the above example). Similarly, if the experiment value is configured (see Configuring your experiment), this will be used for the experiment name.
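The naming rules in this section can be summarised in a short sketch. The function experiment_name and its parameters are hypothetical and written only for illustration. The order in which the configured name and the pre-existing archive are checked is an assumption, not something stated above.

```python
from typing import Optional


def experiment_name(control_dir: str,
                    experiment_uuid: str,
                    branch: Optional[str] = None,
                    configured_name: Optional[str] = None,
                    legacy_archive_exists: bool = False) -> str:
    """Illustrative version of the naming rules; not payu's own code."""
    if configured_name:
        # The experiment value set in the configuration is used as given
        return configured_name
    if legacy_archive_exists:
        # A pre-existing archive under the control directory name keeps that name
        return control_dir
    parts = [control_dir]
    # No git repository (branch is None), main and master add no branch name
    if branch and branch not in ("main", "master"):
        parts.append(branch)
    parts.append(experiment_uuid[:8])
    return "-".join(parts)


uuid_str = "416af8c6-d299-4ee6-9d77-4aefa8a9ebcb"
print(experiment_name("my_expt", uuid_str, branch="perturb"))
print(experiment_name("my_expt", uuid_str, branch="main"))
print(experiment_name("my_expt", uuid_str, branch="perturb",
                      legacy_archive_exists=True))
```

The three calls reproduce the cases described above: a named branch, the main branch, and a pre-existing archive.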
Cloning an experiment configuration
Choose the experiment you need and clone the corresponding branch from the repository to a directory name that reflects the experiment chosen.
A new branch should be created for the specific experiment being run, and its name should relate to the experiment, e.g. ctrl if the run is a control experiment.
The best way to clone an experiment configuration is to use the payu clone command. This wraps the git clone command and provides support for cloning a branch to a new branch name:
payu clone -B <branch> -b <new_branch> git@github.com:ACCESS-NRI/access-om2-configs.git <experiment_name>
See workflows below for some examples.
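If several experiments are cloned from a script, the command line above can be assembled programmatically. The sketch below only builds and prints the command, using the -B and -b options shown above. The helper payu_clone_command is hypothetical, and actually running the command (for example with subprocess.run) would require payu and repository access.

```python
import shlex
from typing import List


def payu_clone_command(source_branch: str, new_branch: str,
                       repository: str, directory: str) -> List[str]:
    """Build the argument list for the payu clone form shown above."""
    return ["payu", "clone",
            "-B", source_branch,   # existing branch to clone
            "-b", new_branch,      # new branch to create for this experiment
            repository, directory]


cmd = payu_clone_command(
    "release-1deg_jra55_ryf", "ctrl",
    "git@github.com:ACCESS-NRI/access-om2-configs.git", "1deg_jra55_ryf")
command_line = shlex.join(cmd)
print(command_line)
```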
Syncing
payu now supports syncing of an experiment archive to another filesystem, either local or remote. There are a number of configuration options to customise what is synced and when, but a simple example might be:
sync:
    enable: True
    path: /g/data/xx00/aa9999/experiments/
Note: care must be taken with the path, as syncing has the potential to overwrite files.
With sync enabled, payu will automatically copy the experiment outputs, restarts and the git control repository to the defined path.
With sync configured you can also sync manually using the payu sync command. Using payu sync --sync-restarts will sync all output and restart directories. This is particularly useful when a run is finished, or temporarily halted for some time, to make sure all outputs and restarts have been copied to non-ephemeral storage.
Restart pruning
payu now supports specifying which restarts to retain using date-based frequencies, so that restarts can be pruned based on time units. The supported time units are:
YS - year-start
MS - month-start
W - week
D - day
H - hour
T - minute
S - second
For example setting
restart_freq: 5YS
will only save the first restart of every fifth year, with the rest deleted.
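To make the effect of a setting such as 5YS concrete, the following sketch applies a simplified year-start rule to a list of made-up restart dates. It is an illustration under stated assumptions, not payu's pruning code. The function restarts_to_keep is hypothetical, it handles only the YS unit, and it assumes the earliest restart is always kept.

```python
from datetime import datetime
from typing import List


def restarts_to_keep(restart_dates: List[datetime], years: int) -> List[datetime]:
    """Keep the first restart on or after each N-year year-start boundary."""
    kept = []
    next_keep = None
    for date in sorted(restart_dates):
        if next_keep is None or date >= next_keep:
            kept.append(date)
            # The next restart to keep is the first one on or after
            # 1 January, `years` years after the year just kept
            next_keep = datetime(date.year + years, 1, 1)
    return kept


# One restart at the start of each year from 1900 to 1911 (made-up dates)
restarts = [datetime(year, 1, 1) for year in range(1900, 1912)]
kept = restarts_to_keep(restarts, years=5)
kept_years = [d.year for d in kept]
print(kept_years)
```

With these dates the restarts for 1900, 1905 and 1910 are retained, and the others would be candidates for deletion.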
The sync functionality knows which restarts will be pruned if restart_freq is set, and does not sync restarts that will later be deleted.
Date-based restart_freq is currently only supported for ACCESS-OM2, MOM5, and MOM6 models.
Common Workflows
A. Starting from scratch, running a single experiment
Select Experiment
Select an experiment from a release branch of the access-om2-configs repo, e.g. release-1deg_jra55_ryf
Clone Experiment
Clone the experiment branch into a directory named for the experiment. Optionally (but recommended), create a new branch with a more appropriate name in the same step.
e.g.
payu clone -B release-1deg_jra55_ryf -b ctrl git@github.com:ACCESS-NRI/access-om2-configs.git 1deg_jra55_ryf
This creates a directory 1deg_jra55_ryf with the current branch named ctrl:
$ cd 1deg_jra55_ryf
$ ls -g
total 56
-rw-r----- 1 tm70 861 Jan 24 22:52 accessom2.nml
lrwxrwxrwx 1 tm70 68 Jan 30 23:38 archive -> /scratch/tm70/xx9999/access-om2/archive/1deg_jra55_ryf-ctrl-6d94da1f
drwxr-s--- 2 tm70 4096 Jan 24 22:52 atmosphere
-rw-r----- 1 tm70 4848 Jan 24 22:52 config.yaml
drwxr-s--- 2 tm70 4096 Jan 24 22:52 doc
drwxr-s--- 2 tm70 4096 Jan 24 22:52 ice
drwxr-s--- 2 tm70 4096 Jan 24 22:52 manifests
-rw-r----- 1 tm70 2570 Jan 30 23:38 metadata.yaml
-rw-r----- 1 tm70 7904 Jan 24 22:52 namcouple
drwxr-s--- 2 tm70 4096 Jan 24 22:52 ocean
-rw-r----- 1 tm70 1367 Jan 24 22:52 README.md
-rwxr-x--- 1 tm70 1389 Jan 24 22:52 resub.sh
$ payu branch
* Current Branch: ctrl
experiment_uuid: 6d94da1f-cba3-4dd9-9b80-f521cb96197a
Branch: release-1deg_jra55_ryf
No UUID in metadata file
payu has generated an experiment_uuid (6d94da1f-cba3-4dd9-9b80-f521cb96197a) and saved it into the metadata.yaml file.
Note the archive link points to a directory named 1deg_jra55_ryf-ctrl-6d94da1f. This is the automatically generated experiment name, which joins the base experiment name (the directory name) with the branch name and the first 8 digits of the experiment UUID.
Run Experiment
In this example the experiment is run (payu run) with an unchanged configuration.
Things to avoid
Don’t clone the access-om2-configs repo without renaming it.
Don’t then check out an experiment branch and payu run.
This will result in an experiment name that would look something like access-om2-configs-1deg_jra55_ryf-3af3cd6e9, and no-one wants that.
B. Linked experiments
payu now supports running independent experiments from the same repository (directory), which is a good fit for closely related experiments.
This workflow will go through the steps to make a new related experiment from the previous example.
Checkout new experiment
Use payu checkout to create a new branch named for the experiment and switch to it, e.g. for a positive wind perturbation branching from a specific commit (3f4cd6e1):
payu checkout -b wind_plus 3f4cd6e1
Created and checked out new branch: wind_plus
laboratory path: /scratch/tm70/xx9999/access-om2
binary path: /scratch/tm70/xx9999/access-om2/bin
input path: /scratch/tm70/xx9999/access-om2/input
work path: /scratch/tm70/xx9999/access-om2/work
archive path: /scratch/tm70/xx9999/access-om2/archive
Updated metadata. Experiment UUID: d44a14b5-2d2e-4b3f-916c-ccc5783878f7
Removed archive symlink to /scratch/tm70/xx9999/access-om2/archive/1deg_jra55_ryf-ctrl-6d94da1f
Added archive symlink to /scratch/tm70/xx9999/access-om2/archive/1deg_jra55_ryf-wind_plus-d44a14b5
Note the UUID has changed and the path to the archive has too:
$ ls -g archive
lrwxrwxrwx 1 tm70 73 Jan 31 17:29 archive -> /scratch/tm70/xx9999/access-om2/archive/1deg_jra55_ryf-wind_plus-d44a14b5
The experiment name has changed to include the branch name and a portion of the experiment UUID. The metadata.yaml file has been changed accordingly:
$ grep ^experiment_uuid metadata.yaml
experiment_uuid: d44a14b5-2d2e-4b3f-916c-ccc5783878f7
Modify experiment configuration
Modify the configuration to achieve the positive wind perturbation (details beyond the scope of this tutorial)
Run experiment
With the configuration altered it is always a good idea to run payu setup to check the changes have been made correctly.
Once satisfied that all is correct, run payu sweep to remove the work directory and then payu run (or payu run -f to do the sweep automatically).
To view the available experiments run payu branch. It lists the current branch and all other branches, with information about the experiments they contain (if any):
$ payu branch
* Current Branch: wind_plus
experiment_uuid: d44a14b5-2d2e-4b3f-916c-ccc5783878f7
Branch: ctrl
experiment_uuid: 6d94da1f-cba3-4dd9-9b80-f521cb96197a
Branch: release-1deg_jra55_ryf
No UUID in metadata file
Create another new experiment
To create another related experiment branching from the same git commit as above, follow the same steps but use a different branch name, e.g.
payu checkout -b wind_minus 3f4cd6e1
Created and checked out new branch: wind_minus
laboratory path: /scratch/tm70/xx9999/access-om2
binary path: /scratch/tm70/xx9999/access-om2/bin
input path: /scratch/tm70/xx9999/access-om2/input
work path: /scratch/tm70/xx9999/access-om2/work
archive path: /scratch/tm70/xx9999/access-om2/archive
Updated metadata. Experiment UUID: 99b6b92e-3b41-440d-928f-056da8da7777
Removed archive symlink to /scratch/tm70/xx9999/access-om2/archive/1deg_jra55_ryf-wind_plus-d44a14b5
Added archive symlink to /scratch/tm70/xx9999/access-om2/archive/1deg_jra55_ryf-wind_minus-99b6b92e
The archive symbolic link is modified to point to the new experiment directory (1deg_jra55_ryf-wind_minus-99b6b92e).
Modify experiment configuration
As before, modify the configuration for the negative wind perturbation, then run payu setup and, when ready, payu run -f.
Now payu branch shows the 3 branches with experiment UUIDs that were created above:
$ payu branch
* Current Branch: wind_minus
experiment_uuid: 99b6b92e-3b41-440d-928f-056da8da7777
Branch: ctrl
experiment_uuid: 6d94da1f-cba3-4dd9-9b80-f521cb96197a
Branch: release-1deg_jra55_ryf
No UUID in metadata file
Branch: wind_plus
experiment_uuid: d44a14b5-2d2e-4b3f-916c-ccc5783878f7
Any of those experiments can be accessed, analysed or continued by using payu checkout <branchname>. Note that only one experiment can be checked out and run at a time. This is a limitation of the branching model.
C. Simultaneous linked experiments
If you need to run experiments simultaneously they will have to be in separate control directories. It is still possible to have all the experiments in a single git repo, but the repo would need to be cloned to separate directories to run the experiments simultaneously.
Using the example from above, you can set up independent experiments in a number of ways. A couple of examples are shown below. In both examples the goal is to keep the experiment name the same as it would be if the experiments were run as branches in the same repository.
Subdirectories
One option is to clone the same experiment into subdirectories, e.g.
$ payu clone -B release-1deg_jra55_ryf -b plus git@github.com:ACCESS-NRI/access-om2-configs.git plus/1deg_jra55_ryf
$ payu clone -B release-1deg_jra55_ryf -b minus git@github.com:ACCESS-NRI/access-om2-configs.git minus/1deg_jra55_ryf
It doesn’t matter how the subdirectories are named; only the name of the control directory itself matters, as this determines the experiment name.
With versions of payu < 1.1 it is not possible to have experiments with the same control directory name, but the latest version of payu also adds a branch name and experiment UUID, so the control directory names can be identical.
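A quick way to see why the parent directory does not matter is to look at the final path component, which is the control directory name. The sketch below uses the two paths from the clone commands above and the standard pathlib module. It illustrates the idea only and does not call payu.

```python
from pathlib import PurePosixPath

# Control directory paths from the two clone commands above
control_paths = ["plus/1deg_jra55_ryf", "minus/1deg_jra55_ryf"]

# Only the final path component is the control directory name
control_names = [PurePosixPath(p).name for p in control_paths]
print(control_names)
```

Both entries are 1deg_jra55_ryf, so the experiments are distinguished by their branch names and experiment UUIDs rather than by the control directory name.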
Experiment name
Another option is to clone the experiment into uniquely named directories, but set the experiment name in config.yaml, e.g.
$ payu clone -B release-1deg_jra55_ryf -b plus git@github.com:ACCESS-NRI/access-om2-configs.git 1deg_jra55_ryf_plus
$ payu clone -B release-1deg_jra55_ryf -b minus git@github.com:ACCESS-NRI/access-om2-configs.git 1deg_jra55_ryf_minus
Then add the following to config.yaml in both experiments:
experiment: 1deg_jra55_ryf
This overrides the default experiment name root (the control directory name) and results in the same naming scheme as the subdirectories above.
Why?
Why use a single repo with branches in this case if we’re making separate control directories? There is still value in having a single repository with related experiments in named branches. At any point the branches in the separate control directories can be pushed back to a single repository. This reduces the number of repositories, which can become a serious issue with many experiments. Having related experiments in the same repository also allows for easy comparisons between the configurations, and also makes it easier to apply common changes to a number of related experiments.
D. Existing experiments
Legacy mode (no changes to layout)
The new branching features of payu are completely backwards compatible. If an existing experiment has experiment output in archive, payu will make no change to the experiment name or the archive link. The first time any payu command is run it will generate an experiment UUID and add it to metadata.yaml, creating metadata.yaml if necessary.
payu will tell you what it is doing. This is part of the output when payu was run in an existing experiment:
/g/data/vk83/apps/payu/1.1/lib/python3.9/site-packages/payu/metadata.py:130: MetadataWarning: No experiment uuid found in metadata. Generating a new uuid
warnings.warn("No experiment uuid found in metadata. "
Pre-existing archive found at: /scratch/tm70/xx9999/access-om2/archive/1deg_jra55_ryf. Experiment name will remain: 1deg_jra55_ryf
Updated metadata. Experiment UUID: 05e1648d-9997-4eec-b2e1-ebb3881c95b4
It says that a pre-existing archive was found, and that the experiment name is unchanged, but an experiment UUID has been generated and the metadata updated:
experiment_uuid: 05e1648d-9997-4eec-b2e1-ebb3881c95b4
No branching
For an experiment that does not have existing experiment output and is on the main or master branch, the experiment name does not include the branch name. It will consist only of the control directory name and the shortened experiment UUID, e.g. cloning the previous version of an experiment and running payu:
$ gh repo clone COSIMA/1deg_jra55_ryf 1deg_jra55_ryf_existing
Cloning into '1deg_jra55_ryf_existing'...
remote: Enumerating objects: 1132, done.
remote: Counting objects: 100% (360/360), done.
remote: Compressing objects: 100% (64/64), done.
remote: Total 1132 (delta 312), reused 319 (delta 296), pack-reused 772
Receiving objects: 100% (1132/1132), 366.44 KiB | 887.00 KiB/s, done.
Resolving deltas: 100% (694/694), done.
$ cd 1deg_jra55_ryf_existing
$ payu sweep
laboratory path: /scratch/tm70/xx9999/access-om2
binary path: /scratch/tm70/xx9999/access-om2/bin
input path: /scratch/tm70/xx9999/access-om2/input
work path: /scratch/tm70/xx9999/access-om2/work
archive path: /scratch/tm70/xx9999/access-om2/archive
/g/data/vk83/apps/payu/1.1/lib/python3.9/site-packages/payu/metadata.py:130: MetadataWarning: No experiment uuid found in metadata. Generating a new uuid
warnings.warn("No experiment uuid found in metadata. "
/g/data/vk83/apps/payu/1.1/lib/python3.9/site-packages/payu/metadata.py:192: MetadataWarning: No pre-existing archive found. Generating a new uuid
warnings.warn(
Updated metadata. Experiment UUID: 40a219d7-6011-4004-9aeb-d3f93a0fb81e
This results in the following changes/additions to metadata.yaml:
experiment_uuid: 40a219d7-6011-4004-9aeb-d3f93a0fb81e
name: 1deg_jra55_ryf_existing-40a219d7
The rationale for this approach is to minimise surprises relative to how payu has worked in the past when running an existing experiment, or one that appears to be in “legacy mode”.