Introduction
This material is designed for a 3-hour (2 x 1.5 hr sessions) payu tutorial as part of the ACCESS Workshop 2024.
Goals
- Learn how to obtain and run an experiment with an ACCESS supported model configuration using the payu scientific workflow management tool.
- Become familiar with creating new experiments by altering existing configurations.
- Learn how to curate a multi-experiment git repository.
- Develop skills sharing configurations (and experiments) with GitHub.
- Understand experiment provenance, and how payu and git enable extensive experimental provenance.
Requirements
- Account on gadi and membership of the nf33 and vk83 projects
- To run ACCESS-ESM1.5: membership of ki32 and ki32_mosrs
- GitHub account
- Some familiarity with git
- Your own computer
- Some familiarity with the Linux command line
Terminology
Climate modelling is complicated. We need a shared vocabulary to enable a shared understanding. In this training these are some of the terms we’ll use and what they mean:
| Term | Meaning |
|---|---|
| model | A combination of model components, compiled and deployed by ACCESS-NRI |
| model component | One discrete component of a multi-component system, e.g. the MOM5 ocean |
| model configuration | A git repository containing the configuration files required to run a model |
| experiment | A specific realisation (series of runs) of a model configuration |
For example, the ACCESS-ESM1.5 model is versioned, built and deployed from the ACCESS-ESM1.5 repository. It has four major model components (see the Hive Docs for details), and two supported model configurations.
Workflow management
Payu is a workflow management tool for running numerical models in supercomputing environments, and it is the tool used to run the ACCESS models covered in this tutorial.
Payu is written in Python. See the documentation or the GitHub repository for more information.
The latest version is 1.1.5, which is the minimum required for this tutorial. ACCESS-NRI provides supported conda environments for payu, which also contain other dependencies and tools required to run ACCESS-NRI supported models. These can be accessed via the module system:
module use /g/data/vk83/modules
module load payu/1.1.5
This environment must be loaded for all subsequent steps in this tutorial.
Details of the latest version of payu are available in the release notes which are updated when new supported versions of payu are released.
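As a quick sanity check that the environment is loaded correctly (the exact install path may differ):
module list # should include payu/1.1.5
which payu # should resolve to the ACCESS-NRI installation under /g/data/vk83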
Experiment provenance
Experiment provenance for computer simulations involves documenting the entire lifecycle of an experiment to ensure transparency, reproducibility, and validation. Detailed provenance allows researchers to better understand, reproduce, and validate their simulation experiments.
Some key components:
- Model Information: Details about the model and model components used, source code versions, parameters, and configuration.
- Input Data: Information on the data fed into the simulation, such as initial conditions, forcing data, and preprocessing steps.
- Execution Environment: Specifications of the hardware and software environment, including operating systems, libraries, and dependencies.
- Model Runs: Records of each run, including timestamps, input parameters, and any variations between runs.
- Output Data: Documentation of the results generated by the simulation, including formats, storage locations, and any post-processing steps.
- Authorship and Contributions: Information about the individuals who conducted the simulation, their roles, and their contributions.
git and payu
Git is a distributed version control system that is used “under the hood” by payu to track experiment provenance.
Git was designed to enable fast and efficient version control of computer source code, which is typically a directory tree containing text files.
A payu model configuration is a collection of configuration files that control how a model and its components are run, i.e. a directory tree containing text files. So it is a good fit for git.
Some key features of git that are particularly relevant for use with payu and experiment provenance:
- Commit History: Track changes with detailed commit messages, timestamps, and author information.
- Security: cryptographic methods used to ensure integrity of the version history.
- Branching and Merging: Easily create, manage, and merge branches for parallel development.
- Distributed Version Control: Every user has a complete copy of the repository, including its full history.
- Collaboration: Supports multiple users working on the same project simultaneously.
Payu utilises manifests (formatted text files) to uniquely identify all executables, model inputs and restart files. Payu adds these manifests to the files that are tracked by git.
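For illustration, a manifest entry records a file's location and content hashes. A hypothetical entry in manifests/input.yaml might look roughly like this (the path and hashes are invented):
format: yamanifest
version: 1.0
---
work/INPUT/topog.nc:
  fullpath: /g/data/xx00/model-inputs/topog.nc # hypothetical input file
  hashes:
    binhash: 1a2b3c4d5e6f
    md5: 9d3c2f1e8b7a6c5d4e3f2a1b0c9d8e7f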
The combination of payu and git to track and version experimental configurations satisfies many provenance requirements: authorship, model runs, inputs and model information.
Payu uses git branching to support multiple experiments in the same repository.
Support for distributed version control and collaboration allows researchers to easily and effectively share their work. This is good for researchers and good for science.
Running a (long) experiment
The first step of the tutorial is to start a standard run of a model configuration so that it has time to finish.
ssh into gadi. Make sure you’ve loaded payu/1.1.5 (see above).
Create a directory for all the training material in your home directory and change directory into it
mkdir ~/payu-training/
cd ~/payu-training/
Clone experiment
Exercise: Clone a released configuration to a new experiment directory.
Choose either the 1 deg ACCESS-OM2 RYF or the pre-industrial ACCESS-ESM1.5 configuration and clone it to a new experiment directory and branch called control (see the ACCESS-Hive Docs for OM2 or the same for ESM1.5 for instructions on how to do this).
Answer
ACCESS-OM2:
payu clone -b control -B release-1deg_jra55_ryf https://github.com/ACCESS-NRI/access-om2-configs 1deg_jra55_ryf-training
ACCESS-ESM1.5:
payu clone -b control -B release-preindustrial+concentrations https://github.com/ACCESS-NRI/access-esm1.5-configs preindustrial+concentrations-training
Run experiment
Exercise: Change the project used to run the model to nf33 and do a single run
Changing the project code used to run a model is covered in the Hive Docs for ESM1.5 and OM2 (though it is essentially identical).
Running a model is also covered in Hive Docs:
Tip
It is always a good idea to run payu setup after cloning a new configuration or making substantial changes. This runs just the setup phase, which tests access to all the paths required to set up a model run, and updates the manifests which payu uses to determine which storage mounts need to be included in a PBS submission. However, you then need to either run payu sweep before running, or use the -f option when running.
Answer
payu setup
payu sweep
payu run
This will take over an hour, so we will move on and revisit it later in the tutorial.
Run another (short) experiment
While the long experiment is running, we will use this time to clone a new experiment and change the model run length to a shorter run time. We will also look into configuring syncing and restart pruning with payu.
Exercise: Clone a new experiment
Firstly create a new clone of the same experiment as above with a different directory and branch name. This must be a separate clone into a different directory because we’ll be running multiple experiments simultaneously, and only one experiment can be run at a time in a given control directory.
- Change to the training directory ~/payu-training
- Create a new clone of the experiment with a different directory and branch name.
- Change to the new experiment control directory.
Solution
cd ~/payu-training
Create the new clone. This could look something like the following:
ACCESS-OM2
payu clone -b sync-and-restart-pruning-expt -B release-1deg_jra55_ryf https://github.com/ACCESS-NRI/access-om2-configs 1deg_jra55_ryf-training-2
ACCESS-ESM1.5
payu clone -b sync-and-restart-pruning-expt -B release-preindustrial+concentrations https://github.com/ACCESS-NRI/access-esm1.5-configs preindustrial+concentrations-training-2
where -b is the new branch name, and the name at the end of the command is the new directory name.
Then change into the new directory. Using the above solution examples:
cd 1deg_jra55_ryf-training-2
or
cd preindustrial+concentrations-training-2
Editing the config.yaml file (optional)
In this section, we will modify the config.yaml file in the control directory. This payu configuration file controls the general model configuration. Editing this file can be done via your favourite editor, for example vim or vscode. If you are new to editing files using the command line, an option is Nano, as it keeps a menu of possible command options at the bottom of the editor. This is an optional section; feel free to skip it if you are comfortable editing files from the terminal on gadi.
Exercise (Optional): Using Nano to edit `config.yaml` files
- To open config.yaml in Nano, run the following command:
nano config.yaml
- Navigate through the file using the arrow keys, and start typing to insert text.
- To close the editor, press Ctrl + X. If there are unsaved changes, Nano will prompt you to save or discard them. To save, press Y and then Enter to confirm.
Exercise: Configure run length
Changing the run length requires opening configuration files in a text editor, making changes and saving those changes.
ACCESS-OM2
Using the Hive Docs guide on how to change run length for ACCESS-OM2, change the run length to 1 month.
Solution
The run length is controlled by the restart_period field in the &date_manager_nml section of the accessom2.nml file:
&date_manager_nml
forcing_start_date = '1958-01-01T00:00:00'
forcing_end_date = '2019-01-01T00:00:00'
! Runtime for a single segment/job/submit, format is years, months, seconds,
! two of which must be zero.
restart_period = 5, 0, 0
&end
- Open accessom2.nml in a text editor. If using Nano, this will be:
nano accessom2.nml
- Change the run length to 1 month:
restart_period = 0, 1, 0
- Close and save the file.
ACCESS-ESM1.5
Using the Hive Docs guide on how to change run length for ACCESS-ESM1.5 and running less than a year, change the run length to 1 month.
Solution
The length of an ACCESS-ESM1.5 run is controlled by the runtime settings in the config.yaml file:
runtime:
    years: 1
    months: 0
    days: 0
The run length for ACCESS-ESM1.5 experiments should usually be left at 1 year to avoid errors. However, for this exercise, we use a shorter run length so the model does not take as long to run. This requires an additional change to the sea ice model configuration so that restart files are produced at monthly frequencies.
- Open config.yaml in a text editor. If using Nano:
nano config.yaml
- Change the run length to 1 month:
runtime:
    years: 0
    months: 1
    days: 0
- Close and save the file.
- Open the ice/cice_in.nml configuration file.
- Change the dumpfreq = 'y' setting to dumpfreq = 'm'
- Close and save the file.
Exercise: Commit changes
It is always a good idea when making changes to a model configuration to git commit them with an informative message about why the change has been made.
Make a git commit with the message “Reduced run time to 1 month for testing”
Answer
git commit -a -m 'Reduced run time to 1 month for testing'
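You can confirm the commit was recorded with:
git log -1 --oneline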
Configuring Restart Pruning
There are restart files for every run to allow subsequent runs to start from a previously saved model state. These restart files can occupy a significant amount of disk space. By default, payu keeps the restart files for every fifth run and “prunes” (deletes) the rest. For example, if a model had run 11 times, the restarts in the archive directory would be:
restart000
restart005
restart010
More detail
Intermediate restarts
Intermediate restarts are retained and are only deleted after a subsequent permanently archived restart file has been produced.
So when the model has been run 15 times, the restarts in the archive directory would be:
restart000
restart005
restart010
restart011
restart012
restart013
restart014
After the 16th model run, these intermediate restarts are deleted, as a permanently archived checkpoint, restart015, has been reached:
restart000
restart005
restart010
restart015
restart_freq
The rate at which restart files are pruned is controlled by restart_freq in config.yaml. This can be either an integer or a date-based frequency. For example, to save all restart files, the setting in config.yaml would be:
restart_freq: 1
Using date-based restart frequency is useful because it makes restart pruning independent of model run length. If the model run length is modified during the course of an experiment the frequency with which restarts are pruned is unaffected.
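For example, a date-based setting that permanently keeps the first restart in every five-year interval would be (frequency value chosen for illustration, following the same offset syntax as the 2MS example below):
restart_freq: 5YS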
This is covered in the Hive Docs for ESM and OM2.
Exercise: Set restart pruning frequency
We will set a frequency small enough that we can see changes to the archive over a short run.
Edit config.yaml to change restart_freq so that the first restart in every 2-month interval is kept.
Solution
The config.yaml should contain:
restart_freq: 2MS
Hint: To always keep the last N restarts, in addition to the permanently saved restarts determined by restart_freq, you can set restart_history in config.yaml. So restart_history: 5 keeps the last 5 restarts.
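Combining the two, a config.yaml could contain, for example:
restart_freq: 2MS # permanently keep the first restart in each 2-month interval
restart_history: 5 # additionally, always keep the 5 most recent restarts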
Configuring sync
By default, the payu laboratory and archive directories are on /scratch storage. The scratch filesystem is temporary storage where files not accessed for 100 days are automatically removed. For this reason, model outputs often need to be moved to /g/data/ for long-term storage.
Payu has some syncing support, using rsync commands under the hood, which runs in a separate PBS job. If automatic syncing is enabled, this job is submitted after the collation PBS job if collation is enabled, or otherwise after the payu archive step in the run PBS job. In the case of both the ACCESS-OM2 and ACCESS-ESM1.5 configurations, it is submitted after the collation has completed.
Sync options
As there are several configuration options for syncing, it has its own subsection in config.yaml under sync. The main options are:
- enable (Default: False) - Controls whether or not a sync job is submitted automatically
- path - Destination path to sync archive outputs to. NOTE: This must be a unique absolute path for your experiment, otherwise outputs will be overwritten.
- restarts (Default: False) - Sync permanently archived restarts, as determined by restart_freq.
Sometimes it’s useful to remove files from the archive after they have been successfully synced, to save space. There are two levels of options:
- remove_local_files (Default: False) - Deletes files after they have been synced, but leaves behind empty directories and any files that were excluded from the sync commands.
- remove_local_dirs (Default: False) - Removes the output and restart directories.
Neither of the above will delete files or directories from the last output. If restarts have been synced, the last saved restart (determined by restart_freq) and any subsequent restarts will also not be deleted.
Because sync runs as a separate PBS job, it has several configurable PBS settings. For example, queue controls which PBS queue it runs on, by default copyq. If additional post-processing needs to run before syncing to a remote archive, there is a sync user-script option. This can run a script or command at the start of the sync PBS job, before any rsync commands are run.
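Putting these options together, a sync section could look something like the following sketch (the destination path is hypothetical):
sync:
    enable: true # submit a sync job automatically
    path: /g/data/xx00/abc123/my-experiment-archive # hypothetical unique destination
    restarts: true # also sync permanently archived restarts
    queue: copyq # PBS queue for the sync job (the default)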
A full list of sync options can be found under Post-processing in the payu documentation.
Exercise: Set sync parameters
Enable sync in config.yaml and configure the remote archive directory
/scratch/nf33/<replace-with-user-id>/tmp/test-sync-experiment-archive
where <replace-with-user-id> is your NCI username. Restarts should also be synced.
Solution
The sync subsection in config.yaml should look similar to the following:
sync:
    enable: true
    restarts: true
    path: /scratch/nf33/<replace-with-user-id>/tmp/test-sync-experiment-archive
Run experiment
To obtain several output and restart directories, we will need to run the model a number of times.
Exercise: Run experiment 6 times
See the ESM and OM2 Hive docs for information on how to run the models multiple times.
Answer
payu setup
payu sweep
payu run -n 6
payu setup checks that restart_freq is set to a valid value.
payu sweep removes the work directory generated by payu setup.
The -n flag sets the number of runs to be performed.
Note: The above will run 6 model execution jobs in 6 separate PBS job submissions for both the ACCESS-OM2 and ACCESS-ESM1.5 configurations. The number of runs per submission can be modified by setting runspersub in config.yaml, which defines the maximum number of runs for each payu submission.
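For example, a hypothetical setting packing three runs into each PBS submission:
runspersub: 3
With this, payu run -n 6 would use two PBS submissions of three runs each.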
PBS error and output log files
The control directory contains all the PBS logs for each job. After 6 runs of the ACCESS-ESM1.5 pre-industrial configuration, the control directory looks like:
$ ls # ls ~/payu-training/preindustrial+concentrations-training-2
archive pre-industria_c.e123895132 pre-industria_c.o123897744 pre-industrial.o123894927 pre-industria_s.e123897850 README.md UM_conversion_job.sh.o123895276
atmosphere pre-industria_c.e123895450 pre-industria_c.o123898409 pre-industrial.o123895133 pre-industria_s.e123898819 scripts UM_conversion_job.sh.o123896556
config.yaml pre-industria_c.e123896742 pre-industria_c.o123899069 pre-industrial.o123895452 pre-industria_s.e123899300 testing UM_conversion_job.sh.o123896984
coupler pre-industria_c.e123897744 pre-industrial.e123894927 pre-industrial.o123896743 pre-industria_s.o123895313 UM_conversion_job.sh.e123895276 UM_conversion_job.sh.o123897849
ice pre-industria_c.e123898409 pre-industrial.e123895133 pre-industrial.o123897745 pre-industria_s.o123896557 UM_conversion_job.sh.e123896556 UM_conversion_job.sh.o123898817
LICENSE pre-industria_c.e123899069 pre-industrial.e123895452 pre-industrial.o123898410 pre-industria_s.o123896987 UM_conversion_job.sh.e123896984 UM_conversion_job.sh.o123899298
manifests pre-industria_c.o123895132 pre-industrial.e123896743 pre-industria_s.e123895313 pre-industria_s.o123897850 UM_conversion_job.sh.e123897849
metadata.yaml pre-industria_c.o123895450 pre-industrial.e123897745 pre-industria_s.e123896557 pre-industria_s.o123898819 UM_conversion_job.sh.e123898817
ocean pre-industria_c.o123896742 pre-industrial.e123898410 pre-industria_s.e123896987 pre-industria_s.o123899300 UM_conversion_job.sh.e123899298
Each PBS job has standard output and error logs, which are written to <jobname>.o<job-ID> and <jobname>.e<job-ID> respectively.
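A couple of shell one-liners can help navigate these logs (the job names shown are for the ACCESS-ESM1.5 example; they will differ for ACCESS-OM2):
ls -1 pre-industrial.o* # list the main model execution output logs
less $(ls -1t pre-industrial.o* | head -1) # view the most recent one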
Using the above example, there are 4 types of jobs run:
- pre-industrial: The main model execution job, where the init, setup, run, and archive stages run. At the end of this job, it submits the collate job.
- pre-industria_c: Runs the collation. Once the collate stage has run, it submits the postscript and sync jobs. Note: _c marks payu collate job logs.
- pre-industria_s: Runs the syncing to the remote archive. Note: _s marks payu sync job logs.
- UM_conversion_job.sh: This is the postscript job, a user-defined post-processing script.
Exercise: Find the Service Units and Walltime for a single run
Monitor your run: see Hive Docs for ESM and OM2 on how to do this.
When your first job completes, examine the PBS output log file and find the Service Units and Walltime used.
Hint: the commands cat and less are useful ways to view a text file.
Answer
For example (your PBS job ID will be different):
cat pre-industrial.o123894927
======================================================================================
Resource Usage on 2024-08-29 10:54:22:
Job Id: 123894927.gadi-pbs
Project: tm70
Exit Status: 0
Service Units: 100.05
NCPUs Requested: 384 NCPUs Used: 384
CPU Time Used: 44:51:57
Memory Requested: 1.5TB Memory Used: 152.35GB
Walltime requested: 02:30:00 Walltime Used: 00:07:49
JobFS requested: 800.0MB JobFS used: 8.16MB
======================================================================================
So the Service Units used were ~100 and the Walltime used was 7m 49s.
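To pull out just those two lines directly (your job ID will differ):
grep -E 'Service Units|Walltime' pre-industrial.o123894927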
Local Archive
After the 6 sequential runs, we expect the archive to look like the following:
$ ls archive/ # List directories under the archive symlink in the control directory
metadata.yaml output000 output001 output002 output003 output004 output005 pbs_logs restart000 restart002 restart004 restart005
There should be 6 output directories, and restarts are pruned at 2-month intervals. Note that the intermediate restart restart005 will be kept until there is a restart with a date-time later than the beginning of the second month after restart004.
Remote Archive
After 6 sequential runs with syncing enabled, the remote archive should contain the following:
$ ls /scratch/nf33/<replace-with-user-id>/tmp/test-sync-experiment-archive/
git-runlog metadata.yaml output000 output001 output002 output003 output004 pbs_logs
Note: If using an ACCESS-OM2 configuration, the latest output, output005, will also be synced.
Exercise: Confirm local and remote archive contain correct files
Confirm your model run has completed. List the files in your local and remote archive and check they contain the correct files.
Postscript and Sync
The ACCESS-ESM1.5 experiment runs have an additional PBS post-processing job; the PBS logs for these jobs start with UM_conversion_job.sh. This job converts atmospheric outputs to NetCDF format. It is set in config.yaml under:
postscript: -v PAYU_CURRENT_OUTPUT_DIR,PROJECT -lstorage=${PBS_NCI_STORAGE} ./scripts/NetCDF-conversion/UM_conversion_job.sh
The postscript job is submitted at the same time as the sync job. Currently, when postscript is configured, the last outputs and restarts are not automatically synced, because there is no guarantee the postscript job will have finished before the sync job starts. It will, however, sync all outputs N where N < the current run counter, i.e. all previous outputs.
A future improvement to payu's syncing support could be to add dependency logic so the sync job waits for the end of the postscript job before running, so the latest output can be synced automatically. So keep an eye out for future payu releases and updates!
In the meantime, payu sync can be run manually at the end of an experiment to sync the final outputs and restart files to a remote archive.
Exercise: Manually run payu sync
With sync configured, you can manually submit sync jobs using the payu sync command. Running payu sync will sync all output directories.
In this exercise, we will modify the sync subsection in config.yaml, but wait until all jobs from the previous exercise have completed.
Set remove_local_dirs to true to enable deletion of synced output and restart directories from the local archive, then sweep to copy log files to archive, and sync.
Answer
The sync section in config.yaml should now look something like:
sync:
    enable: true
    path: /scratch/nf33/<your-user-id>/tmp/test-sync-experiment-archive
    restarts: true
    remove_local_dirs: true
Then run:
payu sweep
payu sync
Once the sync job has completed, check the remote archive. It should now contain all the outputs and all the permanently saved restart directories:
$ ls /scratch/nf33/<your-user-id>/tmp/test-sync-experiment-archive/
git-runlog metadata.yaml output000 output001 output002 output003 output004 output005 pbs_logs restart000 restart002 restart004
Note: restart005 is not synced, as it is an intermediate restart directory.
Check the local archive directory. Every output except the latest should have been deleted, and every restart except the last permanently saved restart (restart004) and the subsequent intermediate restart (restart005) should also have been deleted.
$ ls archive/
metadata.yaml output005 restart004 restart005
Exercise: Sync the entire archive
To sync all restarts, you can add the --sync-restarts flag to payu sync. This is particularly useful when an experiment is finished, or is temporarily halted for some time, to make sure all outputs and restarts have been copied to non-ephemeral storage.
- Wait until all jobs from the previous exercise have completed.
- Run payu sweep to move all log files to the archive directory.
- Run the payu sync --sync-restarts command.
- When the job has completed (when log files for the sync job have been created), check the remote archive. We expect it to contain all outputs and restarts:
$ ls /scratch/nf33/<replace-with-user-id>/tmp/test-sync-experiment-archive/
git-runlog metadata.yaml output000 output001 output002 output003 output004 output005 pbs_logs restart000 restart002 restart004 restart005
Collaboration with GitHub
Why collaborate?
Just some examples of what collaboration can do:
- Save time and resources by avoiding wasteful duplication or known pitfalls or errors
- Assist inexperienced researchers to become productive faster
- Bring new skills and perspectives from other disciplines
What is GitHub?
From Wikipedia:
GitHub is a developer platform that allows developers to create, store, manage and share their code. It uses Git software, providing the distributed version control of Git plus access control, bug tracking, software feature requests, task management, continuous integration, and wikis for every project.
Why GitHub?
- World’s largest source code host (>100 million developers, >420 million repositories)
- Free for open source projects
- Built in support for automated CI/CD (GitHub Actions)
- Easy cloning: just fork a repo
- Visibility and documentation of issues
- Visibility of fixes and support for code reviews via pull-requests
Introducing gh
This tutorial uses the GitHub command line interface (CLI) client gh to interact with GitHub.
GitHub is a website, and it is possible to use the web interface to do things like create a repository, but for this purpose it is simpler to provide commands that can be used directly on gadi.
Authorise with GitHub
gh is included in the payu modules supported by ACCESS-NRI. As long as the payu command is available, gh should be also.
The first step is to authorise with GitHub:
gh auth login
This will prompt for a series of responses. Select the responses used below:
? What account do you want to log into? GitHub.com
? What is your preferred protocol for Git operations on this host? HTTPS
? Authenticate Git with your GitHub credentials? Yes
? How would you like to authenticate GitHub CLI? Login with a web browser
! First copy your one-time code: XXXX-XXXX
Press Enter to open github.com in your browser...
At this point you will get an error opening a browser on gadi:
! Failed opening a web browser at https://github.com/login/device
exec: "xdg-open,x-www-browser,www-browser,wslview": executable file not found in $PATH
Please try entering the URL in your browser manually
So open https://github.com/login/device in your browser, authenticate with GitHub if you’re not already logged in, and copy the one-time code from your terminal window and paste it in. Authentication should then complete:
✓ Authentication complete.
- gh config set -h github.com git_protocol https
✓ Configured git protocol
! Authentication credentials saved in plain text
✓ Logged in as xxxxxxxxxxx
To check the status, use gh auth status:
$ gh auth status
github.com
✓ Logged in to github.com account xxxxxxxxx (/home/XXX/xxxXXX/.config/gh/hosts.yml)
- Active account: true
- Git operations protocol: https
- Token: gho_************************************
- Token scopes: 'gist', 'read:org', 'repo'
Perturbation
In this section we will clone from a pre-existing control experiment, and create related perturbation experiments from the same control directory.
Branches
There is now support in payu for running multiple related experiments from the same control directory, though only one experiment can be running at any one time. To distinguish between branches in the work and archive directories, payu combines the directory name, the branch name and the first 8 digits of the experiment UUID (for example, a hypothetical experiment name might be 1deg_jra55_ryf-training-2-perturb1-0f2e2bb1). We will refer to this as the experiment name.
To change between branches in a control directory, use payu checkout. This is a wrapper around git checkout that also sets up the archive and work directory symlinks. To checkout and create a new branch, use the -b command-line flag.
By default, payu checkout -b uses the current branch as a base. To start from an earlier commit or branch, add it to the end of the command. For example,
payu checkout -b <new-branch-name> <base-commit-or-branch-name>
Similarly to payu clone, use --restart/-r to specify the restart path to start the model run from. This adds the restart option to config.yaml, which is used as the starting point of a run. This option has no effect if there are existing restart directories, so it does not have to be removed for subsequent runs.
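For example, after a checkout with -r, config.yaml will contain a line along these lines (the path here is illustrative):
restart: /g/data/nf33/public/training-day-2024/payu-training/experiments/20240827-release-preindustrial+concentrations-run-0225dcf2/restart020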
For more information, run payu checkout --help
or see payu documentation on Metadata and Related Experiments.
Exercise: Clone pre-existing control experiment
There are control experiments available for ACCESS-ESM1.5 and ACCESS-OM2. For this exercise, clone one of these repositories into the ~/payu-training directory.
Answer
cd ~/payu-training
ACCESS-ESM1.5
gh repo clone git@github.com:ACCESS-Community-Hub/access-esm1.5-preindustrial-concentrations-example
cd access-esm1.5-preindustrial-concentrations-example
ACCESS-OM2
gh repo clone ACCESS-Community-Hub/access-om2-1deg_jra55_ryf-example
cd access-om2-1deg_jra55_ryf-example
Exercise: Create first perturbation experiment
To run a perturbation experiment from an existing experiment, restart files from the existing experiment are required.
Restarts for the ACCESS-OM2 and ACCESS-ESM1.5 experiments have been copied to
/g/data/nf33/public/training-day-2024/payu-training/experiments/
$ ls -gh /g/data/nf33/public/training-day-2024/payu-training/experiments/
total 8.0K
drwxr-s---+ 9 nf33 4.0K Sep 2 00:48 1deg_jra55_ryf-control-d0683f7e
drwxr-s---+ 8 nf33 4.0K Sep 2 00:59 20240827-release-preindustrial+concentrations-run-0225dcf2
The available restarts dictate where the experiment can be branched from. Examine what restarts are available and choose the second-to-last restart, so the experiment is as equilibrated as possible while there are still control outputs available to compare against our perturbation. Determine the commit hash corresponding to the end of that run.
Checkout a new experiment, perturb1, using the restart path and commit hash determined above. Modify a model parameter and change the run length to 1 month. git commit the changes with an informative commit message.
Answer
ACCESS-ESM1.5
Examine the directory for available restarts:
$ ls -gh /g/data/nf33/public/training-day-2024/payu-training/experiments/20240827-release-preindustrial+concentrations-run-0225dcf2/
total 36K
drwxr-s---+ 2 nf33 4.0K Aug 30 21:18 error_logs
-rw-r--r--+ 1 nf33 2.1K Sep 2 01:00 metadata.yaml
drwx--S---+ 2 nf33 12K Sep 1 22:33 pbs_logs
drwx--S---+ 6 nf33 4.0K Aug 27 11:18 restart000
drwx--S---+ 6 nf33 4.0K Aug 28 00:42 restart010
drwx--S---+ 6 nf33 4.0K Aug 30 11:14 restart020
drwx--S---+ 6 nf33 4.0K Sep 2 00:59 restart030
restart020 is the second to last.
Examine the git log, either in the local repo or on GitHub, to find the commit hash. Then:
payu checkout -r /g/data/nf33/public/training-day-2024/payu-training/experiments/20240827-release-preindustrial+concentrations-run-0225dcf2/restart020 -b perturb1 0f2e2bb
ACCESS-OM2
Examine the directory for available restarts:
$ ls -lg /g/data/nf33/public/training-day-2024/payu-training/experiments/1deg_jra55_ryf-control-d0683f7e/
total 32
drwxr-s---+ 7 nf33 4096 Aug 31 11:09 git-runlog
-rw-r-----+ 1 nf33 2254 Sep 1 22:36 metadata.yaml
drwx--S---+ 2 nf33 4096 Sep 2 00:45 pbs_logs
drwx--S---+ 5 nf33 4096 Aug 30 13:34 restart000
drwx--S---+ 5 nf33 4096 Aug 30 18:18 restart004
drwx--S---+ 5 nf33 4096 Aug 30 23:01 restart008
drwx--S---+ 5 nf33 4096 Aug 31 03:46 restart012
drwx--S---+ 5 nf33 4096 Aug 31 08:30 restart016
restart012 is the second to last.
Examine the git log, either in the local repo or on GitHub, to find the commit hash. Then:
payu checkout -r /g/data/nf33/public/training-day-2024/payu-training/experiments/1deg_jra55_ryf-control-d0683f7e/restart012 -b perturb1 4242995
Edit config.yaml as before to change the model run length.
git commit -a -m 'Modified xx parameter and set run length to one month'
So this branch is now all set up to run a perturbation experiment.
List the branch you are currently on:
payu branch
Display the archive symlink and experiment name:
ls -l archive
Display the git history:
git log
You should see a new commit with the new experiment UUID (added by payu checkout), and the previous commit should be the last commit of the previous run.
To see the new metadata.yaml fields, which include the experiment name and UUID:
cat metadata.yaml
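The exact field names vary between payu versions, but expect entries along these lines (both values are invented for illustration):
name: preindustrial+concentrations-example-perturb1-0f2e2bb1
experiment_uuid: 0f2e2bb1-1234-5678-9abc-def012345678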
Run
Now run the perturbation experiment for one month:
payu setup
payu run -f
Hint: a separate payu sweep isn’t necessary if the -f option is used with payu run, as this automatically removes (sweeps) an existing work directory.
Exercise: Create second perturbation experiment
Once the first perturbation experiment has completed, create a second perturbation, ideally related to the first in a meaningful way, e.g. the opposite sign of change, or a parameter that is orthogonal but physically related.
Repeat steps above:
- Checkout a new experiment, perturb2. Make sure to checkout from the same base commit as perturb1 and the same restarts
- Modify a model parameter
- git commit
- Run
Examining branches
Once the second perturbation experiment has completed you should have two experiment branches.
As long as an experiment isn’t running, and any associated post-processing or syncing has completed, it is safe to switch between experiments.
Exercise: list available experiments (branches)
Answer
payu branch
Exercise: checkout perturb1
Checkout the first perturbation experiment. Note that the link to the archive directory also changes. payu does this automatically, which is one of the reasons it is generally better to use payu checkout to switch between branches rather than using git directly.
Answer
payu checkout perturb1
Push to repo
Now you can create a repository from your perturbation experiment control directory using
gh repo create
Follow the prompts and enter the information requested. The repository name will default to the directory name of your control directory. The repository owner should be the GitHub username you used to authenticate. Choose public visibility. The remote name is just an alias for your repository that git uses when doing a push or pull.
? What would you like to do? Push an existing local repository to GitHub
? Path to local repository .
? Repository name XXXXX
? Repository owner xxxxxxxxx
? Description A nice description of the purpose of the repository
? Visibility Public
✓ Created repository xxxxxxxxx/XXXXX on GitHub
https://github.com/xxxxxxxxx/XXXXX
? Add a remote? Yes
? What should the new remote be called? myrepo
✓ Added remote git@github.com:xxxxxxxxx/XXXXX.git
? Would you like to push commits from the current branch to "myrepo"? Yes
Push branches to GitHub
If you have other experiment branches you wish to push to the same repo then use:
git push myrepo --all
Compare with control (optional)
If you want to compare your perturbation experiment with the control outputs, they are available here:
/scratch/nf33/public/training-day-2024/payu-training/experiments/
Fork an experiment repo
Exercise: collaborate with a colleague
- Find a collaborator in the room
- Fork their perturbation experiment repo
- Clone the fork to gadi (I recommend using gh)
- List the available branches
- Choose a branch and check it out using payu at a specific commit with a restart path
- Modify the perturbation and run a single month
- push your branch back to your fork
- Add each other’s fork as a git remote and checkout their experiment (see the sketch after this list)
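A sketch of the remote-and-checkout steps, with hypothetical remote, repository, branch, commit and restart names:
# Add your colleague's fork as a remote and fetch their branches
git remote add colleague https://github.com/<their-username>/<their-repo>.git
git fetch colleague
# Check out a new experiment from a specific commit of their branch,
# supplying a restart path, as in the earlier perturbation exercises
payu checkout -r /path/to/their/restart010 -b perturb1-variant <commit-hash>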
Long experiment run finished
Once the long experiment run has finished, change into its control directory and examine the outputs in archive/output000.
Note the different output layouts of the model components:
Ice
- Output is stored in the ice/OUTPUT directory
- Diagnostic files contain multiple variables
Ocean
- Outputs are (mostly) split into one diagnostic variable per file
Atmosphere
- Diagnostic outputs are post-processed from UM fields file format to netCDF and saved in archive/output000/atmosphere/netCDF
- Original UM fields files are deleted by default. The ESM Hive Docs explain how to change this default behaviour for debugging or verification purposes.