21st March 2025 - How to run an ACCESS-OM2 or OM3 model

When?

11am Friday 21st March 2025, here.

What is this?

A tutorial on how to run the ACCESS-OM2 model for the first time (from rest, a restart or a perturbation!).

Writing credits: @cbull @helen @Aidan. Testing: @NoahDay.

Prerequisites:

Further information

Logging in to NCI

You need to join an NCI project (join your own project – ask a supervisor if you are unsure, then join vk83 and qv56)

ssh -X <your-NCI-username>@gadi.nci.org.au
  • Replace <your-NCI-username> with your NCI username.
  • -X enables graphical forwarding (i.e. windows and plots can be displayed on your local machine)
File system :: Description
/home :: Backed up. 10 GiB fixed quota per user.
/scratch :: Not backed up; temporary files; an auto-purge policy is applied.
/g/data :: Not backed up; long-term, large data files.
/apps :: Read only; centrally installed software applications and their module files.
$PBS_JOBFS :: Not backed up; local to the node; for I/O-intensive data.
massdata :: Backed up; for archiving large data files (accessed with the mdss commands below).
○ man mdss :: read the manual for all mdss commands
○ mdss dmls -l :: list files with their status: on disk cache (REG), on tape (OFF), or both (DUL)
○ mdss put/get :: put files into, or retrieve files from, the mass-data store
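For example, a minimal sketch of archiving a file to massdata and retrieving it later (the tar filename is a placeholder for your own archive file):

mdss put outputs.tar        # copy a file into your project's mass-data store
mdss dmls -l                # list files with their disk/tape status (REG/OFF/DUL)
mdss get outputs.tar        # retrieve the file again later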

Experiment manager: Payu

Payu is an experiment manager and model running tool. It is the tool used to run the ACCESS models covered in this tutorial.

Payu is written in Python. See the documentation or the GitHub repository for more information.

The latest version is 1.1.6, which is the minimum required for this tutorial. ACCESS-NRI provides supported conda environments for payu, which also contain other dependencies and tools required to run ACCESS-NRI supported models. These can be accessed via the module system:

module use /g/data/vk83/modules
module load payu
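To check that the module loaded correctly, you can then run, for example:

module list      # should show the payu module that was just loaded
which payu       # should resolve to the ACCESS-NRI conda environment under /g/data/vk83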

Running OM2

Okay, here’s where the magic happens! Below, we make an experiment directory (via mkdir and payu clone ...), then run the 1deg_jra55_ryf configuration using payu run:

mkdir -p ~/access-om2
cd ~/access-om2
payu clone --new-branch expt --branch release-1deg_jra55_ryf https://github.com/ACCESS-NRI/access-om2-configs 1deg_jra55_ryf
#do a `git branch -vv`, and you'll note this has made a new local branch 'expt'
cd 1deg_jra55_ryf
payu run

What if I want to extend an existing experiment (e.g. a perturbation from a control experiment)?

Use the clone command with the --restart option:

payu clone --new-branch <my-branch> --branch <control-branch> --restart <folder-path> <repo-URL> <my-folder>

where

  • <my-branch> :: name of your experiment branch
  • <control-branch> :: branch of control experiment in repo
  • <folder-path> :: path to the restart files on filesystem
  • <repo-URL> :: control experiment git repo (GitHub or local clone)
  • <my-folder> :: path of new experiment control directory

Optionally, add --start-point <GIT_REF> to branch your experiment
from a specific git commit or branch that corresponds to the specified
restart folder. This means your experiment starts with exactly the same model configuration as when the restart was created.
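As a hedged illustration only (the branch names, restart path and target folder below are placeholders; substitute the details of your own control experiment):

payu clone --new-branch my_perturb \
           --branch release-1deg_jra55_ryf \
           --restart /scratch/<project>/<user>/access-om2/archive/<control-experiment>/restart004 \
           --start-point <GIT_REF> \
           https://github.com/ACCESS-NRI/access-om2-configs 1deg_jra55_ryf_perturb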

Then change the experiment configuration as required, commit the changes using git commit, and then payu run.

Note: this process is very similar for ACCESS-ESM and ACCESS-CM; for example, see the ESM instructions here.

Monitoring the run: PBS

Additional information here. And run man qstat.

Three particularly important commands:

  1. qsub job.sh - Submit the job defined in the submission script job.sh
  2. qstat -swx - Show the status of your jobs
  3. qdel <jobid> - Delete the job with the given job ID

Here’s an example of how to use qstat.
$ qstat -swx

The -swx options break down as:

-s: show an extra line with the scheduler's status comment for each job

-w: wide format - display the output in wider columns so fields are not truncated

-x: also include finished (historical) jobs, not just queued and running ones

The screenshot below shows the qstat output for job 12345678.gadi-pbs:

  1. User aaa777 submitted the job
  2. To the normal-exec queue
  3. They requested 48 cores and 190 GiB memory
  4. It requested 2:00 hours of walltime and has been running for 0:35:21
  5. The line at the bottom indicates when the job started, which Gadi node it is running on (2697), and the space reserved on jobfs

Understanding a PBS script

PBS directive :: Description
#PBS -P :: Project for job debiting, /scratch project folder access and data ownership
#PBS -q :: Queue to submit the job to
#PBS -l ncpus= :: Number of CPU cores requested
#PBS -l storage=<scratch/prj1+gdata/prj2+massdata/prj3> :: Storage that needs to be available inside the job; massdata is only available in copyq jobs
#PBS -l ngpus= :: Number of GPUs; ncpus must be 12 x ngpus and the job must be submitted to the gpuvolta queue
#PBS -l walltime=hh:mm:ss :: Maximum walltime the job can run for
#PBS -l mem=<10GB> :: Memory allocation
#PBS -l jobfs=<40GB> :: Disk allocation on the compute/copyq node(s)
#PBS -l software=<app1,app2> :: Licences required
#PBS -l wd :: Start the job in the directory from which it was submitted
#PBS -W depend=beforeok:<jobid1,jobid2> :: Set dependencies between this and other jobs
#PBS -a :: Time after which the job is eligible for execution
#PBS -M <email@example.com,email2@anu.edu.au> :: List of recipients to whom email about the job is sent
#PBS -m :: Email events: a for abort, b for begin, e for end, n for none
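To put these directives in context, here is a minimal sketch of a submission script (the ab12 project code, the storage list, the resource requests and the my_program executable are all placeholders); it would be submitted with qsub job.sh:

#!/bin/bash
#PBS -P ab12
#PBS -q normal
#PBS -l ncpus=48
#PBS -l mem=190GB
#PBS -l walltime=02:00:00
#PBS -l jobfs=10GB
#PBS -l storage=gdata/vk83+scratch/ab12
#PBS -l wd

./my_program    # hypothetical executable run inside the job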

Model log files

While the model is running, payu saves the model standard output and error streams in the access-om2.out and access-om2.err files inside the control directory, respectively.
You can examine the contents of these files to check on the status of a run as it progresses (or after a failed run has completed).

At the end of a successful run these log files are archived to the archive directory and will no longer be found in the control directory. If they remain in the control directory after the PBS job for a run has completed it means the run has failed.

If the model crashes then, most of the time, the errors will be detailed in these files (located in your control directory):
access-om2.err
access-om2.out

but if you need more information then check these files:
work/ocean/log/*
work/ICE/log/*
work/atmosphere/log/*
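For example, to skim the end of the error stream or search the component logs for error messages (a generic sketch, run from the control directory):

tail -n 50 access-om2.err                  # last lines of the model's error stream
grep -ri error work/*/log/ | head -n 20    # search all component log files for errors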

Model Live Diagnostics

ACCESS-NRI developed the Model Live Diagnostics framework to check, monitor, visualise, and evaluate model behaviour and progress of ACCESS models currently running on Gadi.
For complete documentation on how to use this framework, check the Model Diagnostics documentation.

ACCESS-OM2 outputs

When your experiment has finished, a new folder will be created to store the outputs and linked into your control directory as `archive`.

ls archive

will show you these folders:

metadata.yaml  output000  pbs_logs  restart000

The folder with the outputs most useful for science applications is output000, which contains a large number of netCDF files in subfolders.
To see the netCDF files from the ice and ocean models (respectively), run:

 ls archive/output000/ice/OUTPUT/
 ls archive/output000/ocean/
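If you want a quick look inside one of these files, ncdump prints the header; the module name below is an assumption (any NCI netCDF module providing ncdump will do) and the filename is a placeholder for a real file from the listings above:

module load netcdf
ncdump -h archive/output000/ocean/<some-output-file>.nc | less   # inspect dimensions, variables and metadata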

What if I want Payu to copy my output to another location?

By default, payu laboratory and archive directories are in /scratch storage. The scratch filesystem is temporary storage where files not accessed for 100 days will be automatically removed. For this reason, it is often required for model outputs to be moved to /g/data/ for long-term data storage.

Payu has some built-in syncing support: it uses rsync commands under the hood and runs in a separate PBS job.

The sync subsection in config.yaml should look similar to the following:

sync:
    enable: true
    restarts: true
    path: /scratch/nf33/<replace-with-user-id>/tmp/test-sync-experiment-archive

(More details here)
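With syncing configured as above, recent payu versions also provide a sync subcommand that can be run manually from the control directory:

payu sync    # submits a PBS job that rsyncs the archive to the configured path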

Edit ACCESS-OM2 configuration

When editing your configuration, it is good practice to set runlog: true in config.yaml, so that your changes are automatically committed to git each time you run your experiment:
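runlog: true    # commit configuration changes automatically each time the experiment is run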

Queue settings

These are set in config.yaml. The default settings are:

queue: normal
walltime: 3:00:00
jobname: 1deg_jra55_ryf
mem: 1000GB

These set which queue the job is submitted to, how much walltime it needs, the name of the job (which will appear when running qstat) and how much memory the run needs. Ask for the least amount of resources needed to do the job: asking for too much will result in longer queue times, while asking for too little will slow down your job or crash it.

Run consecutive years/restarts

Once your model run has finished, you can continue from the point where it stopped using:

payu sweep
payu run

These commands will clean away the old run, set it up for a new run and then resubmit your job. The outputs from the new run will then be stored in a new folder:

 ls archive/output001/ice/OUTPUT/
 ls archive/output001/ocean/

This method can be cumbersome if you are running long simulations, so see below for ways to run longer simulations and automate the restarts.
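One common way to automate the resubmission (assuming payu's -n/--nruns option) is to queue several run segments back-to-back, for example:

payu sweep
payu run -n 5    # queue 5 consecutive run segments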

Change run length

The steps needed to change the length of a model run are different in ACCESS-ESM than in ACCESS-OM2. In this tutorial we focus on changing the run length for ACCESS-OM2;
instructions on how to change the run length in ACCESS-ESM can be found here.

One of the most common changes is to adjust the duration of the model run.
For example, when debugging changes to a model, it is common to reduce the run length to minimise resource consumption and return faster feedback on changes.

The run length is controlled by the restart_period field in the &date_manager_nml section of the ~/access-om2/1deg_jra55_ryf/accessom2.nml file:

&date_manager_nml
    forcing_start_date = '1958-01-01T00:00:00'
    forcing_end_date = '2019-01-01T00:00:00'
    ! Runtime for a single segment/job/submit, format is years, months, seconds,
    ! two of which must be zero.
    restart_period = 5, 0, 0
&end

The format is restart_period = <number_of_years>, <number_of_months>, <number_of_seconds>, and, as the comment in the file notes, two of the three values must be zero.

For example, to make each run segment one month long, change restart_period to:

restart_period = 0, 1, 0

Troubleshooting: Error and output files

Troubleshooting: Payu

If payu doesn’t run correctly for some reason, a good first step is to run the following command from within the control directory:

payu setup

The output from this command will look something like this:

laboratory path: /scratch/$project/$user/access-om2
binary path: /scratch/$project/$user/access-om2/bin
input path: /scratch/$project/$user/access-om2/input
work path: /scratch/$project/$user/access-om2/work
archive path: /scratch/$project/$user/access-om2/archive
/g/data/vk83/apps/base_conda/envs/payu-1.1.6/lib/python3.10/site-packages/payu/metadata.py:189: MetadataWarning: No pre-existing archive found. Generating a new uuid
Updated metadata. Experiment UUID: 2c91324a-c432-48ff-bbd8-71ed65163d7a
payu: Found modules in /opt/Modules/v4.3.0
Loading input manifest: manifests/input.yaml
Loading restart manifest: manifests/restart.yaml
Loading exe manifest: manifests/exe.yaml
Setting up atmosphere
Setting up ocean
Setting up ice
Setting up access-om2
Checking exe, input and restart manifests
Writing manifests/input.yaml
Writing manifests/restart.yaml

This command will:

  • create the laboratory and work directories based on the experiment configuration
  • generate manifests
  • report useful information to the user, such as the location of the laboratory where the work and archive directories are located

If you run payu setup, make sure you run payu sweep before starting your experiment using payu run.

Recap for intake-datastore

Check in with @CharlesTurner and @anton - was creating intake catalog going to be automatic? Is there anything we should say about this?