When?
11am Friday 21st March 2025, here.
What is this?
A tutorial on how to run the ACCESS-OM2 model for the first time (from rest, a restart or a perturbation!).
Writing credits: @cbull @helen @Aidan. Testing: @NoahDay.
Prerequisites:
- Get an account on NCI, also see this help
- Join your own NCI project, and then `vk83` and `qv56`. Find the project here and join it (see the Join tab). Also see this help
Further information
Logging in to NCI
You need to join an NCI project (join your own project – ask a supervisor if you are unsure, then join `vk83` and `qv56`):
ssh -X <your-NCI-username>@gadi.nci.org.au
- Replace `<your-NCI-username>` with your NCI username.
- `-X` enables graphical forwarding (i.e. pictures)
File system | Description |
---|---|
/home | Backed up. 10 GiB fixed quota per user. |
/scratch | Not backed up, temporary files, auto purge policy applied. |
/g/data | Not backed up, long-term large data files. |
/apps | Read only, centrally installed software applications and their module files. |
$PBS_JOBFS | Not backed up, local to the node, I/O intensive data. |
massdata | Backed up, archiving large data files. |
- `man mdss` - read the manual for all mdss commands
- `mdss dmls -l` - list files with status: online/in disk cache (REG), on tape (OFF), or both (DUL)
- `mdss put/get` - put or retrieve files from mdss
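For example, to archive a tarball to massdata and then check its migration status (the project code, file and directory names here are illustrative):
mdss -P vk83 put output000.tar experiments/output000.tar
mdss -P vk83 dmls -l experiments/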
Experiment manager: Payu
Payu is an experiment manager and model running tool. It is the tool used to run the ACCESS models covered in this tutorial.
Payu is written in python. See the documentation or the GitHub repository for more information.
The latest version is 1.1.6, which is the minimum required for this tutorial. ACCESS-NRI provides supported `conda` environments for `payu`, which also contain other dependencies and tools required to run ACCESS-NRI supported models. These can be accessed via the module system:
module use /g/data/vk83/modules
module load payu
Running OM2
Okay, here’s where the magic happens! Below, we make an experiment directory (via `mkdir` and `payu clone`), then run the `1deg_jra55_ryf` configuration using `payu run`:
mkdir -p ~/access-om2
cd ~/access-om2
payu clone --new-branch expt --branch release-1deg_jra55_ryf https://github.com/ACCESS-NRI/access-om2-configs 1deg_jra55_ryf
#do a `git branch -vv`, and you'll note this has made a new local branch 'expt'
cd 1deg_jra55_ryf
payu run
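`payu run` submits a PBS job on your behalf; you can confirm that it is queued or running with standard PBS tools, for example:
qstat -u $USER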
What if I want to extend an existing experiment (e.g. a perturbation from a control experiment)?
Use the clone command with the `--restart` option:
payu clone --new-branch <my-branch> --branch <control-branch> --restart <folder-path> <repo-URL> <my-folder>
where:
- `<my-branch>` :: name of your experiment branch
- `<control-branch>` :: branch of the control experiment in the repo
- `<folder-path>` :: path to the restart files on the filesystem
- `<repo-URL>` :: control experiment git repo (GitHub or local clone)
- `<my-folder>` :: path of the new experiment control directory
Optionally, add `--start-point <GIT_REF>` to branch your experiment from a specific git commit or branch that corresponds to the specified restart folder. This means your experiment starts with exactly the same model configuration as when the restart was created.
Then change the experiment configuration as required, commit your changes using `git commit`, and then `payu run`.
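For example, a perturbation experiment cloned from the configuration used earlier might look like this (the branch names and restart path are illustrative):
payu clone --new-branch perturb --branch release-1deg_jra55_ryf \
    --restart /scratch/<project>/<user>/access-om2/archive/1deg_jra55_ryf/restart004 \
    https://github.com/ACCESS-NRI/access-om2-configs 1deg_jra55_ryf_perturb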
Note, this process is very similar for ESM and CM; for example, see the ESM instructions here
Monitoring the run: PBS
Additional information is available here. Also run `man qstat`.
Three particularly important commands:
- `qsub job.sh` - submit the job defined in the submission script `job.sh`
- `qstat -swx` - gives the status of the job
- `qdel <jobid>` - delete the job with the given job ID
Here’s an example of how to use qstat.
$ qstat -swx
The flags in `-swx` are:
- `-s`: show the scheduler's status comment for each job on an extra line
- `-w`: wide format - displays the output in wider columns
- `-x`: include finished and moved jobs in the output, not just queued and running ones
The screenshot below refers to job `12345678.gadi-pbs`:
- User `aaa777` submitted the job
- To the `normal-exec` queue
- They requested `48 cores` and `190 GiB memory`
- It requested 2:00 hours of walltime and has been running for 0:35:21
- The line at the bottom indicates when the job started, which Gadi node it is running on (`2697`), and the space reserved on `jobfs`
Understanding a PBS script
PBS command | Description |
---|---|
#PBS -P | Project for job debiting, /scratch project folder access and data ownership |
#PBS -q | Queue to submit the job to |
#PBS -l ncpus= | Request CPU cores |
#PBS -l storage=<scratch/prj1+gdata/prj2+massdata/prj3> | Storage needed to be available inside the job. massdata is only available in copyq jobs. |
#PBS -l ngpus= | Number of GPUs; ncpus has to be 12 x ngpus and the job has to be submitted to the gpuvolta queue. |
#PBS -l walltime=hh:mm:ss | Maximum walltime the job can run |
#PBS -l mem=<10GB> | Memory allocation |
#PBS -l jobfs=<40GB> | Disk allocation on compute/copyq node(s) |
#PBS -l software=<app1,app2> | Licences required |
#PBS -l wd | Start the job from the directory in which it was submitted |
#PBS -W depend=beforeok:<jobid1,jobid2> | Set dependencies between this and other jobs |
#PBS -a | Time after which the job is eligible for execution |
#PBS -M <email@example.com,email2@anu.edu.au> | List of recipients to whom email about the job is sent |
#PBS -m | Email events: a for abort, b for begin, e for end, n for none |
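For reference, here is a minimal sketch of a submission script using these directives. The project code, storage flags, resource requests and the program being run are all illustrative placeholders:
#!/bin/bash
#PBS -P <project>
#PBS -q normal
#PBS -l ncpus=48
#PBS -l mem=190GB
#PBS -l walltime=02:00:00
#PBS -l jobfs=40GB
#PBS -l storage=gdata/vk83+scratch/<project>
#PBS -l wd

# Commands to run inside the job go here, e.g.
./my_program
Submit it with `qsub job.sh` and monitor it with `qstat -swx`.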
Model log files
While the model is running, payu saves the model standard output and error streams in the access-om2.out
and access-om2.err
files inside the control directory, respectively.
You can examine the contents of these files to check on the status of a run as it progresses (or after a failed run has completed).
At the end of a successful run these log files are archived to the archive
directory and will no longer be found in the control directory. If they remain in the control directory after the PBS job for a run has completed it means the run has failed.
If the model crashes then, most of the time, the errors will be detailed in these files (located in your control directory):
access-om2.err
access-om2.out
but if you need more information then check these files:
work/ocean/log/*
work/ICE/log/*
work/atmosphere/log/*
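To watch a run progressing (or diagnose a failure), you can follow these logs from the control directory, for example:
tail -f access-om2.out access-om2.err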
Model Live Diagnostics
ACCESS-NRI developed the Model Live Diagnostics framework to check, monitor, visualise, and evaluate model behaviour and progress of ACCESS models currently running on Gadi.
For complete documentation on how to use this framework, check the Model Diagnostics documentation.
ACCESS-OM2 outputs
When your experiment has finished, a new folder will be created to store the outputs, linked into your control directory as `archive`
ls archive
will show you these folders:
metadata.yaml output000 pbs_logs restart000
The folder with the outputs most useful for science applications is `output000`, which contains a large number of netCDF files in subfolders.
To see netcdf files from the ice and ocean models (respectively) we can do:
ls archive/output000/ice/OUTPUT/
ls archive/output000/ocean/
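To inspect one of these files you could, for example, view its header with `ncdump` (the module and file names below are illustrative; substitute any file from the listings above):
module load netcdf
ncdump -h archive/output000/ocean/ocean.nc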
What if I want Payu to copy my output to another location?
By default, the payu `laboratory` and `archive` directories are in `/scratch` storage. The scratch filesystem is temporary storage where files not accessed for 100 days are automatically removed. For this reason, model outputs often need to be moved to `/g/data/` for long-term storage.
Payu has some syncing support, using `rsync` commands under the hood that run in a separate PBS job.
The `sync` subsection in `config.yaml` should look similar to the following:
sync:
    enable: true
    restarts: true
    path: /scratch/nf33/<replace-with-user-id>/tmp/test-sync-experiment-archive
(More details here)
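If you want to push existing outputs across without waiting for the next run to finish, payu also provides a manual trigger via its `sync` subcommand (available in recent payu versions):
payu sync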
Edit ACCESS-OM2 configuration
When editing your configuration, it is good practice to set `runlog: true` in `config.yaml`, as your changes will then automatically be committed when you run your experiments:
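A minimal sketch of the relevant line in `config.yaml`:
runlog: true    # commit configuration changes automatically at each run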
Queue settings
These are set in `config.yaml`. The default settings are:
queue: normal
walltime: 3:00:00
jobname: 1deg_jra55_ryf
mem: 1000GB
These set which queue you will be in, how long you need to run for, the name of the job (which will appear when running `qstat`), and the amount of memory your run will need. You want to ask for the least amount of resources needed to do your job: asking for too much will result in longer queue times, while asking for too little will slow down your job or crash it.
Run consecutive years/restarts
Once your model run has finished, you can continue from the place that it stopped using
payu sweep
payu run
These commands will clean away the old run, set it up for a new run and then resubmit your job. The outputs from the new run will then be stored in a new folder:
ls archive/output001/ice/OUTPUT/
ls archive/output001/ocean/
This method can be cumbersome if you are running long simulations, so see below for instructions on how to run longer simulations and automate the restarts.
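As a sketch of that automation, payu can queue several consecutive run segments from a single command (the segment count here is illustrative; check `payu run --help` for the options in your payu version):
payu run -n 10    # submit 10 consecutive run segments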
Change run length
The steps needed to change the length of the model run differ between ACCESS-ESM and ACCESS-OM2. In this tutorial we focus on changing the run length of ACCESS-OM2.
Instructions on how to change the run length in ACCESS-ESM can be found here
One of the most common changes is to adjust the duration of the model run.
For example, when debugging changes to a model, it is common to reduce the run length to minimise resource consumption and return faster feedback on changes.
The run length is controlled by the `restart_period` field in the `&date_manager_nml` section of the `~/access-om2/1deg_jra55_ryf/accessom2.nml` file:
&date_manager_nml
    forcing_start_date = '1958-01-01T00:00:00'
    forcing_end_date = '2019-01-01T00:00:00'
    ! Runtime for a single segment/job/submit, format is years, months, seconds,
    ! two of which must be zero.
    restart_period = 5, 0, 0
&end
The format is `restart_period = <number_of_years>, <number_of_months>, <number_of_seconds>`, and, as the comment in the file notes, two of the three values must be zero.
For example, to make the model run in segments of 6 months, change `restart_period` to:
restart_period = 0, 6, 0
Troubleshooting: Error and output files
Troubleshooting: Payu
If payu doesn’t run correctly for some reason, a good first step is to run the following command from within the control directory:
payu setup
The output from this command will look like this:
laboratory path: /scratch/$project/$user/access-om2
binary path: /scratch/$project/$user/access-om2/bin
input path: /scratch/$project/$user/access-om2/input
work path: /scratch/$project/$user/access-om2/work
archive path: /scratch/$project/$user/access-om2/archive
/g/data/vk83/apps/base_conda/envs/payu-1.1.6/lib/python3.10/site-packages/payu/metadata.py:189: MetadataWarning: No pre-existing archive found. Generating a new uuid
Updated metadata. Experiment UUID: 2c91324a-c432-48ff-bbd8-71ed65163d7a
payu: Found modules in /opt/Modules/v4.3.0
Loading input manifest: manifests/input.yaml
Loading restart manifest: manifests/restart.yaml
Loading exe manifest: manifests/exe.yaml
Setting up atmosphere
Setting up ocean
Setting up ice
Setting up access-om2
Checking exe, input and restart manifests
Writing manifests/input.yaml
Writing manifests/restart.yaml
This command will:
- create the laboratory and `work` directories based on the experiment configuration
- generate manifests
- report useful information to the user, such as the location of the laboratory, where the `work` and `archive` directories are located
If you run `payu setup`, make sure you run `payu sweep` before starting your experiment using `payu run`.
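Putting these together, a typical check-then-run sequence from the control directory is:
payu setup    # check the configuration and report the laboratory paths
payu sweep    # clean up the work directory that setup created
payu run      # submit the experiment to PBS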
Recap for intake-datastore
Check in with @CharlesTurner and @anton - is creating the intake catalog going to be automatic? Is there anything we should say about this?