Payu, archives and restarts …

I’ve got a question on the latest payu, which I’ve been using to run access-om3.

I tried running a case with payu run, then ran it again … expecting the second run to follow on from the first. Instead, it created a new archive directory and started from scratch.

Then I tried copying the original restart file into the new archive directory, expecting that to force it to start in year 2, but again it started from time 0.

I found this behaviour confusing, and it is a change from previous versions of payu. @jo-basevi, any suggestions on what I’m doing wrong here? Is there a way to ensure runs pick up restarts by default?

That is unusual; I would have expected a subsequent payu run to use the same archive. To help debug what’s going wrong, could you share the experiment configuration, or a copy of payu’s error/output logs?

Payu generates a new UUID and archive if it can’t find a pre-existing archive directory under either the legacy experiment name (<control-dirname>) or an experiment name that includes the branch and UUID (<control-dirname>-<branch>-<uuid>). Possible reasons it wouldn’t find an existing archive are that the project code has changed, or that the run is on a different git branch.
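To make that lookup order concrete, here is a minimal Python sketch. It is not payu’s implementation: the function name `find_or_create_archive`, its parameters, and the `<shortpath>/<user>/<model>/archive/<experiment-name>` layout are assumptions for illustration only.

```python
# Hypothetical sketch of the archive lookup described above; NOT payu's actual code.
# Assumed layout: <shortpath>/<user>/<model>/archive/<experiment-name>
import uuid
from pathlib import Path


def find_or_create_archive(shortpath, user, model, control_dirname, branch, exp_uuid=None):
    """Return (archive_path, is_new) following the lookup order described above."""
    archive_root = Path(shortpath) / user / model / "archive"

    # 1. Legacy experiment name: just the control directory name
    legacy = archive_root / control_dirname
    if legacy.is_dir():
        return legacy, False

    # 2. Experiment name that includes the git branch and the experiment UUID
    if exp_uuid is not None:
        branch_uuid = archive_root / f"{control_dirname}-{branch}-{exp_uuid}"
        if branch_uuid.is_dir():
            return branch_uuid, False

    # 3. Nothing found: generate a fresh UUID, i.e. a new archive with no restarts
    new_uuid = uuid.uuid4()
    return archive_root / f"{control_dirname}-{branch}-{new_uuid}", True
```

Because both lookups start from shortpath, anything that changes shortpath (for example, a different project) makes both miss, and a fresh archive is started even though the old one still exists.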

Setting restart in config.yaml to the full path of a restart file will also make payu use that restart when there are no restarts in the archive.

Pure conjecture here …

In the access-om3 configuration, we added a userscript to make an intake-esm datastore automatically at the end of the run.

We need a better fix (Marc is working on it), but sometimes these fail if you’ve never imported the access_nri_intake Python package before (note this is different from the intake package used for analysis).

You get an error like:

RuntimeError: User defined script/command failed to run: /usr/bin/bash /g/data/vk83/apps/om3-scripts/payu_config/archive.sh

It fails because the script attempts to download the ACCESS ‘schema’, but the job doesn’t have internet access.

The workaround is to run the following, which downloads a cached copy of the schema:

$ module use /g/data/hh5/public/modules
$ module load conda/analysis3
$ python
Python 3.10.13 | packaged by conda-forge | (main, Oct 26 2023, 18:07:37) [GCC 12.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from access_nri_intake import source
>>> exit()

After that, the userscript should complete successfully.

Firstly, thanks @anton: that workaround solved one of the error messages I was seeing, but I don’t think it is what caused my earlier problems.

@jo-basevi, my runs directory was ~amh157/access-om3/access-om3-025-adapt. I’ve since started a new directory and haven’t been able to replicate the problems, but if you can get into that directory you may be able to disentangle what I did the first time!

Thanks @AndyHoggANU, I was able to replicate the errors, and it is an issue with payu. It comes down to a difference in $PROJECT between the PBS job submitted by payu run and the commands run on the login node.

In config.yaml, setting project passes that project to the payu run submission (e.g. qsub -P PROJECT). In payu, shortpath is the top-level directory for storing payu laboratories and model output, and it defaults to /scratch/${PROJECT}.

To replicate the error on my NCI account, I had a config.yaml with the following settings:

project: nf33
# shortpath: /scratch/nf33

When running payu commands on the login node (e.g. payu setup), shortpath defaults to /scratch/tm70 because tm70 is my default project, so the archive is created under that path. Then, in the payu run job, shortpath defaults to /scratch/nf33; since payu can’t find the previously created archive there, it creates a new archive with a new UUID.
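A minimal Python sketch of why the two contexts disagree. The function `current_default_shortpath` and the dictionaries standing in for config.yaml and the shell environment are hypothetical; the sketch only assumes, as described above, that the PBS job sees PROJECT=nf33 while the login node sees the user’s default project.

```python
# Hypothetical illustration of the shortpath mismatch; NOT payu's actual code.


def current_default_shortpath(config, env):
    """Explicit shortpath wins; otherwise fall back to /scratch/${PROJECT} from the environment."""
    if "shortpath" in config:
        return config["shortpath"]
    # Note: `project` from config.yaml is not consulted here
    return f"/scratch/{env['PROJECT']}"


config = {"project": "nf33"}      # shortpath commented out, as in the config above
login_env = {"PROJECT": "tm70"}   # login node: the user's default project
job_env = {"PROJECT": "nf33"}     # PBS job submitted with qsub -P nf33

print(current_default_shortpath(config, login_env))  # /scratch/tm70
print(current_default_shortpath(config, job_env))    # /scratch/nf33
```

The archive created during payu setup therefore lives under one shortpath, while the job looks under another.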

So payu needs to use the project from config.yaml when constructing the default shortpath. I’ve opened a new payu issue for this: `shortpath` differences with default project · Issue #502 · payu-org/payu · GitHub
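A sketch of the behaviour the issue asks for, again in Python with a hypothetical function name. It is not the code from the issue or from payu (the actual fix may differ); it only captures the idea that an explicit project in config.yaml should take precedence over the environment’s $PROJECT.

```python
# Hypothetical sketch of the proposed behaviour; the actual payu fix may differ.


def proposed_default_shortpath(config, env):
    """Prefer config.yaml's `project` over $PROJECT when building the default shortpath."""
    if "shortpath" in config:
        return config["shortpath"]
    project = config.get("project") or env["PROJECT"]
    return f"/scratch/{project}"


config = {"project": "nf33"}
for env in ({"PROJECT": "tm70"}, {"PROJECT": "nf33"}):
    # Login node and PBS job now agree, so the existing archive is found
    print(proposed_default_shortpath(config, env))  # /scratch/nf33 both times
```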

Thanks again for raising this issue! In the meantime, setting shortpath in config.yaml to match the project works around it. For example:

project: nf33
shortpath: /scratch/nf33

Awesome, nice catch.
Thanks!