I’ve got a question on the latest payu, which I’ve been using to run access-om3.
I tried running a case with payu run, then ran again … and expected the second run to follow on from the first. But for some reason it made a new archive directory and started from scratch.
Then, I tried to copy in the original restart file to the new archive directory, expecting that to force it to start in year 2 - but again it started from time 0.
I found this behaviour confusing, and a change from previous payu. @jo-basevi - any suggestions on what I’m doing wrong here? Is there a way to ensure runs pick up restarts as the default?
That is unusual; I would’ve thought a subsequent payu run would’ve used the same archive… To help debug what’s going wrong, are you able to share the experiment configuration, or a copy of payu’s error/output logs?
Payu generates a new UUID and archive if it can’t find a pre-existing archive directory under either the legacy experiment name (<control-dirname>) or an experiment name that includes the branch and UUID (<control-dirname>-<branch>-<uuid>). Possible reasons it wouldn’t find an existing archive are that the project code has changed, or that you’re on a different git branch.
Setting restart in config.yaml to the full path of a restart file would also make payu use that restart if there are no restarts in the archive.
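As a rough illustration of the archive lookup described above (not payu's actual code), here is a minimal Python sketch. The function name find_existing_archive, the directory layout, and the order in which the two names are checked are all assumptions made for the example.

```python
import tempfile
from pathlib import Path
from typing import Optional


def find_existing_archive(archive_root: Path, control_dirname: str,
                          branch: str, uuid: str) -> Optional[Path]:
    """Hypothetical helper: return an existing archive directory, else None.

    Checks the two naming schemes mentioned in the post. The check order
    is an assumption, not something confirmed by payu.
    """
    candidates = [
        archive_root / control_dirname,                       # legacy name
        archive_root / f"{control_dirname}-{branch}-{uuid}",  # name-branch-uuid
    ]
    for path in candidates:
        if path.is_dir():
            return path
    return None  # caller would then create a new archive with a new UUID


with tempfile.TemporaryDirectory() as tmp:
    # Two archive roots standing in for two different project directories.
    root_a = Path(tmp) / "projA" / "archive"
    root_b = Path(tmp) / "projB" / "archive"
    (root_a / "my-expt").mkdir(parents=True)
    root_b.mkdir(parents=True)

    found_same_root = find_existing_archive(root_a, "my-expt", "main", "abc123")
    found_other_root = find_existing_archive(root_b, "my-expt", "main", "abc123")

    print(found_same_root is not None)  # archive exists under the same root
    print(found_other_root is None)     # nothing under the other root
```

The point of the sketch is that the lookup is relative to an archive root, so if that root changes (for example because the project code changed), an existing archive under the old root is not found.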
In the access-om3 configuration, we added a userscript to make an intake-esm datastore automatically at the end of the run.
We need a better fix (Marc is working on it), but sometimes these fail if you’ve never imported the access_nri_intake Python package before (note this is different to the intake package used for analysis).
You get an error like:
RuntimeError: User defined script/command failed to run: /usr/bin/bash /g/data/vk83/apps/om3-scripts/payu_config/archive.sh
This fails because it attempts to download the access ‘schema’, but the job doesn’t have access to the internet.
The workaround is to run the following, which downloads a cached copy of the schema:
$ module use /g/data/hh5/public/modules
$ module load conda/analysis3
$ python
Python 3.10.13 | packaged by conda-forge | (main, Oct 26 2023, 18:07:37) [GCC 12.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from access_nri_intake import source
>>> exit()
And then the userscript should complete successfully.
Firstly - thanks @anton - that workaround solved one of the error messages I was seeing, but I don’t think that is what caused my problems before.
@jo-basevi - my runs directory was ~amh157/access-om3/access-om3-025-adapt. I’ve since started a new directory and haven’t been able to replicate the problems, but if you can get into that directory you may be able to disentangle what I did first time!
Thanks @AndyHoggANU - I was able to replicate the errors and it is an issue with payu. It comes down to a difference in $PROJECT between the PBS payu run submission and the commands run on the login node.
So in config.yaml, setting project will pass that project in the payu run submission (e.g. qsub -P PROJECT). In payu, shortpath is the top-level directory for storing payu laboratories and model output and this defaults to /scratch/${PROJECT}.
To replicate the error on my NCI account, I had a config.yaml with the following settings:
project: nf33
# shortpath: /scratch/nf33
Then running payu commands on the login node (e.g. payu setup), the shortpath defaults to /scratch/tm70 because tm70 is my default project. This creates an archive starting with this filepath. Then on payu run, the shortpath defaults to /scratch/nf33, and as it can’t find the previously created archive, it creates a new archive with a new UUID.
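To make the mismatch concrete, here is a minimal Python sketch of the behaviour described above. It is not payu's implementation: resolve_shortpath is a hypothetical helper, and the assumption that $PROJECT inside the PBS job equals the project passed to qsub -P is taken from the description in this post.

```python
from typing import Mapping


def resolve_shortpath(config: Mapping[str, str], env: Mapping[str, str]) -> str:
    """Hypothetical helper: an explicit shortpath wins, else /scratch/$PROJECT."""
    explicit = config.get("shortpath")
    if explicit:
        return explicit
    return f"/scratch/{env['PROJECT']}"


# config.yaml with project set but shortpath commented out
config = {"project": "nf33"}

login_env = {"PROJECT": "tm70"}           # default project on the login node
job_env = {"PROJECT": config["project"]}  # assumed: job submitted with qsub -P nf33

login_path = resolve_shortpath(config, login_env)
job_path = resolve_shortpath(config, job_env)
print(login_path, job_path)  # the two paths differ, so the archive is not found

# With shortpath set explicitly, both environments resolve to the same path
pinned = {"project": "nf33", "shortpath": "/scratch/nf33"}
pinned_login = resolve_shortpath(pinned, login_env)
pinned_job = resolve_shortpath(pinned, job_env)
print(pinned_login == pinned_job)
```

Under these assumptions, uncommenting shortpath in config.yaml would make the login node and the PBS job agree on where the archive lives.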