Model development through UM7, ACCESS-ESM1.6 and payu

The job I started yesterday or the day before is still running. I have been occupied with other matters but came back to check where it is up to. Comments:

The JOBNAME in the queue must be somewhat generic. My workflow at the moment will involve having several pre-industrial control runs going, and it seems they will all appear in the queue with the same name. I payu cloned “ ”, so I was expecting the job name to reflect this. Anyway, not a huge deal.

The most reliable/simplest way for me to ascertain what month/year/CRUN the model was up to was to look at the last dumps created. It seems that archive/ only contains the most recent dump? Are they all kept somewhere?

I tailed the current output, but the timestep is likely only relevant to the month being run. We had an issue in CMIP5 where we had to keep track of the date in wrappers around the model. So in payu, how do I find out where I'm up to?

I’m sure I’ll have more questions but I only just got back to this. Thanks

Hi @Jhan,

If I understand your issue correctly, you are having difficulty isolating which suites are running (because they all have the same name) and where they are up to?

For the first point, there is a configuration option in payu to provide an explicit name for the PBS job (see jobname):

https://payu.readthedocs.io/en/latest/config.html#configuration-settings

Setting this to something useful for your purposes will go some way in addressing your first issue.

As for the second point, can you please describe a bit more about how you are using payu (e.g. a snippet of the commands you are using)? This will help me answer your question further, either by addressing your problem directly or by assigning it to the appropriate team member.

Cheers, Ben


Thanks Ben, I’ll make use of jobname.

Hrmmm, my payu commands are pretty plain: payu setup/sweep/run. Everything that might be specific is in the config file. I think I’ve just found it anyway; on Friday I was looking in the wrong “jobname” directory.

I recall reading somewhere that to restart a run that has hopefully crashed because of some glitch at NCI and not a legitimate bug, you can just issue:

payu run

again?

Hi @Jhan,

If you run payu run -f then a sweep should occur prior to running.

https://payu.readthedocs.io/en/stable/usage.html#cleaning-up

Based on the docs, a sweep should be performed prior to re-running in the case of a failed submission.

Let me know how you go, happy to help.

Ben


Thanks Ben. Are you sure I want a sweep? Won’t this prepare it to start a fresh run? In the docs, under the heading “Running the Experiment”, it lists setup, then run, then run -f, but the text under it says to just type “payu run” to start from the last point, which is what I at least want to try to do.

What I forgot to ask was: given my original run command was “payu run -n 1200”, does payu remember how long the experiment was intended to be, or do I need to specify “payu run -n 1200-TIME_ALREADY_COMPLETED”?

payu sweep removes the ephemeral work directory. payu run will throw an error if that directory exists. This is so users don’t delete a work directory without intention, as it provides a useful source of information for diagnosing why a run might have failed.

A work directory can also exist if a user runs payu setup. Running setup is recommended when the configuration is changed, because it updates the manifest files. These are an important source of information for which projects need to be added to the PBS submit command, so that all the files can be read when the model is run under PBS.

payu run -f just says “please run regardless of the existence of a work directory”. It is perfectly safe to do this, and it is equivalent to payu sweep && payu run.

It is covered in the docs:

https://payu.readthedocs.io/en/stable/usage.html#running-your-experiment

However, the -n option is just a simple counter. It has no knowledge of the model time or calendar. If the model stopped before completing the intended number of resubmissions, you will need to work out how many more iterations to run.
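One way to work out that number is to look at what has already been archived. The Python below is a minimal sketch, assuming payu has archived each completed run as archive/outputNNN (output000, output001, ...) and that the original payu run -n 1200 submission started from output000. The helpers runs_completed and runs_remaining are hypothetical, not part of payu.

```python
import re
from pathlib import Path

# Assumption: each completed run is archived as archive/outputNNN.
# Other entries (e.g. restartNNN directories) are ignored.
OUTPUT_DIR = re.compile(r"output(\d+)")


def runs_completed(archive: Path) -> int:
    """Runs archived so far, taken as the highest outputNNN index + 1."""
    highest = -1
    for entry in archive.iterdir():
        match = OUTPUT_DIR.fullmatch(entry.name)
        if entry.is_dir() and match:
            highest = max(highest, int(match.group(1)))
    return highest + 1


def runs_remaining(intended_total: int, archive: Path) -> int:
    """Value to pass to payu run -n so the experiment reaches intended_total runs."""
    return max(intended_total - runs_completed(archive), 0)
```

Called from the control directory as runs_remaining(1200, Path("archive")), this gives the count to resubmit with. A run that crashed before archiving has no outputNNN directory, so it is counted as not yet done.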

Thanks Aidan. In the interim I tried submitting it with payu run, and it wouldn’t run without a “sweep”. I was just hesitant to use sweep, as there is a trick to restarting cycle jobs: if you don’t do it in the right order, it starts over. I was under the impression that “payu run -f” forced a sweep and that this meant running the 64 years again. Thanks for the explanation.


Nope.

payu sweep --hard

will delete the entire experiment archive.

A subsequent payu run would then start from scratch.
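To summarise the behaviour described in this thread, here is a toy Python state model. It is illustrative only and is not payu's actual implementation; the ToyExperiment class and its methods are hypothetical. It shows that a plain sweep, or run -f, discards only the work directory, so the next run carries on from the archive, whereas sweep --hard empties the archive and the next run starts from scratch.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ToyExperiment:
    """Toy state model of payu sweep/run behaviour (NOT payu's code)."""

    work_exists: bool = False  # the ephemeral work directory
    archive: List[str] = field(default_factory=list)  # archived outputNNN names

    def setup(self) -> None:
        self.work_exists = True

    def sweep(self, hard: bool = False) -> None:
        # A plain sweep removes only the work directory.
        self.work_exists = False
        if hard:
            # sweep --hard also deletes the whole experiment archive.
            self.archive.clear()

    def run(self, force: bool = False, crash: bool = False) -> Optional[str]:
        if self.work_exists:
            if not force:
                raise RuntimeError("work directory exists: sweep first or use -f")
            self.sweep()  # run -f behaves like sweep followed by run
        self.setup()
        if crash:
            return None  # a failed run leaves work/ behind for diagnosis
        name = f"output{len(self.archive):03d}"  # numbering continues from the archive
        self.archive.append(name)
        self.work_exists = False  # a successful run is archived and work/ cleared
        return name
```

In this model, a crashed run resubmitted with run(force=True) archives as the next output number rather than repeating earlier ones; only sweep(hard=True) resets the numbering to output000.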