The job I got started yesterday or the day before is still running. I have been occupied with other matters but returned to check where it is up to. Comments:
The JOBNAME in the queue must be somewhat generic. My workflow at the moment will entail having several pre-industrial control runs going at once, and it seems they are all going to be in the queue with the same name. I payu cloned “”. I was expecting the job name to reflect this. Anyway, not a huge deal.
If I understand your issue correctly, you are having difficulty isolating which suites are running (because they all have the same name) and where they are up to?
For the first point, there is a configuration option in payu to provide an explicit name for the PBS job. (See jobname)
Setting this to something useful for your purposes will go some way in addressing your first issue.
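As a rough illustration of the suggestion above, the sketch below sets a top-level `jobname` entry in an experiment's `config.yaml`. The helper `set_jobname` and the example name `pi_control_a` are hypothetical and not part of payu. It assumes `jobname` sits on its own line at the top level of the file and that the name contains no characters special to `sed` (such as `/` or `&`).

```shell
# set_jobname FILE NAME
# Hypothetical helper (not part of payu): set the top-level `jobname`
# entry in a payu config file, replacing an existing entry if present.
set_jobname() {
    if grep -q '^jobname:' "$1"; then
        # Replace the existing top-level jobname line in place.
        sed -i.bak "s/^jobname:.*/jobname: $2/" "$1" && rm -f "$1.bak"
    else
        # No jobname yet: append one (assumes the file ends with a newline).
        printf 'jobname: %s\n' "$2" >> "$1"
    fi
}

# Example (run from the control directory; the name is made up):
#   set_jobname config.yaml pi_control_a
```

Giving each control run a distinct name this way should make the runs distinguishable in the queue listing.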
As for the second point, can you please describe a bit more about how you are using payu (e.g. a snippet of the commands you are running)? This will help me either address your problem directly or assign it to the appropriate team member.
hrmmm - my payu commands are pretty plain. payu setup/sweep/run. Everything that might be specific is in the config file. I think I’ve just found it anyway. I think on Friday I was looking in the wrong “jobname” directory.
I recall reading somewhere that to restart a run that has hopefully crashed because of some glitch at NCI and not a legitimate bug, you can just issue:
Thanks Ben, are you sure I want sweep? Won’t this prepare it to start a fresh run? In the docs, under the heading “Running the Experiment”, it has setup, then run, then run -f, but the plain text under it says to just type “payu run” to start from the last point, which is what I at least want to try to do.
What I forgot to ask was: given my original run command was “payu run -n 1200”, is there memory of how long the experiment was intended to be, or do I need to specify “payu run -n 1200-TIME_ALREADY_COMPLETED”?
Aidan
(Aidan Heerdegen, ACCESS-NRI Release Team Lead)
payu sweep removes the ephemeral work directory. payu run will throw an error if that directory exists. This is so users don’t delete a work directory without intention, as it provides a useful source of information for diagnosing why a run might have failed.
A work directory can also exist if a user runs payu setup, which is recommended when the configuration is changed. It updates the manifest files, which are an important source of information about which projects need to be added to the PBS submit command so that all the files can be read when the model is run under PBS.
payu run -f just says “please run regardless of the existence of a work directory”. It is perfectly safe to do this, and it is equivalent to payu sweep && payu run.
Note that the -n option is just a simple counter. It has no knowledge of the model time/calendar. You will need to work out how many more iterations you want to run if the model stopped before completing the intended number of resubmissions.
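Since -n is only a counter, the remaining count has to be worked out by hand. The sketch below shows that arithmetic and passes the result to `payu run -f -n`. The helpers `remaining_runs` and `resume_run` are hypothetical, not payu commands, and the figure of 64 completed runs is purely illustrative. The `PAYU` variable is an assumption added so the command can be swapped out for a dry run.

```shell
# Command to invoke; override (e.g. PAYU=echo) for a dry run.
PAYU=${PAYU:-payu}

# remaining_runs PLANNED COMPLETED
# Hypothetical helper: print how many runs are still needed.
remaining_runs() {
    if [ "$2" -ge "$1" ]; then
        echo "nothing left to run: $2 of $1 already completed" >&2
        return 1
    fi
    echo $(( $1 - $2 ))
}

# resume_run PLANNED COMPLETED
# Hypothetical helper: restart with the remaining count, using -f so an
# existing work directory does not block the run.
resume_run() {
    remaining=$(remaining_runs "$1" "$2") || return 1
    "$PAYU" run -f -n "$remaining"
}

# Example: 1200 runs planned, 64 already completed (illustrative numbers):
#   resume_run 1200 64    # invokes: payu run -f -n 1136
```

The count of completed runs is left as an argument on purpose, because payu itself does not track the originally intended total.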
Thanks Aidan. In the interim I tried submitting it with payu run and it wouldn’t go without a “sweep”. I was just hesitant to use sweep, as there is a trick to restarting cycle jobs: if you don’t do it in the right order, it starts over. I was under the impression that “payu run -f” forced a sweep and that this meant running the 64 years again. Thanks for the explanation.
Aidan
(Aidan Heerdegen, ACCESS-NRI Release Team Lead)
Nope.
payu sweep --hard
will delete the entire experiment archive.
A subsequent payu run would then start from scratch.