I’m running a 3-year experiment as 1-year runs with payu run --nruns 3. I woke up to a PBS error email: one job (named after the experiment with a “_c” suffix) had failed with
PBS: job killed: walltime 3652 exceeded limit 3600
(Other experiments failed in other spectacular ways, but I’ll leave those for another post.)
Looking at the output, it seems the model itself ran fine, but the output0001 folder is 16 GB instead of the usual 5.4 GB and doesn’t have a netCDF subfolder. The ocean folder also contains many more files than usual. My guess is that it was the collation script that ran into trouble.
Is it usual for this job to run out of time, or might it be a symptom of something else wrong in my config? I’m using the released pre-industrial configuration; the only change I made was adding an SST nudging file. A previous run starting from a different year completed correctly.
Is it possible to re-run just the collation step manually to fix this?