Inline post processing of payu output

CC: @tiloz, @matthew.chamberlain

We occasionally have a “chunk” (usually a year) of a payu simulation in which the post-processing of the output has failed. We don’t uncover this until the analysis stage.

We on the UM side haven’t encountered this for a few weeks. A colleague (Matt Chamberlain) recently commented on this same infrequent occurrence of a failing post-processing step.

@spencerwong provided us with instructions on how to run this post-processing conversion manually, assuming the payu module is loaded:

esm1p5_convert_nc

The underlying reason for the failure is not certain. However, a suggested remedy was to increase the walltime. We have implemented this locally (although we haven’t done any long simulations since anyway).

My question is whether this remedy has been applied generically, and whether my colleague on the ocean side will automatically pick it up.

Hi @Jhan,

If I’m remembering correctly, the previous conversion failures happened when the collation job combining the outputs ran out of walltime. This job would occasionally take much longer than the usual ~20 minutes, and because the netCDF conversion job is submitted from the collation job, it never got set off.

The collation walltime hasn’t been modified in the configurations on GitHub, so your colleagues will likely still have the default walltime. I can update this in the configurations if it’s looking like an issue. What did you increase the collation walltime to in your simulations?

That sounds right. From memory it was 1.5 hrs, but we haven’t done any long runs since, so I can’t really vouch for that being sufficient.
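
For anyone wanting to try the same change locally, here is a minimal sketch of what it might look like in the experiment's config.yaml. It assumes the collation settings live under payu's collate section and that the walltime key takes the usual HH:MM:SS form; any other entries already in that section should be left as they are. The 1.5 hour value is the one recalled above and has not been verified on long runs.

```yaml
# config.yaml (fragment): only the collation walltime is shown.
# Assumption: other keys already present under "collate" stay unchanged.
collate:
  walltime: 1:30:00   # 1.5 hours, the value recalled above; untested on long runs
```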

Hi @Jhan, there is a pull request here for updating the collation walltime to 1.5h. If you would be happy to review it, that would be great!

Thanks,
Spencer