Hello
While running panan (MOM6), I noticed that payu keeps only the five most recent restarts, plus older restarts from every 5th run (e.g., restart000, restart005, …). This happens in all panantarctic configs, but not with access-om2. Is there a config option I can set in payu to change this? It might also be worth setting up payu to not delete any restarts by default.
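The retention pattern described above can be reproduced with a small Python model. It assumes payu keeps the `restart_history` most recent restarts, plus any older restart whose index is a multiple of `restart_freq`, with both defaulting to 5 (both option names come up later in this thread). `kept_restarts` is a hypothetical helper, not payu's code.

```python
def kept_restarts(indices, restart_freq=5, restart_history=5):
    """Return the restart indices retained under an assumed integer rule.

    Assumed rule (a simplified model, not payu's implementation):
    the `restart_history` most recent restarts are always kept, and an
    older restart is kept only if its index is a multiple of `restart_freq`.
    """
    ordered = sorted(indices)
    # Guard: ordered[-0:] would be the whole list, not an empty one
    recent = set(ordered[-restart_history:]) if restart_history > 0 else set()
    return [i for i in ordered if i in recent or i % restart_freq == 0]


# Twelve runs: restart000 ... restart011
runs = range(12)
print(kept_restarts(runs))                  # [0, 5, 7, 8, 9, 10, 11]
print(kept_restarts(runs, restart_freq=1))  # every restart is kept
```

Under this model, setting `restart_freq: 1` in `config.yaml` would keep every restart, since every index is a multiple of 1.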
angus-g
(Angus Gibson)
29 September 2023 00:15
adele157
(Adele Morrison)
29 September 2023 00:22
Ah, good to know! @schmidt-christina @Wilton_Aguiar I suggest we make this change in our panan GitHub configs, what do you reckon?
rmholmes
(Ryan Holmes)
29 September 2023 00:40
Restarts can be pretty large, which is why that option was introduced. But they can always be deleted afterwards (there's the ACCESS-OM2 script tidy_restarts.py), if you remember!
adele157
(Adele Morrison)
29 September 2023 01:22
Yes, the problem with only saving every 5th restart is that when you're running in 1- or 2-month chunks, the restarts that are saved have non-ideal timing, i.e. they don't fall in January.
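To see why the timing is awkward, here is a small worked example. It assumes every run is a fixed number of months long, that the experiment starts on 1 January, and that `restartNNN` holds the model state at the end of run NNN. `restart_month` is a hypothetical helper.

```python
import calendar


def restart_month(index, months_per_run, start_month=1):
    """Calendar month (1-12) of the state saved in restart `index`.

    Assumes restart `index` is written at the end of run `index`,
    i.e. after (index + 1) * months_per_run months of simulation.
    """
    elapsed = (index + 1) * months_per_run
    return (start_month - 1 + elapsed) % 12 + 1


for months_per_run in (1, 2):
    # Restarts kept by an every-5th rule: restart000, restart005, ...
    kept = [restart_month(i, months_per_run) for i in range(0, 30, 5)]
    print(months_per_run, [calendar.month_abbr[m] for m in kept])
```

With 2-month runs only one of the first six kept restarts (restart005) lands at the start of January; with 1-month runs none of them do.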
Definitely second the suggestion of using tidy_restarts.py to tidy up afterwards.
Aidan
(Aidan Heerdegen, ACCESS-NRI Release Team Lead)
29 September 2023 03:59
@jo-basevi has been working on adding calendar-based restart_freq options to payu. You can see them in this PR (which I am currently reviewing, but if anyone else is interested, your input is more than welcome):
payu-org:master ← jo-basevi:358-date-based-frequency (opened 10:59AM, 15 Sep 23 UTC)
Add support for date-based restart pruning in Payu - should close #358.
As suggested by @aidanheerdegen [here](https://github.com/payu-org/payu/issues/358#issue-1856663402), an option is to support a subset of pandas date offset frequency aliases:
| Unit | Description |
| :--- | :---- |
| MS | Month start frequency |
| YS | Year start frequency |
| M | Month frequency |
| Y | Year frequency |
| W | Weekly frequency |
| D | Day frequency |
| H | Hour frequency |
| T | Minute frequency |
| S | Second frequency |
So, for example, if `config.yaml` contains:
```
restart_freq: 5YS
```
Then the earliest restart in each year, every 5 years from the first restart, will be retained.
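A minimal sketch of how a string such as `5YS` could be split into a multiplier and a unit from the table above, and how a `YS` checkpoint could be computed. It assumes `nYS` means "add n years to the reference date, then snap to 1 January", which is one plausible reading of the description, not necessarily what the PR implements. `parse_restart_freq` and `next_year_start` are hypothetical names.

```python
import re
from datetime import datetime

# Aliases listed in the table above
UNITS = {"MS", "YS", "M", "Y", "W", "D", "H", "T", "S"}


def parse_restart_freq(freq):
    """Split a frequency such as '5YS' into (5, 'YS'); a bare unit means 1."""
    match = re.fullmatch(r"(\d*)([A-Z]+)", freq)
    if match is None or match.group(2) not in UNITS:
        raise ValueError(f"unsupported restart frequency: {freq!r}")
    count = int(match.group(1)) if match.group(1) else 1
    return count, match.group(2)


def next_year_start(reference, years):
    """Add `years` to `reference`, then snap to the start of that year."""
    return datetime(reference.year + years, 1, 1)


n, unit = parse_restart_freq("5YS")
print(n, unit)                                    # 5 YS
print(next_year_start(datetime(1900, 3, 1), n))   # 1905-01-01 00:00:00
```

The earliest restart dated at or after the returned checkpoint would then be the one retained.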
During archive, the function `Experiment.prune_restarts()` would inspect each restart directory and return a list of restart directories that can be removed, using either an integer or a date-based restart frequency. For a date-based frequency, individual restart directory paths are passed to the model-specific driver, which parses the restart files and returns a cftime (or standard) datetime object.
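The split described above, where pruning stays model-agnostic and each driver only has to turn a restart directory into a date, might look roughly like this. The method name `get_restart_datetime` comes from the PR description; `BaseDriver`, `ToyDriver` and `dated_restarts` are hypothetical stand-ins, not payu's classes.

```python
from datetime import datetime
from pathlib import Path


class BaseDriver:
    """Hypothetical base driver. Drivers that cannot date their restarts
    keep this default, so date-based pruning fails loudly for them."""

    def get_restart_datetime(self, restart_path: Path) -> datetime:
        raise NotImplementedError(
            f"{type(self).__name__} does not support date-based restart_freq")


class ToyDriver(BaseDriver):
    """Toy driver for illustration: dates come from a lookup table
    instead of being parsed from restart files."""

    def __init__(self, dates_by_name):
        self.dates_by_name = dates_by_name

    def get_restart_datetime(self, restart_path):
        return self.dates_by_name[restart_path.name]


def dated_restarts(driver, restart_dirs):
    """Pair each restart directory with its date, oldest first."""
    pairs = [(driver.get_restart_datetime(p), p) for p in restart_dirs]
    return sorted(pairs, key=lambda pair: pair[0])


dirs = [Path("archive/restart001"), Path("archive/restart000")]
driver = ToyDriver({"restart000": datetime(1900, 2, 1),
                    "restart001": datetime(1900, 3, 1)})
print([p.name for _, p in dated_restarts(driver, dirs)])  # ['restart000', 'restart001']
```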
In this branch, `access-om2` has the `model.get_restart_datetime()` function implemented, which calls the MOM driver's `get_restart_datetime()`; this parses the `ocean_solo.res` file for the final datetime of the restart. I used this file because it is what COSIMA's [tidy_restarts script](https://github.com/COSIMA/1deg_jra55_ryf/blob/master/tidy_restarts.py) uses, and it also records which calendar is being used.
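A sketch of reading the calendar and final model time from `ocean_solo.res`. The file layout assumed here (a calendar code on the first line, then the start time and the current model time as year, month, day, hour, minute, second) and the sample contents are illustrative assumptions, not copied from a real run. The function returns a plain tuple: a real driver would likely build a cftime datetime for the matching calendar, which also avoids dates such as 30 February in a 30-day-month calendar that standard `datetime` cannot represent. `parse_ocean_solo_res` is a hypothetical name.

```python
# Calendar codes assumed to be listed on the first line of ocean_solo.res
CALENDARS = {0: "no_calendar", 1: "thirty_day_months", 2: "julian",
             3: "gregorian", 4: "noleap"}


def parse_ocean_solo_res(text):
    """Return (calendar name, final model time) from ocean_solo.res contents.

    Assumes the first non-empty line starts with an integer calendar code
    and the last non-empty line starts with the current model time as
    year, month, day, hour, minute, second.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    cal_name = CALENDARS[int(lines[0].split()[0])]
    final_time = tuple(int(x) for x in lines[-1].split()[:6])
    return cal_name, final_time


# Illustrative file contents (assumed layout, not from a real run)
SAMPLE = """\
     4        (Calendar: no_calendar=0, thirty_day_months=1, julian=2, gregorian=3, noleap=4)
  1900     1     1     0     0     0        Model start time:   year, month, day, hour, minute, second
  1901     1     1     0     0     0        Current model time: year, month, day, hour, minute, second
"""

print(parse_ocean_solo_res(SAMPLE))  # ('noleap', (1901, 1, 1, 0, 0, 0))
```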
Currently the month (M) and year (Y) frequencies are defined similarly to adding dateutil `relativedelta` months and years. So, for example, if a datetime was 15/01/2000, adding an offset of '6M' would give 15/07/2000. I have only just noticed that 'M' in the pandas documentation actually means month end. Would the end of the month/year be more useful? Also, with the different cftime calendars, M and Y could give unexpected results, e.g. because months have different numbers of days. For example, I've got some logic for cftime's `360_day`, `noleap` and `all_leap` calendars but nothing yet for the `julian` calendar. Also, would the start-of-month/year (MS/YS) frequencies be sufficient? Are the M and Y frequencies even needed?
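The 15/01/2000 + '6M' example, and the end-of-month edge case where calendars start to matter, can be reproduced with a small standard-library sketch. It assumes `M` behaves like `relativedelta(months=n)` (keep the day of the month, clamping it to the target month's length) and that `MS` snaps to the first of the month. Both helpers are hypothetical and only handle Python's proleptic Gregorian calendar, not cftime calendars.

```python
import calendar
from datetime import datetime


def add_months(dt, months):
    """'M'-style offset as described: shift the month, keep the day.

    The day is clamped to the last day of the target month, as
    relativedelta does, using the Gregorian calendar of `datetime`.
    """
    total = dt.month - 1 + months
    year, month = dt.year + total // 12, total % 12 + 1
    day = min(dt.day, calendar.monthrange(year, month)[1])
    # Replace all three fields at once so no invalid date is formed
    return dt.replace(year=year, month=month, day=day)


def add_month_starts(dt, months):
    """'MS'-style offset (one reading): shift the month, snap to day 1, midnight."""
    shifted = add_months(dt.replace(day=1), months)
    return shifted.replace(hour=0, minute=0, second=0, microsecond=0)


print(add_months(datetime(2000, 1, 15), 6))        # 2000-07-15 00:00:00
print(add_months(datetime(2000, 1, 31), 1))        # 2000-02-29 00:00:00
print(add_month_starts(datetime(2000, 1, 15), 6))  # 2000-07-01 00:00:00
```

Clamping is where calendars diverge: in a `noleap` calendar the second call would have to return 28 February, since that calendar has no 29 February.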
Week, day, hour and minute frequencies may not be very useful or even necessary, but they were easy to add, as adding timedeltas is supported with both cftime and standard datetime calendars.
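The fixed-length units are straightforward because they are plain time spans that can be added with `timedelta`. A minimal sketch, with `FIXED_UNITS` and `add_fixed_offset` as hypothetical names and standard `datetime` standing in for cftime:

```python
from datetime import datetime, timedelta

# Fixed-length units from the table: each is a plain time span,
# so it can be added with timedelta (no month/year arithmetic needed)
FIXED_UNITS = {
    "W": timedelta(weeks=1),
    "D": timedelta(days=1),
    "H": timedelta(hours=1),
    "T": timedelta(minutes=1),
    "S": timedelta(seconds=1),
}


def add_fixed_offset(dt, count, unit):
    """Add `count` fixed-length units (e.g. 3, 'D') to a datetime."""
    return dt + count * FIXED_UNITS[unit]


print(add_fixed_offset(datetime(2000, 2, 27), 3, "D"))  # 2000-03-01 00:00:00
```

The span is fixed but the resulting date still depends on the calendar: with a cftime `noleap` calendar the same three-day offset from 27 February would land on 2 March.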
The first restart datetime is used as the reference point for adding the first frequency interval. The next restart with a datetime at or after this checkpoint is kept, and it then becomes the reference when adding the next time interval. I think using the most recent "kept" datetime as the reference for the next checkpoint could be better than strict regular intervals from the earliest datetime, as the first restart could eventually be lost due to scratch timeouts, and using "YS"/"MS" would still keep the earliest restarts in each year/month.
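The rolling-reference rule can be sketched in a few lines. `select_kept` and `one_year_start` are hypothetical names, the `1YS` offset is assumed to mean "add one year and snap to 1 January", and standard `datetime` is used instead of cftime for simplicity.

```python
from datetime import datetime


def select_kept(restart_dates, add_offset):
    """Return the restart dates to keep under the rolling-reference rule.

    The first restart is kept; after that, the first restart at or after
    the current checkpoint is kept and becomes the new reference.
    `add_offset` maps a reference datetime to the next checkpoint.
    """
    ordered = sorted(restart_dates)
    if not ordered:
        return []
    kept = [ordered[0]]
    checkpoint = add_offset(ordered[0])
    for date in ordered[1:]:
        if date >= checkpoint:
            kept.append(date)
            checkpoint = add_offset(date)
    return kept


def one_year_start(dt):
    """Hypothetical 1YS offset: add one year, snap to 1 January."""
    return datetime(dt.year + 1, 1, 1)


# Two-month runs from 1 Jan 1900: restarts dated 1 Mar 1900, 1 May 1900, ...
dates = [datetime(1900 + m // 12, m % 12 + 1, 1) for m in range(2, 40, 2)]
print([d.date().isoformat() for d in select_kept(dates, one_year_start)])
# ['1900-03-01', '1901-01-01', '1902-01-01', '1903-01-01']
```

Because each checkpoint is computed from the last kept restart, dropping the first restart (for example `select_kept(dates[1:], one_year_start)`) still keeps the 1 January restarts of 1901, 1902 and 1903.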
One of the motivations for date-based restart frequencies was to make syncing restarts to a remote archive easier, as payu would know which restarts could be stored permanently. A way to check whether a restart would eventually be removed would be to see if its directory is in the list returned by `prune_restarts(from_n_restart=restart_history - 1, to_n_restart=self.counter)`, where `restart_history` is the `config.yaml` (or default) value for how many of the latest restarts are kept.
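As an illustration of why this helps syncing, here is a sketch that classifies restarts: one selected by the date-based rule is never pruned, one inside the `restart_history` window is kept only for now, and anything else can be pruned. This is not the `prune_restarts` call described above. It assumes the rolling-reference rule and that no restarts disappear other than through pruning (a lost first restart would shift the reference). All names are hypothetical.

```python
from datetime import datetime


def select_kept(dates, add_offset):
    """Rolling-reference rule: keep the first restart, then the first at or
    after each checkpoint, which then becomes the new reference."""
    kept, checkpoint = [], None
    for date in sorted(dates):
        if checkpoint is None or date >= checkpoint:
            kept.append(date)
            checkpoint = add_offset(date)
    return kept


def classify(dates, add_offset, restart_history=5):
    """Map each restart date to 'permanent', 'temporary' or 'prunable'."""
    ordered = sorted(dates)
    kept = set(select_kept(ordered, add_offset))
    recent = set(ordered[-restart_history:]) if restart_history > 0 else set()
    status = {}
    for d in ordered:
        if d in kept:
            status[d] = "permanent"   # safe to sync to the remote archive
        elif d in recent:
            status[d] = "temporary"   # pruned once it leaves the window
        else:
            status[d] = "prunable"
    return status


def one_year_start(dt):
    """Hypothetical 1YS offset: add one year, snap to 1 January."""
    return datetime(dt.year + 1, 1, 1)


# Monthly restarts from 1 Feb 1900 to 1 Mar 1901
dates = [datetime(1900 + m // 12, m % 12 + 1, 1) for m in range(1, 15)]
status = classify(dates, one_year_start, restart_history=3)
print([d.date().isoformat() for d, s in status.items() if s == "permanent"])
# ['1900-02-01', '1901-01-01']
```

Under this rule, whether a restart is selected depends only on that restart and the kept ones before it, so a "permanent" restart stays permanent as newer restarts arrive.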
Any feedback, questions or suggestions are much appreciated!
This is based on some of the logic in the tidy_restarts.py script by @aekiss, but built into payu, and with some flexibility for specifying the frequency.
Hopefully we should have this in a released version of payu next week sometime.