Notes from last week’s COSIMA TWG. Feel free to add whatever I’ve missed or to modify anything that I got wrong.
Date : 2023-09-13
Attendees : Micael Oliveira @micael, Andrew Kiss @aekiss, Aidan Heerdegen @Aidan, Dougie Squire @dougiesquire, Angus Gibson @angus-g, Ezhil Kannadasan @ezhilsabareesh8, Siobhan O’Farrell @sofarrell, Jo Basevi @jo-basevi, Martin Dix @MartinDix
ACCESS-OM3 Update
- 2 releases:
  - 0.1.0: this mimics the CESM version we started with (over one year old)
  - 0.2.0: updated to newer CESM - nearly cutting edge
- Refined release process:
  - all inputs, exes, configs, spack, CI tagged simultaneously, even if unchanged from previous tag
  - input directories are also named for the same tag
  - development tag is `x`, e.g. `0.x.0` is the `0.*` dev branch
- Work on configurations:
  - One git repository for config; different flavours (e.g., forcing, grid resolution) are branches
  - Currently 2 long-lived branches in MOM6-CICE6:
    - the CESM compset, unmodified - for testing only
    - 1deg JRA55do RYF - OM3 candidate
      - work already done to be like OM2, but need to update these to be compatible with 0.2.0
  - `main` not used for configurations - just a README explaining to check out a branch
  - some documentation on git practices: Git practices · COSIMA/access-om3 Wiki · GitHub
AH: suggests that, as you are already using complete paths in `config.yaml` to individual files (not dirs), you can do away with having a tagged dir for each release and only update paths to files that have changed. You can use a database for finding out which configs use a given input file.
MO: we’ll see how we go with the current plan - it’s not that onerous.
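A minimal sketch of the kind of lookup AH describes, assuming each config can be reduced to a list of full input file paths. The config names, paths, table layout and function names below are all hypothetical and are not part of payu or the access-om3 repositories.

```python
import sqlite3


def build_index(configs):
    """Build an in-memory table of (config, input_path) pairs.

    `configs` maps a config name to the list of input file paths it uses.
    """
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE uses (config TEXT, input_path TEXT, "
        "UNIQUE(config, input_path))"
    )
    rows = [(name, path) for name, paths in configs.items() for path in paths]
    db.executemany("INSERT OR IGNORE INTO uses VALUES (?, ?)", rows)
    db.commit()
    return db


def configs_using(db, input_path):
    """Return the sorted names of configs that reference `input_path`."""
    cur = db.execute(
        "SELECT config FROM uses WHERE input_path = ? ORDER BY config",
        (input_path,),
    )
    return [row[0] for row in cur]


# Hypothetical example data: two configs sharing one grid file.
example = {
    "1deg_jra55do_ryf": ["/inputs/grid/ocean_hgrid.nc", "/inputs/forcing/ryf.nc"],
    "cesm_compset": ["/inputs/grid/ocean_hgrid.nc"],
}
db = build_index(example)
shared = configs_using(db, "/inputs/grid/ocean_hgrid.nc")
```

With an index like this, changing one input file only requires finding and updating the configs returned by `configs_using`, rather than retagging a whole input directory.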
ACCESS-OM3 Plans
- Short term:
  - keep working on configs
  - parallelisation, scalability, processor layout options with NUOPC
    - at present all components are using all 48 cores, so components run one after the other
AK: would be good to check scaling for the sort of core count we expect to use, e.g. ~200 cores for 1deg. Want to use more than 1 node.
MO: need time per iteration for each component as a function of core count, so concurrently running components complete in a similar time. Currently all components run in serial, whereas in OM2 they run in parallel. Probably we want a combination of both.
DS: there are files in config giving core counts for components on different machines - could be useful as a reference for scaling. For fully active config, atmospheric component is hardwired in driver to never run concurrently with land or ice, so they should be overlapped on PEs.
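The load-balancing idea MO describes can be sketched as follows, assuming we have measured time per coupling step for each component at a few core counts. The timings, component names and the `best_split` helper are made up for illustration; this is not an existing NUOPC or CESM tool, and it only covers two components running concurrently on disjoint cores.

```python
def best_split(timings_a, timings_b, total_cores):
    """Choose core counts for two concurrently running components.

    Each timings dict maps core count -> measured seconds per coupling step.
    The step time of a concurrent pair is set by the slower component, so we
    minimise max(time_a, time_b), breaking ties by using fewer cores in total.
    """
    best = None
    for cores_a, time_a in timings_a.items():
        for cores_b, time_b in timings_b.items():
            if cores_a + cores_b > total_cores:
                continue  # layout does not fit in the core budget
            candidate = (max(time_a, time_b), cores_a + cores_b, cores_a, cores_b)
            if best is None or candidate < best:
                best = candidate
    if best is None:
        raise ValueError("no combination fits within total_cores")
    step_time, _, cores_a, cores_b = best
    return cores_a, cores_b, step_time


# Hypothetical measurements (seconds per coupling step), not real OM3 data.
ocean = {48: 4.0, 96: 2.2, 144: 1.6, 192: 1.3}
ice = {24: 1.5, 48: 0.9, 96: 0.6}
layout = best_split(ocean, ice, total_cores=192)
```

With these invented numbers, giving the ice component more than 24 cores does not shorten the step, because the ocean on 144 cores is already the slower of the two.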
MD: Kieran finds CICE restart makes whole thing grind to a halt
MO: is CICE using PIO?
MD: unsure.
DS: compiling with CIME.
MO: should be PIO then.
MD: can use current CESM-based OM3 now for coupling with UM.
AH: could start using spack to help with build.
MO: there’s a lot of logic in the cmake that you don’t want in spack - cross-dependencies between components - compilation needs to be in a particular order - can’t just compile all components separately and then link, e.g. driver needs to come last. And there are a couple of patches.
MO: easy to change cmake to compile all components or a subset as library without driver.
Payu Updates
- New topic for payu updates: Payu updates at NCI - #2 by Aidan
- “module use” implemented
- Jo working on auto-archiving outputs from scratch by payu - should replace sync scripts
- date-based restart pruning, following tidy_restarts - can specify pandas-style time frequency
- issues to resolve re. collation and sync of final restarts
- future plans: embedding and tracking uuids for reproducibility / provenance
- facilitate multiple experiments per control directory: automatically create run branches based on uuid, also name work and archive directories with uuid to work around limited name-space issues
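As a rough illustration of the date-based restart pruning item above, the sketch below keeps the first restart in each period plus the most recent one. It is not payu's implementation: it does not parse pandas-style frequency strings, and the `keep` values `"year"` and `"month"`, the function name and the restart names are assumptions made for this example.

```python
from datetime import datetime


def restarts_to_prune(restart_dates, keep="year"):
    """Return the names of restarts that could be deleted.

    `restart_dates` maps restart name -> model datetime of that restart.
    The first restart in each period is kept, and the latest restart is
    always kept so that the run can continue.
    """
    if keep not in ("year", "month"):
        raise ValueError("keep must be 'year' or 'month'")
    ordered = sorted(restart_dates.items(), key=lambda item: item[1])
    kept_periods = set()
    prune = []
    for name, date in ordered[:-1]:  # the latest restart is never pruned
        period = (date.year,) if keep == "year" else (date.year, date.month)
        if period in kept_periods:
            prune.append(name)
        else:
            kept_periods.add(period)
    return prune


# Hypothetical restarts written every six model months.
restarts = {
    "restart000": datetime(1900, 1, 1),
    "restart001": datetime(1900, 7, 1),
    "restart002": datetime(1901, 1, 1),
    "restart003": datetime(1901, 7, 1),
    "restart004": datetime(1902, 1, 1),
}
yearly = restarts_to_prune(restarts, keep="year")
monthly = restarts_to_prune(restarts, keep="month")
```

Keying the decision on model dates rather than restart index means the retained restarts stay evenly spaced in model time even if the run length changes partway through an experiment.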
MO: runlogs - should only be on for production runs, not development test runs
• we’ll force users to create their run fork (or at least branch), without needing payu
• forbid direct pushes to config branches - require PRs
• runlog off by default
• config main branch only has README explaining that a branch needs to be checked out and runlog activated
AH:
• this saves work for a few developers but adds work for many users
• recommends activating runlog by default, as users will forget
MO: can be implemented by turning runlog on in tagged commits, since users should not be using dev branches.
AH: this could be a CI check.