COSIMA TWG Meeting Minutes 2023

Notes from last week’s COSIMA TWG. Feel free to add whatever I’ve missed or to modify anything that I got wrong.

Date : 2023-09-13
Attendees : Micael Oliveira @micael, Andrew Kiss @aekiss, Aidan Heerdegen @Aidan, Dougie Squire @dougiesquire, Angus Gibson @angus-g, Ezhil Kannadasan @ezhilsabareesh8, Siobhan O’Farrell @sofarrell, Jo Basevi @jo-basevi, Martin Dix @MartinDix

ACCESS-OM3 Update

  • 2 releases:
    • 0.1.0: this mimics the CESM version we started with (over one year old)
    • 0.2.0: updated to newer CESM - nearly cutting edge
  • Refined release process:
    • all inputs, exes, configs, spack, CI tagged simultaneously, even if unchanged from previous tag
    • input directories are also named for the same tag
    • development versions use x in the tag, e.g. 0.x.0 is the development branch for the 0.* series
  • Work on configurations:
    • One git repository for config, different flavours (e.g., forcing, grid resolution) are branches
    • Currently 2 long-lived branches in MOM6-CICE6
      • the CESM compset, unmodified - for testing only
      • 1deg JRA55do RYF - OM3 candidate
    • work already done to make these resemble OM2, but they need updating to be compatible with 0.2.0
    • main not used for configurations - just a README explaining to check out a branch
    • some documentation on git practices: see the “Git practices” page of the COSIMA/access-om3 wiki on GitHub

AH: since config.yaml already uses complete paths to individual files (not directories), you could do away with a tagged directory for each release and only update the paths of files that have changed. A database could be used to find out which configs use a given input file.
MO: we’ll see how we go with the current plan - it’s not that onerous.
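
As a rough illustration of AH's suggestion, a lookup from input file to the configs that use it could be built as a simple reverse index. This is only a sketch: the config names and paths below are hypothetical, and it assumes the input paths have already been extracted from each config.yaml into a plain list.

```python
from collections import defaultdict


def index_inputs(configs):
    """Map each input file path to the sorted list of configs that reference it."""
    users = defaultdict(set)
    for config_name, input_paths in configs.items():
        for path in input_paths:
            users[path].add(config_name)
    return {path: sorted(names) for path, names in users.items()}


# Hypothetical config names and input paths, for illustration only.
configs = {
    "1deg_jra55do_ryf": ["/inputs/grid/topog.nc", "/inputs/forcing/jra55do_ryf.nc"],
    "cesm_compset": ["/inputs/grid/topog.nc"],
}
index = index_inputs(configs)
print(index["/inputs/grid/topog.nc"])
```

With an index like this, only the paths of changed files would need updating, and the configs affected by a change to any one file can be listed directly.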

ACCESS-OM3 Plans

  • Short term:
    • keep working on configs
    • parallelisation, scalability, processor layout options with NUOPC
    • at present all components are using all 48 cores, so components run one after the other

AK: would be good to check scaling for the sort of core count we expect to use, e.g. ~200 cores for 1deg. We want to use more than 1 node.
MO: need time per iteration for each component as a function of core count, so concurrently running components complete in a similar time. Currently all components run in serial, whereas in OM2 they run in parallel. Probably we want a combination of both.
DS: there are files in config giving core counts for components on different machines - could be useful as a reference for scaling. For fully active config, atmospheric component is hardwired in driver to never run concurrently with land or ice, so they should be overlapped on PEs.
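
The load-balancing idea MO describes can be sketched as a small search: given the time per iteration of each component at several core counts, pick one core count per component so that the slowest component is as fast as possible within a core budget. This assumes all components run concurrently on disjoint cores, which is a simplification of the mixed serial/concurrent layout discussed above. The timing numbers are placeholders, not measurements.

```python
from itertools import product


def balance_layout(timings, total_cores):
    """Choose one core count per component to minimise the slowest component's time.

    timings: {component: {core_count: seconds_per_iteration}}
    Returns (layout, bottleneck_time), or (None, inf) if nothing fits the budget.
    """
    names = list(timings)
    best_layout, best_time = None, float("inf")
    # Brute force over every combination of per-component core counts.
    for choice in product(*(timings[n].items() for n in names)):
        if sum(cores for cores, _ in choice) > total_cores:
            continue  # over the core budget
        slowest = max(seconds for _, seconds in choice)
        if slowest < best_time:
            best_layout = {n: cores for n, (cores, _) in zip(names, choice)}
            best_time = slowest
    return best_layout, best_time


# Placeholder timings (seconds per iteration), NOT real measurements.
timings = {
    "ocean": {48: 4.0, 96: 2.2, 144: 1.6},
    "ice": {24: 2.0, 48: 1.2, 96: 0.9},
}
layout, bottleneck = balance_layout(timings, total_cores=192)
print(layout, bottleneck)
```

Brute force is fine for a handful of components and core counts; the point is only that the required input is the per-component timing table mentioned above.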

MD: Kieran finds CICE restart makes whole thing grind to a halt
MO: is CICE using PIO?
MD: unsure.
DS: compiling with CIME.
MO: should be PIO then.

MD: can use current CESM-based OM3 now for coupling with UM.
AH: could start using spack to help with build.
MO: there’s a lot of logic in the cmake that you don’t want in spack: there are cross-dependencies between components, so compilation needs to happen in a particular order, and you can’t just compile all components separately and then link, e.g. the driver needs to come last. And there are a couple of patches.
MO: easy to change cmake to compile all components or a subset as library without driver.

Payu Updates

  • New forum topic for payu updates: “Payu updates at NCI” (see post #2 by Aidan)
  • “module use” implemented
  • Jo working on auto-archiving outputs from scratch by payu - should replace sync scripts
  • date-based restart pruning, following tidy_restarts - can specify pandas-style time frequency
  • issues to resolve re. collation and sync of final restarts
  • future plans: embedding and tracking uuids for reproducibility / provenance
  • facilitate multiple experiments per control directory: automatically create run branches based on uuid, also name work and archive directories with uuid to work around limited name-space issues
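
To illustrate the date-based restart pruning idea, here is a minimal sketch that keeps the earliest restart in each calendar period and marks the rest for pruning. It is not payu's implementation: the function name is hypothetical, and a plain "year"/"month" switch stands in for the pandas-style frequency string mentioned above.

```python
from datetime import datetime


def restarts_to_keep(restart_dates, period="year"):
    """Keep the earliest restart in each calendar year (or month)."""
    def bucket(date):
        return (date.year,) if period == "year" else (date.year, date.month)

    keep = {}
    for date in sorted(restart_dates):
        # setdefault only stores the first (earliest) date seen for each bucket.
        keep.setdefault(bucket(date), date)
    return sorted(keep.values())


# Quarterly restarts over 1.5 model years (illustrative dates only).
dates = [datetime(1900, m, 1) for m in (1, 4, 7, 10)]
dates += [datetime(1901, 1, 1), datetime(1901, 4, 1)]

kept = restarts_to_keep(dates, period="year")
pruned = sorted(set(dates) - set(kept))
print(kept, pruned)
```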

MO: runlogs - should only be on for production runs, not development test runs
• we’ll force users to create their run fork (or at least branch), without needing payu
• forbid direct pushes to config branches - require PRs
• runlog off by default
• config main branch only has README explaining that a branch needs to be checked out and runlog activated
AH:
• this saves work for a few developers but adds work for many users
• recommends activating runlog by default, as users will forget
MO: can be implemented by turning runlog on in tagged commits, since users should not be using dev branches.
AH: this could be a CI check.
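
A CI check along these lines might look like the sketch below. It assumes the config has already been parsed into a dict and that `runlog` is a top-level key holding either a boolean or a mapping with an `enable` flag; both the key layout and the function names are assumptions for illustration and should be checked against payu's documentation.

```python
def runlog_enabled(config):
    """Return True if the parsed config turns runlog on.

    Assumption: `runlog` is either a boolean or a mapping with an `enable`
    flag. A missing key counts as off, matching the "off by default" proposal.
    """
    value = config.get("runlog", False)
    if isinstance(value, dict):
        return bool(value.get("enable", False))
    return bool(value)


def check_tagged_config(config, is_tagged):
    """CI rule: tagged commits must have runlog on; dev commits are unconstrained."""
    return runlog_enabled(config) or not is_tagged


print(check_tagged_config({"runlog": True}, is_tagged=True))
```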
