Summary of today’s TWG - I didn’t catch everything so please edit to add/correct as needed.
Date: 2024-04-03
Attendees:
- Anton Steketee (AS) @anton
- Andrew Kiss (AK) @aekiss
- Andy Hogg (AHogg) @AndyHoggANU
- Aidan Heerdegen (AH) @Aidan
- Ezhil Kannadasan (EK) @ezhilsabareesh8
- Dougie Squire (DS) @dougiesquire
- Micael Oliveira (MO) @micael
- Martin Dix (MD) @MartinDix
- Minghang Li (ML) @minghangli
- Kieran Ricardo (KR) @kieranricardo
- Adele Morrison (AM) @adele157
- Angus Gibson (AG) @angus-g
- Rui Yang (RY) @rui.yang
Offline BGC
DS, AK: MOM6 offline tracers to be explored - potentially very useful capability for BGC dev, parameter tuning, spinup (esp. CMIP7) and science - see Offline tracer transport for BGC · Issue #123 · COSIMA/access-om3 · GitHub
ACCESS-OM3 component update
MO: updating model components, following CESM
- updating spack env
- CESM has its own fork of FMS, but it is not easy to mix, so updated to the latest stable FMS release from GFDL - seems to work - now need to compile FMS with special config options to activate the old API
- DS: MOM abstracts the version (FMS1 vs 2) - is this FMS3?
- MO: not sure
- Should we build CESM with these updated components, now that we’ve diverged with our new configs?
- AK: only do that if we need to run CESM configs for debugging one of our OM3 configs
- AS: field dict has changed and would need updating
MOM6 version choice and MOM6 node
DS: are we still happy with tracking CESM? What about when we have our own MOM6 node?
- AG: GFDL say we need to nominate somebody to sign off on PRs. Then up to us to set up test infrastructure to approve PRs.
- AM: Would be good to do - should it be by AG or somebody at NRI?
- AK: wait until we have adopted NRI’s test framework from OM2
- AH: this is underway
- AHogg: will our system meet requirements?
- AG: no formal requirement - we just need to be happy with it
- AH: using pytest - can be controlled via workflow dispatch, very flexible
- MO: Need 1 example to run, and a way to run it, then can expand to other examples
- AHogg: would be good to have AG as one of the approvers, but also to have NRI. Get Tommy’s testing/deployment running, then tell GFDL we’re ready.
- AS: once test infrastructure is established we can incrementally add tests to suit what we care about
- MO: CESM is currently using nearly the latest MOM6 so currently no big motivation to use GFDL - but might not always be the case
- DS: will there ever be things in the NCAR CESM fork that we need but that are not in main?
- AH: are MOM6 nodes obliged to run MOM6-main?
- AG: not necessarily - GFDL use a much newer dev branch but there are periodic PRs to main for everyone to approve
Profiling & benchmarking
MO:
- one region of MOM6 code (surface forcing) was not scaling with more cores - narrowed down to reproducible sum
- but the config set CICE6 max_blocks to a very large number, which allocates a lot of memory and gives CICE6 a huge memory footprint - this apparently affected MOM6 because the CICE6 memory was too big for the cache to hold both MOM6 and CICE6 data
- resolved by a more reasonable max_blocks: parallel scaling improved, but still not great
- profiling paused for now as MOM6 now has a new feature to mask land tiles automatically at runtime to match the number of cores - want to use this for profiling. The MOM6 land proc mask is not relevant to CICE, which uses a very different approach. Newer CICE6 can also automatically determine max_blocks.
- AH: NetCDF chunk size in output is auto-determined - mppnccombine-fast assumes the same proc land mask for all files
- MO: but auto-masking sets io cores (io_layout) to 1 - not what we want for production but useful for profiling, and we can read in previously auto-generated proc land mask in production configs
- AH: have we looked into parallel io for MOM6?
- MO: not sure, and might not be performant to gather to one core and then redistribute for parallel io
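As a rough illustration of the land-tile masking idea discussed above, the sketch below splits a land/ocean mask into a grid of processor tiles and lists the tiles that contain no ocean, i.e. the ones that would not need a core. This is not MOM6's actual algorithm; the function name, the toy mask and the layout are all made up for illustration, and the layout is assumed to divide the grid evenly.

```python
def land_only_tiles(mask, layout):
    """Return (row, col) indices of processor tiles containing no ocean points.

    mask   : list of rows, 1 = ocean, 0 = land (hypothetical toy input)
    layout : (tiles_y, tiles_x); assumed to divide the grid evenly
    """
    ny, nx = len(mask), len(mask[0])
    tiles_y, tiles_x = layout
    if ny % tiles_y or nx % tiles_x:
        raise ValueError("layout must divide the grid evenly in this sketch")
    ty, tx = ny // tiles_y, nx // tiles_x  # tile size in grid points

    land_tiles = []
    for j in range(tiles_y):
        for i in range(tiles_x):
            # A tile needs a core if any point inside it is ocean
            has_ocean = any(
                mask[y][x]
                for y in range(j * ty, (j + 1) * ty)
                for x in range(i * tx, (i + 1) * tx)
            )
            if not has_ocean:
                land_tiles.append((j, i))
    return land_tiles


# Toy 4x8 grid: the left half of the top two rows is land
toy_mask = [
    [0, 0, 0, 0, 1, 1, 1, 1],
    [0, 0, 0, 0, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 1, 1, 1],
]
layout = (2, 4)  # 2 tiles in y, 4 in x, so 8 tiles of 2x2 points
masked = land_only_tiles(toy_mask, layout)
cores_needed = layout[0] * layout[1] - len(masked)
print(masked, cores_needed)
```

The notes describe MOM6 doing the masking at runtime so that the remaining tiles match the number of cores; the sketch only covers the counting step for a fixed layout.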
Documentation
- MO, AK: 2 main options (1 is AK’s preference) - see Documentation · COSIMA/access-om3 · Discussion #120 · GitHub
- AH: can it be defined in a data structure? say, doc.yaml in each branch - also makes it easier to systematically extract data
- DS: or no specificity at all - just have one doc with a common section followed by a section for each config which is free-form text extracted from each repo branch
- AH: maintainability a problem if free-form, and unclear to doc writer what is needed
- AH: use submodules?
- MO: not simple to use sphinx or mkdocs with submodules
- AHogg: try something and see how it works - how it looks to the user and how much work to update continuously
- AHogg: and will this scale to the other ACCESS models? eg ACCESS-CM3 and ACCESS-ESM3
- MD: no discussion of documentation for CM/ESM yet - would likely follow OM3’s lead
- AH: may be fewer configs in climate models since they don’t have choice of forcing?
- AHogg: but in future there will be multiple resolutions
- MD: ESM will have a lot more configs than CM
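To make the structured-documentation option above more concrete, here is a minimal sketch, assuming each config branch carries a small machine-readable record. AH suggested something like doc.yaml; JSON is used here only so the example needs nothing beyond the Python standard library. The field names, the example values and the helper functions are hypothetical, not an agreed schema.

```python
import json

# Hypothetical required fields; the real schema was not decided in the meeting
REQUIRED_FIELDS = ("name", "resolution", "forcing", "description")


def load_config_doc(text):
    """Parse one per-branch doc record and check the required fields exist."""
    record = json.loads(text)
    missing = [f for f in REQUIRED_FIELDS if f not in record]
    if missing:
        raise ValueError(f"doc record is missing fields: {missing}")
    return record


def render_section(record):
    """Render one config's record as a plain-text documentation section."""
    lines = [
        record["name"],
        f"- resolution: {record['resolution']}",
        f"- forcing: {record['forcing']}",
        "",
        record["description"],
    ]
    return "\n".join(lines)


# Made-up example record standing in for the file in one config branch
example = """
{
  "name": "example-config",
  "resolution": "1deg",
  "forcing": "example forcing",
  "description": "Free-form text describing this configuration."
}
"""
section = render_section(load_config_doc(example))
print(section)
```

A fixed set of required fields speaks to the concern that free-form text leaves it unclear to the doc writer what is needed, while the description field still leaves room for free text.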
Licensing
AH: what licence to release OM2 under? Software licence · Issue #264 · COSIMA/access-om2 · GitHub
- AH: MOM5 is GPL3
- MO: so no choice - code must be distributed under terms of GPL3. So need to check that all component licences are compatible - and they are.
- AH: what about configs?
- MO: doesn’t matter - these are input, not code - might have IP but not something we need to deal with. Are we distributing code? Licences are about distribution, not the use it is put to.
- AH: were there custom licences that weren’t being adhered to? CICE? OASIS?
- AS: does new CICE licence not matter since it didn’t have one back when we forked the code? Licence was added about 2017ish.
- MO: there’s an issue about this - Not complying with licensing · Issue #67 · COSIMA/cice5 · GitHub
- AH: so just need a GPL-compatible licence for OM2?
- MO: might not strictly need a licence for OM2 but it’s easy to add one
- AH: yes clearer just to have one
- DS: there is a little code in the configs, eg shell scripts
- AH: so use Apache? CMIP - data has different licence (eg cc-by) from the code
Next meeting
17 April, usual time