COSIMA TWG Meeting Minutes 2025

TWG meeting 12 Feb

Present: @aekiss, @anton, @MartinDix, @dougiesquire, @cbull, @minghangli, @manodeep

ACCESS-OM3-025 project board

ePBL

ML: Using latest Reichl et al (2024) ePBL parameters but we get truncation errors. To maintain runtime performance comparable to OM2, we’ve set a tracer timestep of 3 hours, which is longer than GFDL’s 2-hour timestep. ML will test the alpha release (0.4.0) since it includes fixes with more numerically stable schemes that might help address this issue.
CB: ePBL preferred over KPP in COSIMA meeting
CB: what are other groups using?
ML: GFDL are using ePBL with same params
ML: currently tuning params but still not working

other items (project board)

Many items will be resolved with config updates

Repro issue

also on zulip

ML: OM3 build & config passed CI repro test but aren’t actually reproducing - CI test now fixed
but still have no repro between 0.3.0 and 0.4.0
also no repro in MOM6 standalone using MOM6 driver (not NUOPC, no sea ice)
default MOM6 parameters changed but have been fixed.
DS: only one of the long list of parameters is different, and this change won’t affect our model
ML: still no repro even with parameter issue fixed
DS: how did this get through the MOM6 repro testing?
DS: unclear whether we expect MOM6 repro given the version change - would have to check through all the
repro tests don’t even run on 0.4.0 config due to truncation
AK: would be good to have a way to tell if we expect repro between any 2 commits
MS: is run deterministic?
ML: yes
DS: do we want to dig into this to find out why it all changed
CB: would want to know whether we expect reproducibility. If repro not expected, do longer run to see if results are plausible. But there is a gap between the 2 MOM6 versions, so not a good use of our time to dig into all those commits.
DS: but how to know if repro is expected without digging into commits?
CB: did Marshall mention this a month ago?
DS: might have been discussed at a MOM6 dev meeting we didn’t attend - ask Angus?
AS: ask Marshall
also go back to 0.3.0, add one PR and check whether it’s something in our process that is breaking things
CB: ok will ask Marshall
DS: also check ML’s standalone runs to see if they reach the same conclusions.
This problem will crop up with other components
AS: we could have CI run 1deg for 20yr on every update

MS: for every CI, make it fail to check it works
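A minimal sketch of the "make it fail" idea, assuming the repro test boils down to comparing per-field checksums from two runs. The function names and field names here are hypothetical and are not taken from the actual CI; the point is only that a check should be shown to fail on deliberately corrupted input before its passes are trusted.

```python
def checksums_match(expected, actual):
    """Return True only if both runs report the same, non-empty set of checksums."""
    # An empty comparison is treated as a failure so the check cannot pass vacuously.
    return bool(expected) and expected == actual


def negative_control_ok(expected):
    """Corrupt one checksum on purpose and confirm the comparison then fails."""
    if not expected:
        raise ValueError("no checksums to test against")
    corrupted = dict(expected)
    field = next(iter(corrupted))          # any field will do
    corrupted[field] = corrupted[field] + "-corrupted"
    return checksums_match(expected, expected) and not checksums_match(expected, corrupted)


if __name__ == "__main__":
    # Hypothetical field names and checksum strings, for illustration only.
    baseline = {"temp": "A1B2C3", "salt": "D4E5F6"}
    print(negative_control_ok(baseline))
```

Running the negative control alongside the real comparison gives some assurance that a green result means the answers were actually compared.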

DS: there are many bugfixes in MOM6 that are turned off by default for repro but which we should turn on, breaking repro

CB: could any of the patches on patches involved in 0.3.0 → 0.4.0 be a problem?
DS: possibly, but unlikely, and not in

MOM6 dev meeting on Tuesday

CB: notes on zulip

bug discovered (doesn’t affect us)

update to MARBL, will change cap, may affect WOMBAT - asked us to check

Marshall gave presentation on GPU work - see link to his notes. Impressively fast progress, eg pressure solve running on GPU. Targeting momentum solver first.
Our software team also contributing
Ed has long todo list
Some things in specified GPU coding style are unsupported by hardware; hard to get vendor support; NVIDIA won’t look at code due to license use (LGPL) - looking to move to more commercially-friendly Apache, which does not oblige disclosure of code changes (see table here). Asking ~90 contributors to approve license change.

DS: Generic tracer: code moving out of mom into ???, may affect us.

DS: Next set of changes will alter defaults - need to keep an eye on this for repro.
AK: good reason to store MOM_parameter_doc.* in repo
DS: release team suggest doing via payu
MS: do via pre-commit hook to run model for 1 timestep?
AS: then add repro CI test to fail on diff between these files
MS: set up cron to regularly check?
DS: but want to know immediately that defaults have changed
MS: belt and braces - cron job to pick it up in case repro test was forgotten in commit
AS: on release there are more stringent tests than commit, so that would pick it up too
AK: would be nice to also be able to do this with CICE
DS: are we talking about just committing the docs (easy, we can do in payu) or CI repro test against branch to merge in (harder, involves release team)?
CB: would be happy to have a go at this but AS is probably better positioned, AS will write something and CB can help review/have a chat.
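A rough sketch of the kind of check discussed above: compare two versions of a MOM_parameter_doc.* file and report parameters whose values differ, so a CI test could fail when the result is non-empty. It assumes the files consist of `NAME = value` lines with `!` starting a comment; the function names are hypothetical, and this is not the payu or CI implementation. It ignores parameter blocks, so identically named parameters in different blocks would collide, and a `!` inside a quoted string value would be misread as a comment.

```python
import re

# Matches "NAME = value"; any trailing "!" comment is removed before matching.
_PARAM = re.compile(r"^\s*([A-Za-z_][A-Za-z0-9_%]*)\s*=\s*(.*?)\s*$")


def parse_parameter_doc(text):
    """Return {name: value} for every 'NAME = value' line in the text."""
    params = {}
    for line in text.splitlines():
        code = line.split("!", 1)[0]       # drop the comment part of the line
        match = _PARAM.match(code)
        if match:
            params[match.group(1)] = match.group(2)
    return params


def diff_parameters(old_text, new_text):
    """Return {name: (old_value, new_value)} for added, removed or changed parameters."""
    old = parse_parameter_doc(old_text)
    new = parse_parameter_doc(new_text)
    changes = {}
    for name in sorted(old.keys() | new.keys()):
        before, after = old.get(name), new.get(name)   # None means "not present"
        if before != after:
            changes[name] = (before, after)
    return changes
```

A CI job or pre-commit hook could read the committed file and the freshly generated one, call `diff_parameters`, and exit non-zero if anything is returned.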

COSIMA TWG update tomorrow

CB: Dougie to give TWG update to Thurs COSIMA meeting
ML: increased tracer timestep may be part of problem with ePBL; reducing from 3 to 2hr (matching GFDL) fixes truncation errors, but performance is worse. Truncation occurs at particular places around Antarctica.
DS: truncations in 0.3.0 and 0.4.0?
ML: haven’t tested in 0.4
DS: one of the new MOM6 changes (off by default) helps improve model stability
ML: should I discuss this at COSIMA meeting tomorrow, or spend some time working on it first?
CB, AK: might be better to discuss ePBL in offline meeting with the few people who can give good input
ML: Wilton found ePBL performed similarly to KPP but without vertical resolution dependence

Timestep at 1 deg

DS: MOM dynamic timestep is quite short (30 min) and also differs from the coupling timestep, unlike 0.25
DS: probably inherited from CESM

Release team meeting update (Tommy, Aidan, Lachlan, Spencer, Jo, Dougie and Chris)

DS: CI to automatically create diffs (like the ones in README) between PR branches and all config branches. DS is coordinating. Some interest from Cable/land teams.

Bluelink invite

Invitation to present at Bluelink in late March on OM3 development and/or high-resolution development.

To be discussed offline

NCRIS/board update

CB: due Friday. @minghangli please write an update on the OM2 new control experiments.

ACCESS-NRI COSIMA 2025 training program

CB: starting next week on Fri 21st Feb (discussion link, draft program). What’s the status AS?
AS: may need to use xp65 env. Will have some slides. Possibly a notebook

Next time…

@dougiesquire on for the agenda for next Wednesday’s OSIT and for the next TWG: Chair: Anton. Minutes: Dougie.