COSIMA TWG Meeting Minutes 2024

aekiss · 17 January 2024 05:11

Summary from TWG meeting today - please correct and elaborate as needed.

Date: 2024-01-17

Attendees:

Micael Oliveira @micael
Ezhil Kannadasan @ezhilsabareesh8
Siobhan O’Farrell @sofarrell
Anton Steketee @anton
Harshula Jayasuriya @harshula
Dougie Squire @dougiesquire
Andy Hogg @AndyHoggANU
Andrew Kiss @aekiss
Martin Dix @MartinDix
Angus Gibson @angus-g

MO

profiling: runs hanging when CICE has >76 cores, apparently when reading from the mesh, even when IO is serial. Unlikely to use this many cores at 1 deg so capped profiling to max 76 CICE cores.
will probably want to run MOM & CICE concurrently rather than sequentially, since MOM6 scales better than CICE6, and therefore give more cores to MOM6 than CICE6 - as was done in ACCESS-OM2.
expect to have scaling plots to show next week
AH: whether to run atm sequentially or concurrently depends on relative resolution of ocean and atm

EK: WW3

suggestions from Stefan (BOM) - eg turn on ice scattering and dissipation in wave model
have 5 dissipation models and 2 scattering models - trying these out
implications on what fields are passed from CICE → WW3
but floe size not active in CICE6 yet - tried turning on - requires passing wave spectrum to CICE - this is happening but unsure what spectral types to transfer
SO: will chat to @NoahDay & @lgbennetts about this; also there was a relevant presentation from Cecilia Bitz at MOSSI2022 [but not on the MOSSI youtube channel] re. dissipation option 4; definitely want to put spectrum into CICE6 but unclear whether full spectrum is needed
EK: has confirmed the full spectrum (25 components) is getting through to CICE
SO: will confirm if @NoahDay used floe size bins the same as current CICE defaults
EK: for testing has set categories to 12 based on Noah’s experience
enabled floe size distribution output
SO: not all WW3 options require FSD, some can work with default option (all floes the same notional size), but CICE would not interact with waves without FSD
EK: can also read in a NetCDF FSD file
SO: can generate internal spectrum within CICE
Stefan also suggested disabling interpolation in time as not suitable with coupling; also not sure about Langmuir turbulence and non-breaking-wave induced mixing; currently turned on; also 11 parameters in MOM6 for that, following Shuo Li’s config
AH: non-breaking wave mixing is controversial - unclear whether to use this
SO: would also need to look into how it’s hooked into turbulence closure in MOM - check what the US teams are doing - will find contacts to follow up

HJ:

Spack v0.21 failed to compile parallelio with the netcdf version we are using - is it ok to pin to the old working version or should we use a newer one?
MO: if versioning is truly semantic it should be fine
AS: CICE6 says 2.5 & 2.6 are supported so 2.5.10 should be fine for CICE6
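
As a rough illustration of the compatibility reasoning above, the sketch below checks whether a pinned version falls within a supported major.minor series. The helper name is hypothetical, the supported series are taken from AS's comment, and it assumes plain "X.Y.Z" version strings with patch releases that are compatible within a series (i.e. that versioning really is semantic).

```python
# Hypothetical helper, for illustration only: check whether a pinned
# version belongs to one of the supported major.minor series.
# Assumes plain "X.Y.Z" version strings.

def in_supported_series(version, supported_series):
    major, minor = version.split(".")[:2]
    return f"{major}.{minor}" in supported_series

supported = {"2.5", "2.6"}  # series AS reports CICE6 as supporting
print(in_supported_series("2.5.10", supported))  # True
```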

AS:

openmpi symlinks bug raised last year - a patch now done but 4.1.7 won’t be released until late Q1 2024
patch in mom6-cice6 config to turn on netcdf4 and parallel read/write - seems to improve performance; will be more significant with more cores
when updated can also turn on parallelio for other components
working with CICE people to improve netcdf error handling
looking into wave-ice interaction with EK
merging updates into ww3 configs
might be a bug in ww3 history output
MO: how long will it take NCI to install updated openmpi
AS: will request from NCI helpdesk when released

DS:

added coupling diagnostics - revealed unexpected things but sorted out - just misinterpretations
coupling side of wombat ready to have somebody look at - will also ask NCAR people to look at it
started porting wombat to generic tracers
going on 3wks leave at start of Feb
AH: CESM MOM6 workshop coming up - won’t be sending anyone in person but would be good to have a presentation to update people
AK: overlaps with AMOS - presentations could be similar - I’ll share slides and welcome any input
porting wombat to generic tracers - need to decide how to handle wombat versioning - eg @pearseb has updated to have ~25 tracers - will initially port the old version before updating - the old version system used include files but this isn’t working anymore - versioning a bit confusing
proposal: 1st port old wombat to generic tracers, tag it, then update to Pearse’s version
merging versions could be difficult - need to arrange a meeting
want to share with NCAR before going on leave

CMIP timeline

AH: will be a meeting at AMOS (lunchtime Tues) re. fast track v2 CMIP which might shed light on timelines
AH: running candidate models ready for testing ideally by end of June 2024 but maybe as late as Dec
AS: need to clarify what we are aiming at with OM3
DS: high-level planning meeting sometime soon would be useful
AH: broader meeting than TWG - include a few community folks
AH: would be good to have an overview of what needs doing and a straw-man timeline
MO, AS: good to have a task list and timeline eg gantt chart eg on github — see this topic to discuss

MO: creation of 1 or 2 repos for helper scripts/utils

already have one (om3utils) but want it to be a tested, documented python package
also need a place to dump scripts eg for reproducibility
propose having both, and moving things between them as needed
DS: can become unclear where to find things unless scope of each repo is clearly defined
MO: want package to be useful and reusable
DS: don’t all scripts already meet that?
MO: maybe not
DS: a dump repo would still need structure (subdirs), docs (READMEs) and code review as the scripts need to be working, and should link git commit in metadata of output
MO: agreed - call it om3-scripts
DS: include environment.yaml to record the conda env
DS: at some point we may want to bundle scripts into a python package
MO: will move om3-utils repo to COSIMA org - should be reviewed
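
One possible way to link the git commit in the metadata of output, as DS suggests, is sketched below. This is only an illustration: the function names and attribute names (`script_commit`, `script_repo`) are hypothetical, and it assumes the script is run from inside a git checkout with `git` available on the path.

```python
import subprocess
from datetime import datetime, timezone

def current_commit(repo_dir="."):
    """Return the hash of the checked-out commit in repo_dir (needs git)."""
    result = subprocess.run(
        ["git", "rev-parse", "HEAD"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()

def provenance_attrs(script_name, commit, repo_url):
    """Build a dict of global attributes to attach to an output file.

    The attribute names are hypothetical, not an agreed convention.
    """
    timestamp = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%S UTC")
    return {
        "history": f"{timestamp}: created by {script_name}",
        "script_commit": commit,
        "script_repo": repo_url,
    }
```

A script would call `provenance_attrs("my_script.py", current_commit(), repo_url)` and write the resulting dict into the global attributes of whatever file it produces.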

Next meeting
1-2pm Wed 31 Jan to avoid AMOS

micael · 2 February 2024 04:51

Summary from TWG - please correct and elaborate as needed.

Date: 2024-01-31

Attendees:

Micael Oliveira @micael
Ezhil Kannadasan @ezhilsabareesh8
Siobhan O’Farrell @sofarrell
Anton Steketee @anton
Dougie Squire @dougiesquire
Andy Hogg @AndyHoggANU
Andrew Kiss @aekiss
Martin Dix @MartinDix
Angus Gibson @angus-g
Kieran Ricardo @kieranricardo
Adele Morrison @adele-morrison
Minghang Li @minghangli

The meeting was entirely dedicated to discussing the ACCESS-OM3 development
timeline and OM3 priorities for contributing to ACCESS-CM3 and ACCESS-ESM3 in
time for CMIP7.

Current plans for OM3 configurations

AK shared some slides related to model development workflow and planned OM3
configurations:

AH:

OM3 is currently in the “Model Timesteps” stage (see slides above).
OM3 1deg configuration at the “Preliminary Optimization” stage.
CM3 is not there yet and some work remains to be done.

MD: OM3 configurations are still based on the corresponding OM2 ones. Need to
change the grid type to take advantage of C-grid features.

AH:

There are several planned OM3 configurations, plus some configurations that might or might not be developed.
A MOM6-CICE6 configuration will be used for CM3
A MOM6-CICE6-WOMBAT configuration will be used for ESM3
Configuration for CMIP will be 0.25deg MOM6-CICE6-WOMBAT

Scientific options and configuration development

AM: Will any of these configurations include ice-shelf cavities? Maybe a new
line is needed in the table?

SO: Ice-shelf cavities require high resolution. 0.25deg might not be enough.

MD: What scientific options do we want to use for CMIP?

AS: C-grid for sure. Too early to add landfast ice.

AH: Need to distinguish:

scientific options for configurations that we want to facilitate that are of interest for the research community (e.g. land-fast ice, waves)
scientific options to explore for CMIP (e.g. c-grid)

AH: Regarding MOM6, we might explore different vertical coordinates (isopycnal vs Z*)
AM:

We could do like for the MOM6-SIS2 global configuration: run with different coordinates and compare results.
We could also compare with NCAR/GFDL configurations. Good to do during the optimization step of the configuration development.

AH: At which phase to do that?
SO: At preliminary optimization/evaluation.

AM: At which resolution? Is 0.25deg worth doing? Scientific community is more interested in 0.1deg.
AH:

The question is rather in which order we do the work.
We can probably reuse a lot of the work done for 0.25deg for the 0.1deg configurations.

DS: There are lots of options to test for WOMBAT.

AS: How quickly do we want to have the more experimental ice-related options?
AM: We want them for the 0.1deg configuration for the scientific community.

MD: It’s probably not worth putting too much effort into the 1deg configurations.
SO: Keep it as a fast option, but not for CMIP 7.
AK: 1deg could be very useful for tests and continuous integration.

OM3 CMIP 7 Timeline
AH: Timeline? What to prioritize: 0.25deg MOM6-CICE6 → 0.25deg MOM6-CICE6-WOMBAT

AK: Cheap configuration needed for testing when updating codebase.

DS: What work is required to go from one resolution to another?

AH/AK: For OM2, most work went into updating the topography. Then remapping
weights for OASIS exchange grids and tuning.

Consensus: 1deg MOM6-CICE6 → 0.25deg MOM6-CICE6 → 0.25deg MOM6-CICE6-WOMBAT

AH: What priority for WW3 configurations?
AK: only useful for scientific community interested in waves. We can probably keep these configurations in sync with the others.
AH: Need to ask community about interest.

AH: Proposal to have CMIP7 configurations ready for full evaluation by
mid-year.

AH: Will WOMBAT be okay with this?
AM: Will need WOMBAT by mid-year or later?
MD/AH: Might come a bit later.

DS: Do we need to update the grids and topography?
AH: Yes.
MO: We have the tools and the workflow. Now just need to do it.

AK: Still missing C-grid in CICE.
AS: This looks doable
AK: There are some known drawbacks of using C-grid. Need to be aware of it.
AS: C-grid in CICE is, as a feature, considered finished.
AK: But not all features are available for C-grid.
AK: Issue with mediator/coupler as it uses A-grids internally
KR: With CMEPS, all fields need to be on the same grid.

Task assignment
Minghang: 0.25deg configuration
Micael: topography and grids, scaling and performance optimization
Anton: CICE
Dougie: WOMBAT
Ezhil: ?

Project management
AK: We will set up a project dedicated to CMIP7 on the COSIMA Github
organization. All members should try to update existing issues and add missing
issues.

Next meeting
Back to usual schedule: 11am Wed 14 Feb.

anton · 14 February 2024 03:03

COSIMA TWG

Date: 2024-02-14

Attendees: Andrew Kiss, Anton Steketee, Micael Oliveira, Minghang Li, Martin Dix, Harshula, Angus, Ezhil Kannadasan, Aidan (Apologies Dougie, AH)

0.25 Degree Config:

AK has started issue #101 in the ACCESS-OM3 repository: develop MOM6-CICE6 025deg_jra55do_ryf based around the MOM6-CICE6 1 degree configuration 1deg_jra55do_ryf and the ACCESS-OM2 0.25° configuration.

Project Board:

AS: We could add analysis notebooks + some regular runs on 1 deg configuration

AK: For OM2 we had a figures directory in the ACCESS-OM2 report repo with the metrics of interest, aimed to be a living approach but one that is mutually acceptable. Probably need to start a new repo for that. We could also include performance metrics.

One notebook for each figure / very simple metrics can work well.
Start with manually generated intake catalogue - for om3 analysis.

Aidan: This could be an application for the MED Team (Mike Tetley’s) live diagnostics tool or jupyter intake scripts. We could add a hook from payu to generate an intake catalogue + load this into the live diagnostics. Initial approach: use Intake, manually generate intake catalog; then use payu auto-catalog when available

Action for Anton : Follow up with Romain + Mike + Aidan.

Processor Layout:

MO: Writing scripts / tools to analyse parallel performance (currently in a branch of the om3-utils git repo). These use the trace generated by ESMF, which includes everything that goes through the driver & mediator. The profiling separates the timing into code ‘regions’ (e.g. for coupling, timesteps, etc).

Adding cores mostly gives very poor efficiency. MO proposes shifting to running ice+atm+ocean simultaneously, with just enough cores that ice + atm run faster than ocean.
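
The proposal can be sketched as a simple selection rule: for each non-ocean component, pick the smallest tested core count whose runtime does not exceed the ocean's. The function name and all numbers below are made up for illustration; they are not measurements from these runs.

```python
def smallest_adequate_cores(timings, target_time):
    """Smallest core count whose runtime does not exceed target_time.

    timings maps core count -> runtime (same units as target_time).
    Returns None if no tested core count is fast enough.
    """
    adequate = [n for n, t in timings.items() if t <= target_time]
    return min(adequate) if adequate else None

# Made-up numbers, for illustration only.
ocean_time = 100.0  # ocean runtime at its chosen core count
ice_timings = {4: 180.0, 8: 95.0, 16: 60.0, 32: 45.0}
print(smallest_adequate_cores(ice_timings, ocean_time))  # 8
```

Any cores beyond that minimum would leave the component idle while it waits for the ocean, so they are better given to the ocean.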

MOM6 is approx 5x slower than in access-om2 per model year. This is with a different number of time steps and MOM6 is fairly different compared to MOM5. MOM6 is / would be dominating the timing if components run simultaneously, so looking at the internal timing within MOM could be the next step. Otherwise we could go straight to a 0.25 degree config, rather than work too much on optimising at 1 degree.
MD asked whether it was related to nuopc, but we don’t think so. ESMF profiling specifically isolates MOM6 timestepping as the culprit.

Angus suggested investigating compiler flags, or more comprehensive profiling in MOM.

Aidan suggested investigating the IO but MO has checked that the profiling IO is mostly outside the MOM code.

Spack:

Harshula:

Transitioning to new spack versions (v0.20 to v0.21).
For OM2: Downgrading to PIO 2.5.2 from 2.5.10 (forum post). Plus removing nci-openmpi pseudo package, and using the openmpi system build directly. At this point this only impacts the ACCESS-NRI OM2 build. The ACCESS-NRI & COSIMA builds give identical data outputs.

Repro-CI

Aidan: Adding repro-CI for access-om2 to test compilation and bitwise reproducibility for output / results from OM2 runs. This will allow comparisons between versions + code changes etc and is something we will need to investigate for OM3.

Waves

Andrew Kiss - spoke to Alberto Meucci from Uni Melb at AMOS who is interested in our wave parameter choices and output. Meeting to be organised to gather input from wave modellers on our WW3 parameter choices.

Next Meeting: March 6

dougiesquire · 6 March 2024 02:43

Summary from TWG meeting today. I definitely missed some of the details of the Spack discussion. Please correct and elaborate as needed.

Date: 2024-03-06

Attendees:

Anton Steketee (AS) @anton
Andrew Kiss (AK) @aekiss
Andy Hogg (AHogg) @AndyHoggANU
Aidan Heerdegen (AH) @Aidan
Ezhil Kannadasan (EK) @ezhilsabareesh8
Dougie Squire (DS) @dougiesquire
Micael Oliveira (MO) @micael
Martin Dix (MD) @MartinDix
Minghang Li (ML) @minghangli
Tommy Gatti (TG) @TommyGatti

1deg MOM6-CICE6 scaling

MO shared some results

MOM6 scaling plot:
- Time taken
  - MOM total runtime includes time waiting for other components. Other components’ runtimes don’t include this waiting time
  - MOM6 takes 80-90% of total runtime
- Parallel efficiency
  - Going to more than one core drops efficiency (more comms, regions of serial code). Possibly drops more than we would like
  - Region with worst efficiency is ocean surface forcing - don’t remember seeing anything like this for pan-antarctic configs. AK: probably not IO. AH: There were OM2 issues with chunking that NH had to fix. AK: that would be outside of the issue region.
- Fraction of time spent in different regions
  - Surface forcing takes more and more time with ncpus.
  - AHogg: something happening beyond a single node. MO: need to investigate
  - DS: Does this include ocean surface forcing stuff in MOM NUOPC cap. MO: No, think it’s the stuff in MOM
Varied number of cores assigned to each component, keeping total number of cores the same
- The ocean benefits the most from having more cores
- Conclusion: give lots of cores to ocean and only a few to everything else
OCN-MED exchange
- Giving more cores to ocean makes OCN-to-MED faster, MED-to-OCN slower
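
For reference, the speedup and parallel efficiency discussed above can be computed from wall times as in the sketch below. It uses the usual definition (efficiency of 1.0 means ideal scaling relative to the smallest core count tested); this is not MO's actual tooling, and the timings are made up.

```python
def scaling_table(runtimes):
    """Speedup and parallel efficiency relative to the smallest core count.

    runtimes maps core count -> wall time. Efficiency is
    (t_ref * n_ref) / (t_n * n), so 1.0 means ideal scaling.
    """
    n_ref = min(runtimes)
    t_ref = runtimes[n_ref]
    table = {}
    for n in sorted(runtimes):
        speedup = t_ref / runtimes[n]
        efficiency = speedup * n_ref / n
        table[n] = (speedup, efficiency)
    return table

# Made-up timings, for illustration only.
example = {1: 1000.0, 2: 600.0, 4: 400.0}
for n, (speedup, efficiency) in scaling_table(example).items():
    print(f"{n:4d} cores: speedup {speedup:5.2f}, efficiency {efficiency:5.2f}")
```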

MD: How does runtime compare when running in parallel on a single node, relative to running components sequentially? MO: roughly the same

MO: will put issue on GitHub with summary and plots

AHogg: Nice framework for doing this for other models (e.g. CM3). MO: ESMF level profiling (e.g. timing on NUOPC phases) will be available for all models, but degree of profiling within a component depends on what’s implemented in that component.

AHogg: Can we do some long runs? MO: yes, things are still cheap even if they’re inefficient. Increasing the timestep is an obvious low hanging fruit for getting things more efficient - currently timestep is shorter than OM2. AHogg: We have compute. Let’s get some longer runs underway.

OM3 releases and component updates

MO: 6 months since last component update. Lots of new stuff. Worth doing another update? Main issue is that configuration will require changes. DS: only going to get more disruptive, so yes. MO: do we want to update to latest CESM version or latest version of components? AK: last time I checked, CESM wasn’t up to latest CICE that includes C grid. AS: my parallel IO and date bug work will only be in latest CICE. DS: let’s open an issue to keep track of what versions are being used and keep track of the process? MO: yes, let’s do this any time we want to update components.

AH: So there’s a requirement that everyone is working from the same versions?
MO: No, there’s a process for tagging versions, but developers can choose their own versions and build themselves. For next update of components, I will do a release - suggest that in parallel ACCESS-NRI release team goes through the process themselves and sees how things work out.
AH: Sure, but we have a requirement for an ACCESS-OM3 spack package.
MO: That exists.
AH: What about dependencies?
MO: external dependencies (e.g. ESMF, FMS, PIO) are taken from spack packages, model components are pulled as git sub-modules.
AS: Harshula would prefer that individual model components are built with individual spack packages.
MO: We will never be able to say that OM3 is just a list of spack dependencies because it’s a single exe and we apply patches to individual components at build time.
AHogg: this will also be the case for ESM3, CM3 etc. So we need a process for this.
AH: the sooner we start the better. Probably can compile to a single executable with current design. What’s important is that all dependencies are handled by spack so that we can easily switch between versions. Now’s the time for OM3 dev team and release team to start working out how releases will work. A user story from OM3 developers would be helpful to the release team to help them improve workflows etc.
MO: we are currently using the ACCESS-OM3 spack package to build OM3.

Where to open issues and replicating updates across configs

DS: Would be good to have guidelines around where to open OM3 issues. Currently have issues spread across configuration repos and the access-om3 repo. AK: I’ve been putting things on access-om3. DS: Shall we only open issues in config repo if the issue is only relevant to that config, everything else in the access-om3 repo? MO: Preference to open issue where you plan to open PR. TG: Could you use a platform (e.g. Zenhub) to group issues? AS: Let’s try raising everything in access-om3 repo and see how that goes.

DS: it would be good to set up a workflow on Github to automate cherry-picking commits across configs. All: Agreed. AS: we should have reproducibility CI first.

Reproducibility CI

DS: we need reproducibility CI for ACCESS-OM3 configs urgently. We’ve already accidentally merged a few config-breaking PRs. The ACCESS-NRI release team have set up infrastructure for this. Let’s use it.
AH: big issue is where to run - currently ssh into Gadi. Need to store ssh secrets etc in the repo. Anywhere you do that might need to be pretty well locked down (limited who has access). Or schedule tests from a fork on ACCESS-NRI org?
AS: Is it not possible to set up a github runner to avoid needing to ssh?
TG: We looked at that. There are a few security holes in that approach.
AH: Will probably do that in the future, but it doesn’t really solve the need to have privileged access.
AH: There are also complications around using the CI Gadi account - admins don’t like attaching multiple projects. That’s why we have everything in vk83.
AH: We only run checks when we open a PR from a dev branch to a release branch. Would it suit your use case to schedule tests?
MO: Ideally not. We’d like to run with every PR. Should be possible on Github. Should also set up short tests that run on GitHub runners, e.g. Payu setup and checks. Would have to mock filesystem or something to get things to work.

AH: I’ve gone with a different layout of inputs for ACCESS-OM2. Might make it difficult to have a seamless transition from COSIMA to ACCESS-NRI.

DS: DS and AS to meet and chat about ACCESS-OM3 repro CI then reach out to release team.

AK: Note that the release team has found that OM2 doesn’t reproduce across restarts. There’s a whole range of what we mean by “reproduce” - need a whole suite of tests
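
As a minimal sketch of what a bitwise comparison between two runs could look like (this is not the ACCESS-NRI infrastructure; the function names are hypothetical and the outputs are represented simply as raw bytes keyed by name):

```python
import hashlib

def checksum(data: bytes) -> str:
    """SHA-256 hex digest of raw output bytes."""
    return hashlib.sha256(data).hexdigest()

def reproduces(outputs_a, outputs_b):
    """True if both runs produced the same outputs with identical bytes.

    outputs_a and outputs_b map an output name to its raw bytes.
    """
    if outputs_a.keys() != outputs_b.keys():
        return False
    return all(
        checksum(outputs_a[name]) == checksum(outputs_b[name])
        for name in outputs_a
    )
```

The same comparison can back different meanings of “reproduce” depending on which pair of runs is passed in: for example two identical repeat runs, or (for reproducibility across restarts) one run of length 2N against two consecutive runs of length N.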

Documentation

AK: Putting together something as a discussion on ACCESS-OM3. Need a coordinated way to document what we’ve done and how people can use it. Also need versioning and need to keep documentation synchronised. AH: Heads up: working through the versioning currently - think we have a model for that which will allow us to update old versions. MO: Re documentation sync: standard approach is to put documentation source in repo and use Sphinx or mkdocs to deploy to GitHub pages or rtd. Questions: do we keep everything in om3 repo - do we also have config-specific documentation?

Next meeting

Next meeting date may be changed as a few away. Will update time in announce topic.

aekiss · 25 March 2024 00:29

TWG summary from last week - a bit scrappy and incomplete so please fill in anything I missed.

Date: 2024-03-21

Attendees:

Anton Steketee (AS) @anton
Andrew Kiss (AK) @aekiss
Andy Hogg (AH) @AndyHoggANU
Aidan Heerdegen (AHeer) @Aidan
Ezhil Kannadasan (EK) @ezhilsabareesh8
Dougie Squire (DS) @dougiesquire
Micael Oliveira (MO) @micael
Martin Dix (MD) @MartinDix
Minghang Li (ML) @minghangli

Performance scaling

ML - working on Micael’s performance tools, trying to reproduce results - 3 issues

1. cesm driver fails to transfer some settings correctly to esmf - inconsistent with esmf docs - has worked out a workaround
2. env vars in env section of config.yaml
3. can’t run with >64 cores - hangs - cice problem?

suggests documenting these issues

MO: 1. is a known issue - runconfig profiling settings ignored - need to use env vars

MO, AS: 3. is a known problem in cice - can’t use >76 cice cores. Not a hard limit - due to a parameter setting - to do with not using roundrobin; not relevant since cice doesn’t scale to that core count anyway

MO: surface forcing the culprit for bad MOM6 scaling - specific to NUOPC cap; not seen in panan etc which use FMS - now trying to identify in more detail - a lot of load imbalance - worse if launching many jobs at once - an IO issue?? but nothing obviously IO related in code region. mom_surface_forcing file. Adding extra profiling regions. Trial and error.

DS: cap converts ESMF fields to MOM fields

AS: is it reading salt restoring

DS: looks like it - there are some salinity restoring io calls (see time_interp_external)

Model evaluation

AK: ENKF-C may be worth looking at for model-obs comparison

DS: Clothilde was planning to try this for eReefs - see how they went with it

Input directory structure

DS: issue with moving all inputs to vk83 for repro CI - how to structure it? Poll - vote! issue Move inputs to `vk83` · Issue #115 · COSIMA/access-om3 · GitHub

option A: version at top level
option B: version at innermost level

explicit full path specification for all individual input files in config.yaml
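
To make the two options concrete, the sketch below builds example paths under each layout. The root directory, component name, file name and version label are all hypothetical; see the issue for the layouts actually proposed.

```python
from pathlib import PurePosixPath

ROOT = PurePosixPath("/path/to/inputs")  # hypothetical root directory

def path_option_a(version, component, filename):
    """Option A: version at the top level (one tree per input release)."""
    return ROOT / version / component / filename

def path_option_b(version, component, filename):
    """Option B ("flipped"): version at the innermost directory level."""
    return ROOT / component / version / filename

# Hypothetical names, for illustration only.
print(path_option_a("v1", "mom", "topog.nc"))  # /path/to/inputs/v1/mom/topog.nc
print(path_option_b("v1", "mom", "topog.nc"))  # /path/to/inputs/mom/v1/topog.nc
```

With option B a single input can get a new version without creating a new top-level tree, which is why every input file then needs its full path given explicitly in config.yaml.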

MO: sandbox 0.x.0 → 0.2.0 easier if versioned at top level but no strong pref

DS: linking version of input to version of exe - might be a pain if we want to do a lot of updates. But flipping could lead to a lot of versions that never really got used

AK: use symlinks?

AHeer: Kelsey says symlinks will burn you in the end. Flipped model (option B) is easier and clearer for users and doesn’t need symlinks. That’s what is being done and best for OM2 release

AH: let’s just go with flipped (option B) then, since no strong opinions

MO: sandbox could be useful for dev - some way to build test exes / configs to play with without doing a release - how we set things up for devs can be independent of how we do releases

AHeer: has to be on vk83 or tm70

AS: git-lfs ? each dev with their own fork?

AHeer: quickly run into file size limits with high res

DS: have to pay for lots of storage - not too expensive, maybe $5-10/mo for OM3 (without forcing files)

AHeer: try it out?

AH: happy to cover storage charges

AHeer: Or Tiledb - that does actual diffs on binary files (unlike git-lfs) - has a free version

AHeer: both manifests and git-lfs store hashes

DS: but git-lfs also stores revision history

AS: each file change doubles the storage for that file (doesn’t store deltas)

DS: could get very expensive - to investigate before deciding - actually probably unaffordable (see Slack)
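
The storage concern can be put as simple arithmetic: if every revision of a file is kept as a full copy (no deltas), total storage is the sum over files of size times number of revisions. The sketch below uses made-up file names and sizes purely for illustration.

```python
def total_storage_gb(files):
    """Total storage when each revision is stored as a full copy.

    files maps a file name to (size in GB, number of stored revisions).
    """
    return sum(size * revisions for size, revisions in files.values())

# Made-up sizes and revision counts, for illustration only.
example = {"topog.nc": (0.5, 4), "restart.nc": (20.0, 3)}
print(total_storage_gb(example))  # 62.0
```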

Namelist discussion: diabatic_first

DS: diabatic_first - we set it to true - do we mind if we set it to false (default), as it changes order of ops for generic tracers to be closer to MOM5-wombat - updates tracers in dynamic step

AH: don’t know why this is true

DS: setting comes from ncar - all our cosima mom6 configs and mom6-examples have it false

AH: will ask Marshall

AK: is it related to NUOPC cap?

DS: will check

Restart issue

EK: looking into restart file issue and looking at parameter and 0.25 restart - runs well except for restart

Next meeting

3 April, usual time

aekiss · 3 April 2024 03:15

Summary of today’s TWG - I didn’t catch everything so please edit to add/correct as needed.

Date: 2024-04-03

Attendees:

Anton Steketee (AS) @anton
Andrew Kiss (AK) @aekiss
Andy Hogg (AHogg) @AndyHoggANU
Aidan Heerdegen (AH) @Aidan
Ezhil Kannadasan (EK) @ezhilsabareesh8
Dougie Squire (DS) @dougiesquire
Micael Oliveira (MO) @micael
Martin Dix (MD) @MartinDix
Minghang Li (ML) @minghangli
Kieran Ricardo (KR) @kieranricardo
Adele Morrison (AM) @adele-morrison
Angus Gibson (AG) @angus-g
Rui Yang (RY) @rui.yang

Offline BGC

DS, AK: MOM6 offline tracers to be explored - potentially very useful capability for BGC dev, parameter tuning, spinup (esp. CMIP7) and science - see Offline tracer transport for BGC · Issue #123 · COSIMA/access-om3 · GitHub

ACCESS-OM3 component update

MO: updating model components, following CESM

updating spack env
CESM has own fork of FMS but not easy to mix so updated to latest stable FMS release from GFDL - seems to work - now need to compile FMS with special config options to activate the old API
- DS: MOM abstracts the version (FMS1 vs 2) - is this FMS3?
- MO: not sure
Should we build CESM with these updated components, now that we’ve diverged with our new configs?
- AK: only do that if we need to run CESM configs for debugging one of our OM3 configs
- AS: field dict has changed and would need updating

MOM6 version choice and MOM6 node

DS: are we still happy with tracking CESM? What about when we have our own MOM6 node?

AG: GFDL say we need to nominate somebody to sign off on PRs. Then up to us to set up test infrastructure to approve PRs.
AM: Would be good to do - should it be by AG or somebody at NRI?
AK: wait until we have adopted NRI’s test framework from OM2
AH: this is underway
AHogg: will our system meet requirements?
AG: no formal requirement - we just need to be happy with it
AH: using pytest - can be controlled via workflow dispatch, very flexible
MO: Need 1 example to run, and a way to run it, then can expand to other examples
AHogg: would be good to have AG as one of the approvers, but also to have NRI. Get Tommy’s testing/deployment running, then tell GFDL we’re ready.
AS: once test infrastructure is established we can incrementally add tests to suit what we care about
MO: CESM is currently using nearly the latest MOM6 so currently no big motivation to use GFDL - but might not always be the case
DS: will there ever be things in the NCAR CESM fork we need that are not in the main
AH: are MOM6 nodes obliged to run MOM6-main?
AG: not necessarily - GFDL use a much newer dev branch but there are periodic PRs to main for everyone to approve

Profiling & benchmarking

MO:

one region of MOM6 code (surface forcing) was not scaling with more cores - narrowed down to reproducible sum
but config set CICE6 max_blocks to a very large number - allocates a lot of memory - huge CICE6 mem footprint - then affected MOM6 apparently because CICE mem too big for cache to hold both MOM6 and CICE6 data
resolved by a more reasonable max_blocks: parallel scaling improved, but still not great
profiling paused for now as MOM6 now has a new feature to mask land tiles automatically at runtime to match number of cores - want to use this for profiling. MOM6 land proc mask not relevant to CICE which uses a very different approach. Newer CICE6 can also automatically determine max_blocks.
AH: NetCDF chunk size in output is auto-determined - mppnccombine-fast assumes the same proc land mask for all files
MO: but auto-masking sets io cores (io_layout) to 1 - not what we want for production but useful for profiling, and we can read in previously auto-generated proc land mask in production configs
AH: have we looked into parallel io for MOM6?
MO: not sure, and might not be performant to gather to one core and then redistribute for parallel io
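
Some background on why a reproducible sum needs special treatment: floating-point addition is not associative, so simply adding partial sums in whatever order they arrive can give different bits for different processor layouts. The sketch below shows one generic way to get an order-independent result, by converting to fixed-point integers before summing. It is only an illustration of the idea, not MOM6's implementation, and the `SCALE` value is an arbitrary choice.

```python
import random

SCALE = 2 ** 40  # fixed-point resolution; arbitrary choice for this sketch

def fixed_point_sum(values):
    """Order-independent sum.

    Each value is rounded to an integer multiple of 1/SCALE, the integers
    are added exactly, and the total is converted back to a float.
    """
    total = sum(round(v * SCALE) for v in values)
    return total / SCALE

values = [0.1, 0.2, 0.3, 1e10, -1e10]
shuffled = values[:]
random.Random(0).shuffle(shuffled)

# Plain float sums can depend on the order of the terms.
print(sum(values), sum(shuffled))
# The fixed-point sum is identical for any ordering.
print(fixed_point_sum(values) == fixed_point_sum(shuffled))  # True
```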

Documentation

MO, AK: 2 main options (1 is AK’s preference) - see Documentation · COSIMA/access-om3 · Discussion #120 · GitHub
AH: can it be defined in a data structure? say, doc.yaml in each branch - also makes it easier to systematically extract data
DS: or no specificity at all - just have one doc with a common section followed by a section for each config which is free-form text extracted from each repo branch
AH: maintainability a problem if free-form, and unclear to doc writer what is needed
AH: use submodules?
MO: not simple to use sphinx or mkdocs with submodules
AHogg: try something and see how it works - how it looks to the user and how much work to update continuously
AHogg: and will this scale to the other ACCESS models? eg ACCESS-CM3 and ACCESS-ESM3
MD: no discussion of documentation for CM/ESM yet - would likely follow OM3’s lead
AH: may be fewer configs in climate models since they don’t have choice of forcing?
AHogg: but in future there will be multiple resolutions
MD: ESM will have a lot more configs than CM

Licensing

AH: what licence to release OM2 under? Software licence · Issue #264 · COSIMA/access-om2 · GitHub

AH: MOM5 is GPL3
MO: so no choice - code must be distributed under terms of GPL3. So need to check that all component licences are compatible - and they are.
AH: what about configs?
MO: doesn’t matter - these are input, not code - might have IP but not something we need to deal with. Are we distributing code? Licences are about distribution, not the use it is put to.
AH: were there custom licences that weren’t being adhered to? CICE? OASIS?
AS: does new CICE licence not matter since it didn’t have one back when we forked the code? Licence was added about 2017ish.
MO: there’s an issue about this - Not complying with licensing · Issue #67 · COSIMA/cice5 · GitHub
AH: so just need a GPL-compatible licence for OM2?
MO: might not strictly need a license for OM2 but it’s easy to add one
AH: yes clearer just to have one
DS: there is a little code in the configs, eg shell scripts
AH: so use Apache? CMIP - data has different licence (eg cc-by) from the code

Next meeting

17 April, usual time

anton · 1 May 2024 03:43

Summary of today’s TWG - I didn’t catch everything so please edit to add/correct as needed.

Date: 2024-05-01

Attendees:

Anton Steketee (AS) @anton
Andrew Kiss (AK) @aekiss
Andy Hogg (AHogg) @AndyHoggANU
Ezhil Kannadasan (EK) @ezhilsabareesh8
Dougie Squire (DS) @dougiesquire
Micael Oliveira (MO) @micael
Martin Dix (MD) @MartinDix
Minghang Li (ML) @minghangli
Kieran Ricardo (KR) @kieranricardo
Siobhan O’Farrell (SO) @sofarrell

New Meeting Organiser

AS will take on organising / hosting meetings. MO has moved to a different role at ACCESS-NRI

CM3 - Update

KR has run the prototype CM3 for two model years - with CICE6, MOM6 coupled to the UM. There is an energy balance issue showing as SST growth / warm bias.

Appears to be heading towards similar performance to CM2, but will need work to optimise this. It’s using 576 cores, but running sequentially rather than concurrently, which is the end goal. Parallel efficiency is ~60% above 96 cores. (In CM2, UM at 576 cores, MOM at 80 cores and CICE on ~16 cores. Looks like we will end up similar.)

CM3 prototype is based on January OM3 build but will update soon.

Sea-ice cycle looks surprisingly good for this stage of development.

Complete CM3 configuration is a rose/cylc suite; MD to organise trying to move it to a private GitHub repo.

OM3 - 1 degree

Micael is hoping to look at the proportion of cores between components, through automated scripting in OM3-utils. So far he has investigated increasing core counts and how this impacts run time.

Scaling information on OM2:

ACCESS-NRI needs to provide some scaling information to users who are submitting grants / doing runs etc.

Initially we just need number of cores, how many SUs to do a run, and a reference to the ACCESS-OM2 paper. AK to provide details used in previous NCMAS applications for a first pass at this information.

ESM1.6:

There are plans forming for updating ESM1.5 with newer model versions to create an ESM1.6. ESM1.6 is a fallback option for CMIP7 Fast Track, since it looks like ESM3 development won’t be sufficiently progressed to meet that timeline. The main interest is updated WOMBAT, CABLE3 and possibly MOM5. However, WOMBAT development is being moved to the generic tracer framework. In theory, this can be used with ESM1.5/6, but some additional work will be required to get things working with the OASIS-MCT coupler. Is that high priority, or should the focus be ESM3?

DS to book a meeting with CSIRO stakeholders + wombat developers + AH.

CICE

AS has identified that the area fields being used within CICE are inconsistent with the area fields used by MOM + NUOPC. They are calculated assuming square grid cells, which is not accurate for a round globe and especially problematic for the tripole. In the AUSCOM build of CICE5, code was added to read these areas from the grid file; however, in CICE6 many extra fields have been added to support the C-grid, meaning a new grid file could become bloated. AS to talk to CICE-consortium/NCAR/NOAA about whether they would prefer loading this information from the MOM “supergrid” or from an updated CICE grid file, before implementing one option as a code change.

025 Degree profiling

Minghang is progressing with the 0.25 deg work. Minghang will have a go at making plots that are similar to the ones that Micael made for 1 degree.
Has trouble running latest OM3 build with <240 cores.
MOM init time is slow, needs investigating because it is impractically slow (10-20mins).
Performance about 10% better with land masking on.
Best performance is currently at 192 cores; needs investigating why it drops off so much at higher core counts.

DS to book a meeting with MO, ML + anyone else interested to try and clarify and scope the best steps to profile efficiently and finalise core counts.

Next meeting

15 May, 11:00AM AEST

dougiesquire · 15 May 2024 02:42

Summary of today’s TWG - I didn’t catch everything so please edit to add/correct as needed.

Date: 2024-05-15

Attendees:

Dougie Squire (DS), Andrew Kiss (AK), Martin Dix (MD), Ezhil Kannadasan (EK), Micael Oliveira (MO), Anton Steketee (AS), Andy Hogg (AH), Minghang Li (ML), Siobhan O’Farrell (SO)

AK updates

@kial has taken a close look at how water fluxes are managed in ACCESS-OM2 - very difficult to get water budget to close (in annual average) but it’s possible with enough care
AK has written up a document outlining how water balance works - will share on the forum
Need to do a much more careful job in ACCESS-OM3. Need detailed documentation and demonstrations on how to close budgets
AS: Will Hobbs and colleagues have been looking at freshwater fluxes out of sea-ice - there is confusion and apparent errors. SO: not quite true - they’ve been using the wrong variables. But there is one known error in CM2.
AK: does CM2 do any freshwater flux balancing? MD: no we don’t do anything to try and correct. SO: Initial checks showed things balanced okay. That may not be true for all runs
CMIP7 meeting this Friday. Let Andrew know if there are any updates

DS updates

ACCESS-OM3 inputs have been moved to vk83 and configs updated
We’ve decided to prioritise allowing generic tracers in ACCESS-OM2 and ESM1.5 since we’d like to use the generic WOMBAT code in ESM1.6
Turning on generic tracers in ACCESS-OM3 has no performance impact when no generic tracers are configured. So only need one exe for both non-bgc and bgc runs.
Should have repro CI on our configs soon. Once set up, we will make official request to become MOM node. Create fork in ACCESS-NRI org?
Hakaseh has confirmed issue with ice-to-ocean algae and nitrate fluxes in WOMBAT in ACCESS-OM2. Simple fix that Hakaseh has offered to implement.

ML updates

Showed 025 scaling plots that show that (ocean nodes) / (ice nodes) = 9 is a good choice. Will extend to larger core counts
MO: Should decide which partition to use - consider max core count and charge. Probably want to do this sooner rather than later
MO: Really should rebuild executable for different partitions. As simple as reconcretizing and building on node on target partition
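
As a rough illustration of the (ocean nodes) / (ice nodes) = 9 ratio above, the sketch below splits a node budget between the two components. The function name and the choice to give leftover nodes to the ocean are assumptions made for illustration, not how the OM3 configs are actually generated.

```python
def split_nodes(total_nodes, ocean_to_ice_ratio=9):
    """Split a node budget between ocean and sea ice (hypothetical helper).

    The ice gets one share and the ocean gets `ocean_to_ice_ratio` shares.
    Any nodes left over after the integer division go to the ocean.
    """
    if total_nodes < ocean_to_ice_ratio + 1:
        raise ValueError("not enough nodes for the requested ratio")
    ice = total_nodes // (ocean_to_ice_ratio + 1)
    ocean = total_nodes - ice
    return ocean, ice


if __name__ == "__main__":
    for n in (10, 20, 25):
        ocean, ice = split_nodes(n)
        print(f"{n} nodes -> ocean {ocean}, ice {ice}")
```

When the total is not a multiple of 10 the achieved ratio is slightly above 9, so it would still need checking against the scaling plots.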

EK updates

a recent update to WW3 means that we can no longer generate mod_def.ww3 - investigating
1deg crashing in Kara Sea. AK: Worth looking at whether OM3 is crashing in Kara Str. Had to apply Rayleigh damping in various locations in ACCESS-OM2 (Indonesian straits at 1°, Kara Strait at 0.25° and at 0.1°). Also should check out topog.

AS updates

Off to US for CICE meeting and 3 weeks of leave. DS will take over TWG organisation for this period.

Next meeting

5th June, 11:00AM AEST

aekiss · 5 June 2024 01:14

Here’s my attempt to summarise today’s TWG. Didn’t capture everything, so please add and correct as needed.

Date: 2024-06-05

Attendees:

Andrew Kiss (AK) @aekiss
Andy Hogg (AHogg) @AndyHoggANU
Dougie Squire (DS) @dougiesquire
Micael Oliveira (MO) @micael
Martin Dix (MD) @MartinDix
Siobhan O’Farrell (SO) @sofarrell
Harshula Jayasuriya (HJ) @harshula
Angus Gibson (AG) @angus-g

DS:

attended MOM6 dev meeting
- good that we now have a formal connection
- Work is proceeding in multiple centres that we should be more closely paying attention to
- much discussion on BGC
  - NCAR MARBL BGC model
    - ocean model agnostic
    - have been working on getting it working with MOM6
    - needed changes to NUOPC cap and a number of other places in MOM6 src
    - Mike Levy is leading this
    - PR imminent
    - changes overlap somewhat with Dougie’s generic tracer changes
    - they’ve had similar issues in handling BGC in vanishing layers, e.g. remineralising from sediment into vanished layer
- COBALT - new project to overhaul to get COBALT v3 - still a generic tracer but will need MOM6 stuff to compile - so may need to include MOM6 code to run in MOM5
- Angus told Bob we’re ready to become a MOM6 node
  - they’re happy for that to happen
  - they’re interested in tests with regional model - hopefully will happen once we have regression testing set up
  - we’re just waiting on a PR from release team to get testing working
  - AG - email from Marshall outlining process in getting on the MOM6 review team
- Discussion of sinking schemes (cc @pearseb) - MOM6 sinking scheme is simple - only sink at a single rate - would like spatially variable sinking - generic tracer sinking is handled separately but still constant rate unless you implement yourself when updating sources.

Generic tracer wombat

running in OM3
trying to get running in ESM1.6
now running generic in ACCESS-OM2
a few differences, unclear if they are worrisome, will post on forum for feedback - main difference is detritus
Dougie must have misunderstood earlier conversation about WOMBAT virtual fluxes. Need to do virtual flux corrections to surface bgc fluxes to account for using salt restoring. Awkward because not possible without extending the generic tracer API to pass salt flux. Apparently not done in BLING, COBALT etc?

Using OM2 spack dev tools - mostly works well, some annoyances

We are nearly at the point of doing science parameter test runs with 0.25° ACCESS-OM3 config

ACCESS-OM2 BGC releases aren’t restart-reproducible (2x1day differs from 1x2day run) - only BGC tracers differ.

WOMBAT restarts - 2 files, one is for most tracers, the other for sediments, handled in different code sections - might be what is breaking repro?
Problem might go away when we use generic tracers for WOMBAT in ACCESS-OM2?

OM2 release plans

release with old WOMBAT but with sea ice BGC coupling units bug fix
then release generic tracer WOMBAT without sea ice BGC coupling

minghangli · 20 June 2024 06:05

Summary of today’s TWG - I didn’t catch everything so please edit to add/correct as needed.

Date: 2024-06-19

Attendees:

Andy Hogg (AHogg) @AndyHoggANU
Andrew Kiss (AK) @aekiss
Anton Steketee (AS) @anton
Ezhil Kannadasan (EK) @ezhilsabareesh8
Dougie Squire (DS) @dougiesquire
Martin Dix (MD) @MartinDix
Minghang Li (ML) @minghangli
Siobhan O’Farrell (SO) @sofarrell

EK:

Facing crash issues before setting salinity restoring to default value (i.e., 999), hence requests discussion about whether to change to the C-grid or update the bathymetry first.
- DS: Avoid investigating the crash causes deeply. Instead, proceed with updating the grid and bathymetry for CICE and get CICE running on a C-grid using the current grid and topography.
- Assign grid updates to EK.
- Grid and topography
  - AK: Extending the grid southward to include grounding lines but not cavities initially.
  - Cavities can be added later without changing the grid.
  - The cavity circulation in MOM6 currently has many bugs, so it will be included only after the fundamental issues are resolved.
  - Current edge of ice shelves is like a brick wall.
  - Micael is the contact for high-resolution topography suitable for a C-grid for MOM6.
MOM6-CICE6-WW3
- AS: CESM is working on including WW3 in their model for CMIP7.
- AS: A bug in WW3 related to ice interaction will significantly change results once fixed.
- AS: There is an issue with the CICE floe size distribution: it’s not conserving heat, freshwater and salt.
- EK: Need tests with the conservation with ice, eg how the floe size distribution reacts. An initial test with 1deg config without WW3 was done by EK, and it does not crash.
- DS suggested for the moment, pause developing WW3 configurations to focus on CMIP7 deadlines
- AH: agreed with DS. Suggested keeping in touch with Noah/E3SM/CESM but prioritise CMIP7 over WW3 for now.

ML:

Tuning MOM6 Parameters for 0.25deg Configuration
- A prior workflow was to create and merge a reference input parameter set, avoiding known incorrect parameters. After discussion, we agreed on not creating the reference parameters, instead,
- Submit PRs for definite changes and label others needing evaluation as ‘requires_evaluation’, and ‘testing’ for scientific tests.
- Link notebooks for parameters needing evaluation or testing and discuss scientific tests.
- Merge PRs, one at a time
Re-do PE layout for MOM6-CICE6
- Detailed load studies have shown that the current MOM6 configuration, inherited from the CESM 1deg configuration, takes 3-4 times longer to run than OM2.
- The reason is that some parameters significantly affect performance, altering the MOM and CICE core ratio.

AK:

CMIP7 planning
- Achieved CM3 time-stepping for 1deg by the end of June 2024.
- ESM1.6 vs. ESM3
  - AH: Optimistic ESM1.6 won’t be needed if ESM3 is ready
  - MD: ESM1.6 will still be used due to different climate sensitivities. PI controls won’t be very different so people can do spin-ups with the current forcings.
  - SO: ESM1.6 is much cheaper and faster.
  - AH: current configuration tests will use CMIP6 forcing because CMIP7 forcing is not available yet, but it will be available by Feb 2025.
OMO conference
- AK prepared slides to inform the community.
- DS: come up with a classification in ACCESS NRI. Classify releases into alpha (preliminary) and beta (more stable) stages. Current OM3 release as alpha. To be more specific, alpha stage can be a model that you can run and may be shared with selected community members who are familiar with running models and aware of potential issues. A beta release is one that has been set up through the ACCESS NRI release framework. Hence, the current OM3 release can be considered an alpha release.
Wiki
- it is out of date in a few places, and we don’t have properly bedded-down documentation.
- AK will update the wiki in the next few weeks, acknowledging ongoing process shifts.
- DS: Move to the release team tools sooner to avoid familiarising with soon-to-be-replaced systems.
Evaluation
- DS: Extensive parameter changes require evaluations and engagement from the research community.

DS:

CI logistics
- Currently discussing reproducibility tests, automated documentation, and cherry-picking workflows.
- Complexity arises due to overlapping responsibilities with the release team.
- Planning to clarify requirements, determine tool scope, and plan implementation at the OMO conference.
{comp}_cpl_dt in nuopc_config
- ML discovered issues under certain circumstances with {comp}_cpl_dt in nuopc_config.
- The issue only exists when stop_option = nsteps. In that case the driver timestep is calculated as the minimum of all the {comp}_cpl_dt, and this is used to redefine the total run duration, while the coupling timestep set in nuopc_sequence is still used for stepping. This results in either a different number of timesteps being used for each run, or an error that the timestep in the clock is not a divisor of the runDuration. This issue does not occur with other stop_option values (e.g., ndays, nyears etc.). Detailed discussions on this issue can be found here. Initially, we believed these parameters were not in use and set them to extreme values to clarify their non-usage.
- DS proposed setting ocn_cpl_dt to match the coupling timestep and setting other dts to a non-sensible value like 99999. This approach simplifies management to a single parameter and allows for merging the PR.
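
The behaviour described above can be sketched as follows. This is only a reading of the notes, not the driver's actual code: the function name, the dictionary of {comp}_cpl_dt values and the `runseq_dt` argument are all hypothetical, and the error message only paraphrases the one mentioned in the minutes.

```python
def check_run_clock(nsteps, cpl_dts, runseq_dt):
    """Mimic the stop_option = nsteps behaviour described in the minutes.

    nsteps    : requested number of steps
    cpl_dts   : dict of hypothetical {comp}_cpl_dt values in seconds
    runseq_dt : coupling timestep set in the run sequence, in seconds
    Returns (run_duration_seconds, coupling_steps_actually_taken).
    """
    driver_dt = min(cpl_dts.values())    # driver timestep = smallest cpl_dt
    run_duration = nsteps * driver_dt    # run length implied by nsteps
    steps_taken, remainder = divmod(run_duration, runseq_dt)
    if remainder:
        raise ValueError(
            f"timestep {runseq_dt}s is not a divisor of runDuration {run_duration}s"
        )
    return run_duration, steps_taken


if __name__ == "__main__":
    # DS's proposal: ocn_cpl_dt matches the coupling timestep, others are 99999
    print(check_run_clock(4, {"atm": 99999, "ocn": 3600, "ice": 99999}, 3600))
    # A mismatched ocn_cpl_dt silently halves the number of coupling steps
    print(check_run_clock(4, {"atm": 99999, "ocn": 1800, "ice": 99999}, 3600))
```

With ocn_cpl_dt equal to the coupling timestep and the other values set very large, the minimum is always ocn_cpl_dt, so the requested and actual step counts agree.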

MD:

Kieran is currently away, and no progress has been made on coupling.
MD is focusing on ESM 1.5 tasks until Kieran returns.
DS questioned if the ocean grid in ESM1.5 is the same as that in ACCESS-OM2 1deg. Initially, Aidan thought they differed in how cells are stretched at the equator, but DS found them to be the same upon a quick check, pending further verification.

minghangli · 17 July 2024 23:46

Summary of today’s TWG - I didn’t catch everything so please edit to add/correct as needed.

Date: 2024-07-17

Attendees:

Adele Morrison (AM) @adele-morrison
Andy Hogg (AHogg) @AndyHoggANU
Andrew Kiss (AK) @aekiss
Angus Gibson (AG) @angus-g
Anton Steketee (AS) @anton
Ezhil Kannadasan (EK) @ezhilsabareesh8
Dougie Squire (DS) @dougiesquire
Kieran Ricardo (KR) @kieranricardo
Minghang Li (ML) @minghangli
Paul Spence (PS) @PSpence
Siobhan O’Farrell (SO) @sofarrell

DS:

Model runs with working group on Gadi
- The ZV30 project, which is for CMIP7 evaluation, is the right place for data storage, including WOMBAT runs and the 0.25deg OM3 tests that ML set up. Currently there is only around 6GB out of a total of 50TB allocated to the ZV30 project, which includes both 0.25-deg OM3 and WOMBAT relevant to CMIP7.
- AM: No instructions were provided regarding the usage of OL01 storage. There might be a chance to request additional storage if it is used up.

KR:

CICE cap initially ran at the same timestep as the coupling timestep. Now, it can run multiple steps per coupling timestep, but currently has averaging issues.
Energy conservation is reasonable, but the ice fractions appear strange, possibly due to improper averaging of ice fluxes before sending them to the ocean.
Averaging occurs in the cap before sending data across, rather than in the mediator.
DS: Changes have been made to a ~6 month old fork of CICE; we’ll need to harmonise at some point.
AS: Current OM3 - CICE version is between 6.5 and 6.5.1, with no significant scientific changes but infrastructure improvements.

DS:

Default parameters
- Every year, MOM developers review and decide on changes to default parameters, and there are a bunch of parameters to change. Our approach of removing default parameters from MOM_input could be risky. Some parameters we rely on as defaults may change without our awareness, potentially causing unexpected results. This is another reason to automatically generate MOM_parameter_doc.* and track them with git so we can detect changes.

ML:

Created a preliminary list of parameter changes and identified several scientific tests to start:
- 1. tracer timestep: 1, 2, 4, 6, 8 baroclinic timesteps
- 2. mesoscale parameterisation: (1) GM only, (2) MEKE, (3) MEKE + Geometric scaling (Hallberg); probably go directly to this instead of MEKE only
- 3. Submesoscale parameterisation: (1) Fox-Kemper et al. (2010), (2) Bodner et al. (2023)
- 4. Hybrid grid: ZSTAR —> HYCOM1 (Hybrid vertical coords) ?
- 5. Lateral friction: (1) isotropic + Biharmonic, (2) via MEKE
- 6. Vertical mixing: (1) CVMix - KPP, (2) ePBL+(Langmuir turbulence?)
Starting with tracer timestep tests as they significantly improve performance. No specific preference for other parameters; all will be tested.
- Changes will affect tracer-related diagnostics, including temperature and salinity, such as,
  1. Zonal average temperature and salinity (i.e. depth/latitude maps) (Fig. 12 Kiss et al. 2020) [1993 - 2017]
  2. Time series of global average temperature, salinity and sea surface temperature. (Fig. 3 in Kiss et al. 2020), and sea surface height.
  3. Zonally integrated overturning in density / latitude space (Fig. 7 Kiss et al. 2020) [1993 - 2017]
  4. Time series of Drake Passage zonal transport. (Fig. 4 in Kiss et al. 2020)
Comments from Attendees:
- AK: emphasized the importance of having efficient and stable 1° and 0.25° configs available as soon as we can for CMIP7 development. Suggested starting with well-understood parameters based on OM2 to be used as initial config for CMIP7. We can then refine OM3, starting with refinements that may be valuable for adoption into CM3/ESM3 in time for CMIP7 deadline.
- AK/AM: suggested starting with extreme parameter values (eg for tracer timestep) to identify the negative consequences to monitor.
- AM: noted that regional models showed weird behaviors with larger tracer timesteps, not sure about global models.
- AS: suggested to ask GFDL or NCAR for similar tuning processes.
Hybrid coordinates
- SO: Hybrid coordinates are still experimental for us.
- AK: Mentioned that CESM3 will use hybrid coordinates.
- AH: Advocates for z^* if rigorous testing isn’t feasible. Plans to test AG’s adaptive vertical coordinate but can’t guarantee a better solution at the moment.
Parameter change process/protocols
- It is necessary to work out a process, such as automatically generating the intake catalog to make it easy for community. Aim to simplify access for external users and promote participation through customer meetings.

AK:

CMIP7 deliverables
- work underway on optimising OM3 at 1deg and 0.25deg
- work underway on parameter tuning tests at 1deg and 0.25deg
- work underway on testing 1deg and 0.25deg with WOMBAT-lite
- need to determine the speed before testing 1-degree and 0.25-degree models with WOMBAT-mid.
  - DS: WOMBAT-mid is still under development.
  - SO: For the 0.25deg, it will be very expensive with BGC because we have to do pre-industrial controls, pre-emission controls etc.
- consider performing offline BGC runs to accelerate parameter tests and spinup. Unclear how much development would be needed for offline BGC - should scope it out before committing, but would be great to have if not infeasibly difficult.

EK

Grid update
- We are now able to generate a grid that closely matches the OM2 grid.
- Changes we want to make in the new grid (copied from New grids · Issue #172 · COSIMA/access-om3 · GitHub)
  - 1. leave tripole points where they are
  - 2. leave longitudinal seam as-is
  - 3. explore putting C-grid zonal points exactly on the equator
  - 4. don’t extend 1° or 0.25° closer to the pole, since they won’t have ice cavities, but extend to grounding line to support ice cavities at 0.1° and higher, using a displaced pole (this can be done with ocean_grid_generator.py but can wait until after we’ve handed over 1° and 0.25° OM3 configs for CMIP7)
  - 5. Refine in the Antarctic by extending the Mercator region to 75°S and then extending with constant dx to the southern edge of the grid. No coarsening in the Arctic.
  - 6. Quantize double-precision to be exactly representable in single precision
  - 7. use GEBCO2023 topography
  - 8. make sure grid allows MOM6 output at half-resolution
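
To make the "quantize double-precision" item concrete, here is a minimal sketch of rounding a double to the nearest value that single precision can hold exactly. It uses only the Python standard library; the function name is hypothetical and this is not necessarily how the grid generation tools would do it.

```python
import struct


def quantize_to_float32(x):
    """Round a Python float (double) to the nearest single-precision value.

    The result is still stored as a double, but it survives a round trip
    through single precision unchanged.
    """
    return struct.unpack("<f", struct.pack("<f", x))[0]


if __name__ == "__main__":
    lon = 0.1                       # not exactly representable in float32
    q = quantize_to_float32(lon)
    print(repr(lon), "->", repr(q))
    print("stable under repeat:", quantize_to_float32(q) == q)
```

The point of doing this on grid coordinates is that a value written in double precision and later read or stored in single precision comes back bit-for-bit identical.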

PSpence · 18 July 2024 00:11

How can I get calendar meeting invites to the TWG meetings? They seem to have dropped off my radar.

Thank you,
Paul

dougiesquire · 18 July 2024 00:17

@PSpence, watch this topic: COSIMA TWG Announce

ezhilsabareesh8 · 13 August 2024 13:57

Summary of the last TWG meeting, please add or correct if needed

Date: 2024-08-07

Attendees:

Andrew Kiss (AK) @aekiss
Angus Gibson (AG) @angus-g
Anton Steketee (AS) @anton
Ezhil Kannadasan (EK) @ezhilsabareesh8
Dougie Squire (DS) @dougiesquire
Kieran Ricardo (KR) @kieranricardo
Minghang Li (ML) @minghangli
Siobhan O’Farrell (SO) @sofarrell
Martin Dix (MD) @MartinDix

Meeting link:

Resolved to use one consistent link for both the TWG and Ocean-sea ice meetings.
The link will be included in the meeting announcements.

EK (Topography generation update)

New topography generated using domain tools with the GEBCO 2024 dataset.
Identified 8 cell locations where partial cell thickness exceeds full cell thickness in the new topography.
OM2 topography does not have this issue.
AK: Suggested checking Panan topography and adding a condition in the tool to prevent partial cell thickness from exceeding full cell thickness.
EK: Adding cell depth calculations to the domain tools and will incorporate the condition suggested by AK.
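
The condition AK suggested could look something like the sketch below, which checks one water column and clamps the depth where needed. It assumes a z-level grid described by a list of interface depths and a count of wet levels per column; the function names and data layout are hypothetical and are not taken from the domain tools.

```python
def partial_cell_check(depth, kmt, interfaces):
    """Return (partial, full, ok) for the bottom cell of one column.

    depth      : bathymetric depth in metres, positive down
    kmt        : number of wet levels (the bottom cell is level kmt, 1-based)
    interfaces : level interface depths in metres, with interfaces[0] = 0
    """
    top = interfaces[kmt - 1]            # top of the bottom cell
    full = interfaces[kmt] - top         # full (unclipped) cell thickness
    partial = depth - top                # thickness implied by the bathymetry
    return partial, full, 0.0 < partial <= full


def clamp_depth(depth, kmt, interfaces):
    """Limit depth so the partial bottom cell never exceeds the full cell."""
    return min(depth, interfaces[kmt])


if __name__ == "__main__":
    zi = [0.0, 10.0, 25.0, 45.0]
    print(partial_cell_check(30.0, 3, zi))   # partial cell fits
    print(partial_cell_check(50.0, 3, zi))   # partial exceeds full
    print(clamp_depth(50.0, 3, zi))
```

Clamping is only one option; depending on how the 8 cells arose, reassigning the number of wet levels might be the better fix.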

ML: (Scientific testing)

Created a script for automated experiments that handles the input yaml file, including restart settings and initial conditions for automated experiments.
Allows multiple parameter changes per experiment.
Using payu for cloning specific commits and managing experiments.
Include branch and commit details in the yaml file for reproducibility.
Ensure consistent directory naming for perturbation experiments and capture all changes in the YAML file.
Retain the yaml file for capturing experiment details and tool versions, as it enables re-running experiments. CICE inputs need to be adapted.

DS (Output file naming)

Decide on a file naming convention despite no perfect solution.
Add optional fields for vertical coordinates and downsampling in file names.
Ensure file names allow easy parsing of essential info.
Long names are fine if they stay manageable and clear.
Mean output of WW3 can be addressed as the need arises, current WW3 outputs are snapshots.
Label single-variable scalar files as 0D and multi-variable ones as 1D to ensure consistent processing.
Evaluate whether to adjust settings or handle downsampling in post-processing, given current limitations in level selection.

AS (CICE C-grid runs)

The C-grid shows slightly more variability in sea ice area and consistently lower ice volumes compared to the B-grid, noticeable through a 12-month running average.
Ice persists slightly longer in February and September on the B-grid, especially near coastlines.
Volume tendencies due to dynamic transport are more variable in the C-grid, consistent with the observed differences in volume between the grids.
Differences in thermodynamics are observed between the grids, especially in the Southern Hemisphere, indicating a need for further investigation.
These findings will be presented at the COSIMA and CICE Consortium meetings for additional feedback and validation.
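
For reference, a 12-month running average of the kind used in the comparison above can be sketched as below. It assumes a plain list of monthly means; the function name is hypothetical and the actual analysis presumably used other tooling.

```python
def running_mean(values, window=12):
    """Running mean over `window` consecutive samples (e.g. monthly means).

    Returns len(values) - window + 1 averages, one per full window.
    """
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    total = sum(values[:window])
    out = [total / window]
    for i in range(window, len(values)):
        total += values[i] - values[i - window]   # slide the window by one
        out.append(total / window)
    return out


if __name__ == "__main__":
    # Two identical "years": the seasonal cycle averages out completely
    monthly = list(range(12)) * 2
    print(running_mean(monthly))
```

Because each window spans a full year, the seasonal cycle cancels and what remains is the slower variability and any offset between the two grids.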

anton · 21 August 2024 03:01

Date: 21 August 2024

Attendees:

Andrew Kiss (AK) @aekiss
Anton Steketee (AS) @anton
Ezhil Kannadasan (EK) @ezhilsabareesh8
Dougie Squire (DS) @dougiesquire
Siobhan O’Farrell (SO) @sofarrell
Minghang Li (ML) @minghangli

Minghang:

Began running tests of 025deg ryf with ACCESS-OM3. 5 year runs, and ML showed plots of Global Mean Ocean T with different parameters
Tested:
- DTBT_RESET_PERIOD
- DIABATIC_FIRST
- DT_THERM
  DT_THERM did show some differences, with larger DT_THERM showing less drift. It’s hard to assess as the model hasn’t equilibrated yet. AK suggested looking at other diagnostics (e.g. those which assess spatial variance), and maybe trying an even bigger DT_THERM. ML will check the tracer diffusion terms in his results.

Dougie:
Updating model components.

Should we update model components, or move to access-nri forks
Should we move to the access-nri build system first ?

We will probably move to having forks directly from the main forks (CICE-consortium and MOM-ocean). We’ll stick with the COSIMA spack for this build.
Next step is to capture on github the steps needed to update our ACCESS-NRI forks. AS will talk to Kieran to change the CICE fork & DS will check with Angus re the MOM fork. See Update components for v0.4.0 · Issue #209 · COSIMA/access-om3 · GitHub

Ezhil:

Has been looking at jumps / gaps / discontinuities in WOA13 and WOA23. They seem to be better in WOA23 but still exist in some places. There is a smoothing step in the access-om2 generation of initial conditions; we need to investigate further whether we still need this step. See Use WOA2023 initial condition · Issue #161 · COSIMA/access-om3 · GitHub

AK also suggests doing some comparison runs on the equation of state. See MOM6 equation of state · Issue #20 · COSIMA/access-om3 · GitHub

aekiss · 18 September 2024 01:13

Summary from TWG meeting today - please correct and elaborate as needed.

Date: 2024-09-18

Attendees:

Adele Morrison (AM) @adele-morrison
Andy Hogg (AHogg) @AndyHoggANU
Andrew Kiss (AK) @aekiss
Anton Steketee (AS) @anton
Siobhan O’Farrell (SO) @sofarrell
Chris Bull (CB) @cbull
Helen Macdonald (HM) @helen
Martin Dix (MD) @MartinDix
Angus Gibson (AG) @angus-g
Harshula (HJ) @harshula

AM: CLIVAR OMDP meeting (on the side of CLIVAR/COMMODORE)

OMIP papers being prepared, eg
- Fox-Kemper on diagnostics and forcing strategy and protocol
- Gokhan: ocean spinup
- ocean-sea ice data request
First phase to use JRA55-do 1.5
Gokhan leading ERA5-based forcing (CFORCE) for the next phase of OMIP
- person doing work only just started
- testing by NCAR/GFDL/ Los Alamos. We’ll do 2nd stage testing (with others)
- Plan to update to ERA6 when avail
Bill Large leading revision of bulk formulae
Discussion on spinup protocol
- Moving away from 4x61-year cycles to avoid warm → cold jump from 2018 → 1958.
- Maybe repeat decade for several cycles, then IAF at end. TBC.
  - Advantage of repeat decade is inclusion of interannual variability in spinup, and closer to climatology than a single-year RYF.
  - But could make validation awkward if repeated decade is (say) 1958-1968, since both forcing and obs are poor. But later decades include more climate change signal.
Relative or absolute winds - in discussion. Eric Chassignet in favour of mixed (70% of ocean vel). But depends on how ERA5-based forcing is developed
Whether to use SSS restoring
- previously problematic, with differing parameters and issues extending into future.
- Harrison et al 2022 switch off salinity restoring and have freshwater transport between latitude bands based on CMIP models, and land storage in each band to have delays. Fixed many biases eg AMOC and T drift in spinup. Suggest using this instead of SSS in OMIP, with atm and land bulk reservoirs. Other groups unsure.
- Multiple groups to test this, including us. Details to come (Adcroft/Harrison). Invite Adcroft/Harrison to present on this at a COSIMA meeting? [NB: this has now been scheduled for the COSIMA meeting on 14 Nov 2024]
We should follow OMIP protocol to facilitate comparison with other OMIP studies.

CB: what’s the plan for spinup, given we won’t have finalised OMIP?
AH: go with something consistent with ACCESS-OM2 for comparison
AM: why match with OM2? why not match best with obs?
AH: two objectives: 1: good enough for coupled modellers (benchmark=OM2); 2: perfect the OM3 model, eg using repeat decade etc from OMIP

AH: COMMODORE meeting updates

quite technical - 5-6 engineers there, including some talks
ACCESS-OM2 bottom water was news to many attendees, and some were unaware the AABW shouldn’t be from open-ocean polynyas
emphasis on Gulf Stream separation to justify high res

AM:

lots on machine learning, eg for parameterisations; danger of us falling behind?
a lot of work on climate model emulators

AK:

Theresa Morrison on sea ice/iceberg -ocean coupling without levitation by coupling at barotropic timestep
Carolin Mehlmann on hybrid sea ice / iceberg melange

AM: what are our options for icebergs? Can CICE6 do it?
AS: unclear what is being done. Was work ~10yr ago with icebergs in CICE but not coupled to ocean

AM: What about distributing iceberg flux in OM3? We should spread according to a prescribed pattern rather than at coast. See issue.

Martin/Siobhan/Chris: at the moment we have HADCM3 seasonal climatology from coupled run
Chris: has used interannually-varying iceberg distribution
Adele: any spread is better than nothing
Andy: about 50% at the coast, 50% spread by distribution
Anton: Who to take this on?

AS: @minghangli’s tracer timestep investigation:

seem to be settling on 3hr irrespective of resolution
to be tested at 0.25°

AS: @ezhilsabareesh8 progressing well with new grid

error of too many CFL truncations
log files not working?

NCMAS

NCMAS applications are likely to open soon, although they appear delayed. We need scaling and performance numbers for both OM2 and OM3. It’s hard to provide solid OM3 numbers at this point.
AM: need good numbers to include OM3 in proposal. Last year’s application was criticised for lack of OM3 detail
few proposals in this round are likely to need OM3

Andy: OM3 evaluation

had a lot of test cases lined up
timing, timestepping etc needed to be done first - now basically done
revisit test plans, or proceed with plans?
strategy of something decent for CM3, then refine
run and analyse all, or involve community in analysis and/or runs?

Anton: Best case is with tracer time-step and new grids, OM3 will be stable and efficient enough for CM3 without further investigation. C-grid cice testing to be done still at 0.25° - expect to be ok.

Andrew: for refinements after an initial version of CM3, we want to be able to keep those in sync

Martin: @kieranricardo has custom mediator and different build process, using OM3 object file as library. But OM3-CM3 upgrade path fairly frictionless so can adopt OM3 changes easily.

Adele: for model evaluation we need to get COSIMA to agree on a priority list of metrics and acceptable values (eg T bias, AABW formation rate, etc), and ideally to write assessment scripts

Andy: Cookbook or ESMvaltool?
Anton: going with Cookbook style - lots of overhead with ESMvaltool
Andrew: want to avoid additional barriers to engagement - already have MOM6, etc
Andy: need a plan and timeline once we have Ezhil’s version stable and Minghang’s timings

COSIMA leadership meeting at Australian Antarctic Research Conference, Hobart 19-22 Nov

intro movie
ethics
attribution/credit

AK: For a later discussion: post-CMIP7 priorities: high resolution? BGC? WW3? DOCN? @lgbennetts is keen on CICE-WW3-DOCN but needs excessive wave attenuation to be resolved (perhaps with this bugfix? need to wait until available in ESCOMP before we test it in OM3)

anton · 2 October 2024 03:00

Summary from TWG today - please correct and elaborate as needed

Date: 2024-10-02

Attendees:

Chris Bull (chair)
Andrew Kiss
Siobhan O’Farrell
Ezhil Kannadasan
Minghang Li
Martin Dix
Anton Steketee

Truncation files in MOM6 (Ezhil)

Truncation log missing from OM3 output, file created but empty and therefore getting swept from payu
Writing to the file relies on FMS2 - Angus suspects there is an issue with mpi parallel write to an ascii file which was implemented recently in FMS2
Ezhil will confirm if there is a bug with mpp_multi & also try mpi io type romio

Tracer timestep in MOM6 (ML)

With a 3 hour tracer timestep on the 0.25 degree grid, global mean temperature rises more slowly than with a 1350s timestep, but salinity is consistent. Surface temperature is warmer with 3 hour - increasing heat flux out of the ocean. ML will do a depth vs latitude plot (averaged over all longitudes)
No notable difference in temperature with diabatic_first true/false for dt_therm = 1350s
GFDL are using 2 / 3 hours for this parameter. ML will post on the NCAR or GFDL forums

Runoff (AK)

Andrew has found that lots of runoff is missing from OM3
We suspect that some of it is getting remapped to land, rather than actual ocean cells
We will confirm with CESM about how they do this (possibly remapping files)
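
One way to quantify the suspected loss is to sum the runoff flux that lands on cells the ocean model treats as land. The sketch below is illustrative only: the function name, the nested-list inputs and the toy 2x2 grid are assumptions, not the OM3 remapping code or its file formats.

```python
def runoff_on_land(runoff, area, ocean_mask):
    """Return (total_flux, flux_on_land) in kg/s.

    runoff     : 2D nested list of runoff per unit area (kg m-2 s-1)
    area       : 2D nested list of cell areas (m2)
    ocean_mask : 2D nested list, 1 for ocean cells and 0 for land cells
    """
    total = 0.0
    on_land = 0.0
    for r_row, a_row, m_row in zip(runoff, area, ocean_mask):
        for r, a, m in zip(r_row, a_row, m_row):
            flux = r * a  # area-integrated runoff for this cell
            total += flux
            if not m:
                # Runoff delivered to a land cell never reaches the ocean.
                on_land += flux
    return total, on_land


# Toy 2x2 example (made-up numbers): one land cell receives runoff.
runoff = [[0.0, 2.0], [1.0, 1.0]]
area = [[1.0, 1.0], [1.0, 1.0]]
mask = [[1, 0], [1, 1]]

total, lost = runoff_on_land(runoff, area, mask)
lost_fraction = lost / total
print(lost_fraction)  # 0.5
```

Applied to the remapped runoff field and the model's own mask, a non-zero lost fraction would support the remapped-to-land explanation.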

OM2 Restarts (CB)

Some / all old restart files are not compatible with the access-nri builds

cbull · 16 October 2024 05:42

Summary from TWG meeting today (@cbull) - please correct and elaborate as needed.

Date: 2024-10-16

Attendees:

Andy Hogg (AH) @AndyHoggANU
Andrew Kiss (AK) @aekiss
Anton Steketee @anton
Siobhan O’Farrell (SO) @sofarrell
Chris Bull @cbull
Helen Macdonald (HM) @helen
Martin Dix (MD) @MartinDix
Angus Gibson @angus-g
Minghang Li @minghangli
Ezhilsabareesh Kannadasan @ezhilsabareesh8
Micael Oliveira @micael

Chair: Ezhilsabareesh Kannadasan
Minutes: Chris Bull

The agenda:

C-grid plots (Anton);
Progress update on Tracer Timestep for OM3 (Minghang);
Topography edits to preserve marginal seas (Ezhil);
Ocean Team workshop w/c Nov 11th (Chris);
Spackification of MOM6 in COSIMA/ACCESS-om3 strategy (Chris);
Outcome of the test with spack vs “build.sh” built om3 (Minghang / Ezhil ?)
CICE branch used for CM2 for ESM 1.6 (Anton / Siobhan O’Farrell if present – suggest a fork)

Anton update: SSH errors when running 025 sea ice-runs on c-grid. Have been getting further truncation errors. Ocean time step of 1350. AH: Consider dropping the baroclinic timestep? Anton: would like to re-run with truncation diagnostics turned on.

Minghang: showed spatial plots of where the truncation errors have been occurring. Compared runs with the tracer timestep fixed and the baroclinic timestep set to 1350, 1200 and 1080; truncation errors occur in the same place as when the tracer timestep was changed (AH qn: using old bathymetry etc?). AK: In some versions of om2 we added friction (Rayleigh drag) at that point to fix issues. AH/AK: thought there’s a cartopy plotting issue with the tripolar grid at that point (not actually a NaN). AH/CB: happy to use 1080. Minghang: what about the crashes? CB: don’t know what effect the new changes will have. AH/CB: suggest Minghang re-tries with all the updates (new bathymetry, 75 levels etc). AK: can consider the manual changes (friction) that we did in om2 (is that possible in mom6?) if/when the updates don’t fix all the problems.
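
For reference, the three baroclinic timesteps mentioned (1350, 1200 and 1080 s) all divide evenly into a 3 hour tracer timestep. A minimal sketch of that arithmetic follows, assuming the tracer timestep should be a whole multiple of the baroclinic timestep; the function and variable names are illustrative and not taken from any model configuration.

```python
TRACER_DT_S = 3 * 3600  # 3 hour tracer timestep, in seconds


def steps_per_tracer_step(baroclinic_dt_s, tracer_dt_s=TRACER_DT_S):
    """Return how many baroclinic steps fit in one tracer step.

    Raises ValueError if the tracer step is not a whole multiple of the
    baroclinic step (assumed here to be a requirement).
    """
    ratio, remainder = divmod(tracer_dt_s, baroclinic_dt_s)
    if remainder != 0:
        raise ValueError(
            f"{tracer_dt_s}s is not a whole multiple of {baroclinic_dt_s}s"
        )
    return ratio


# Baroclinic timesteps discussed at the meeting.
ratios = {dt: steps_per_tracer_step(dt) for dt in (1350, 1200, 1080)}
print(ratios)  # {1350: 8, 1200: 9, 1080: 10}
```

So dropping the baroclinic timestep from 1350 to 1080 s means 10 rather than 8 baroclinic steps per tracer step.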

Ezhil: gave a summary of how topographic hand edits were done previously (Andrew has just made a post about it). With the current update in GEBCO quite a few marginal seas have reappeared. AK: looks like your new bathymetry has too much land? AK qn: what are the criteria for determining land cells? AH: Hoping to do fewer hand edits than last time, because the C-grid should be more robust? AK: depends on the reason for the hand edit (see post, which focuses on the many om2 0.25 changes); largely depends on whether the issue is related to width or depth (narrow width being less problematic on a C-grid). Ezhil then showed the script that he’s using to edit topography. AK: curious that topogtools with “--fraction 0.5” gives a different result. AK: Last time, north of 60 degrees the land mask was pre-existing (south of 60S it was based on the topography), whereas in this case it’s being generated from the topography everywhere. AH: would be helpful to understand why we are getting land points, for example in the middle of the Baltic Sea. Micael: in om2 some edits were done to the mask (not the topography), which made things difficult when the mask was given to us; AK: imposing the mask then creates additional edits, so they’re hard to understand. AK: suggests that he and Ezhil work on the problem together; suspects the land mask is the first task to tackle. AH: would be great to have the process by which the land mask and bathymetry are created well documented, as it will be used by other modelling components and applications (resolutions, paleo, regional configs etc). AK: the om2 workflow for creating these files was tried this morning and there are a few environment problems.

Ezhil: did a check of the build.sh build against the spack build: different answers? CFL truncations were different in a quick ncdump check. Micael: suspects the release team has turned off all architecture-dependent optimisations, which could be the reason for the differences. At a guess, one will have “-m arch” and the other one will not; could be a few % difference in performance. Suggests following up with the release team to provide capacity for this (Chris will follow up). Minghang: found larger differences (although had code changes) but AH-AK think they could be in different places; chaos influences etc make the actual number comparisons less important. Chris: given the wombat-dev workflow uses a Spack build process, suggest the whole team move to using that build process, or at least stop making comparisons between different build processes.

Chris: queried people’s availability for w/c Nov 11th for ocean team workshop in Canberra. AK, AH are available. Angus as well from Wednesday onwards.

Chris: proposed updating the fork of MOM6 in cosima/ACCESS-om3 builds to use the ACCESS-NRI/mom6 repo (which points to mom-ocean). This is useful for MOM6 node work. Provided a test suggesting unchanged answers is successful, all were ok to proceed… Micael explained why they’d chosen to use the NCAR fork historically in cosima.

Anton: do we need our own fork of FMS? Chris: I think it would be a good idea given that many modelling components will use it and Ezhil has got a bug fix, for example, that everyone will want. Chris: wait to see how Chris’ spack-isation of MOM goes and if that works, we can then do FMS.

Chris: now that the cice cm code is on GH, the release team are looking at running some compile tests using the esm1.6 codebase. SO: would expect it to compile but not run, because the coupler needs work, which Dave Bi is working on. SO is working on output diagnostics. SO asked: is the GH connected to anything upstream? Anton: no, it’s just a code copy that was put online.

ezhilsabareesh8 · 30 October 2024 23:42

Summary from TWG meeting today - please correct and add as needed.

Date: 2024-10-30

Attendees:

Andrew Kiss (AK) @aekiss
Anton Steketee @anton
Siobhan O’Farrell (SO) @sofarrell
Chris Bull @cbull
Helen Macdonald (HM) @helen
Minghang Li @minghangli
Ezhilsabareesh Kannadasan @ezhilsabareesh8
Micael Oliveira @micael

Chair: Minghang Li
Minutes: Ezhilsabareesh Kannadasan

The agenda:

ACCESS-NRI ocean survey: steering future development for users (Chris)
Ocean team workshop (12-15th November, 2024)
Discussion about topography in general (Ezhil)
Updates on scaling performance for OM2-025 configuration (Minghang)

ACCESS-NRI ocean team workshop and survey:

Chris highlighted the ocean team workshop scheduled for November in Canberra, where discussions will center on model development aligned with community needs. He encouraged COSIMA community participation in the survey to gather input on scientific and model priorities (Survey link). Chris is also keen to get survey feedback from anyone in the wider community who has an interest in ACCESS-NRI ocean models; Siobhan O’Farrell offered to share the survey with some wave people who might be interested. The workshop agenda will be posted in the Hive Forum, and community members are welcome to attend sessions relevant to their work.

Discussion about topography (Ezhil):

EK provided an update on the recent changes to the topography generation workflow. The updates included removing T-cells of smaller size to better resemble the OM2 topography and to address model crashes occurring near the tripole.
AK modified the deseas algorithm to make it compatible with the C-grid.
EK generated topographies for two fill fractions, concluding that a fill fraction of 0.5 appears optimal for avoiding tiny cells. Hand edits are planned to incorporate the Black Sea and deepen critical straits.
AK: A test run after resolving the Black Sea inclusion was recommended as a next step to identify any further issues, particularly with surface salinity restoring.
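
To illustrate what a fill fraction threshold does, the sketch below turns any cell whose sea-area fraction is below the threshold into land by setting its depth to zero. This is an assumed reading of the fill fraction step, not the topogtools implementation; the function name, inputs and toy values are hypothetical.

```python
def apply_fill_fraction(depth, sea_fraction, threshold=0.5):
    """Return a copy of depth with mostly-land cells set to zero.

    depth        : 2D nested list of cell depths (m)
    sea_fraction : 2D nested list, fraction of each cell that is sea (0 to 1)
    threshold    : cells with sea_fraction below this become land (depth 0)
    """
    return [
        [d if f >= threshold else 0.0 for d, f in zip(d_row, f_row)]
        for d_row, f_row in zip(depth, sea_fraction)
    ]


# Toy 2x2 example (made-up values).
depth = [[100.0, 50.0], [20.0, 10.0]]
sea_fraction = [[0.9, 0.5], [0.49, 0.0]]

filled = apply_fill_fraction(depth, sea_fraction, threshold=0.5)
print(filled)  # [[100.0, 50.0], [0.0, 0.0]]
```

A higher threshold removes more partially filled cells, which is the trade-off behind comparing two fill fractions.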

Updates on scaling performance for OM2-025 configuration:

Minghang shared recent scaling performance results for the OM2-025 configuration for NCMAS. Additionally, Minghang performed scaling tests on the MOM6-CICE6-WW3 coupled model, noting that the inclusion of WW3 significantly increases computational costs.

Anton’s Updates:

Working with Kieran on CM3 coupling. Kieran is focusing on a river spreading scheme for improved runoff distribution.
CM3 at one-degree resolution, with plans to shift to quarter-degree.
Anton is also working on reading the MOM supergrid directly into CICE, ensuring exact cell area and angle alignment and removing the need for grid preprocessing in CICE.
