COSIMA TWG Meeting Minutes 2024

Summary from TWG meeting today - please correct and elaborate as needed.

Date: 2024-01-17

Attendees:

MO

  • profiling: runs hanging when CICE has >76 cores, apparently when reading from the mesh, even when IO is serial. Unlikely to use this many cores at 1 deg so capped profiling to max 76 CICE cores.
  • will probably want to run MOM & CICE concurrently rather than sequentially, since MOM6 scales better than CICE6, and therefore give more cores to MOM6 than CICE6 - as was done in ACCESS-OM2.
  • expect to have scaling plots to show next week
  • AH: whether to run atm sequentially or concurrently depends on relative resolution of ocean and atm
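
The sequential-versus-concurrent trade-off above can be illustrated with a toy cost model. This is only a sketch: the Amdahl-style formula, the function names and all cost numbers are assumptions made for illustration, not values from the profiling.

```python
def run_time(serial, parallel, cores):
    """Amdahl-style cost model: a fixed serial part plus a part that
    scales perfectly with the number of cores (an assumption)."""
    return serial + parallel / cores


def sequential_time(models, total_cores):
    """Each component uses all cores in turn, so the times add up."""
    return sum(run_time(s, p, total_cores) for s, p in models.values())


def best_concurrent_split(models, total_cores):
    """Try every split of cores between two components running side by
    side; the slower component sets the wall time."""
    (name_a, (sa, pa)), (name_b, (sb, pb)) = models.items()
    best = None
    for cores_a in range(1, total_cores):
        cores_b = total_cores - cores_a
        wall = max(run_time(sa, pa, cores_a), run_time(sb, pb, cores_b))
        if best is None or wall < best[0]:
            best = (wall, {name_a: cores_a, name_b: cores_b})
    return best


# Hypothetical (serial, parallel) costs in seconds; not measured values.
models = {"MOM6": (0.5, 400.0), "CICE6": (2.0, 60.0)}
total = 96

wall, split = best_concurrent_split(models, total)
print("sequential:", round(sequential_time(models, total), 2))
print("concurrent:", round(wall, 2), split)
```

With these made-up costs the component that scales better ends up with most of the cores in the concurrent layout, which is the behaviour MO describes; real numbers would have to come from the scaling plots.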

EK: WW3

  • suggestions from Stefan (BOM) - eg turn on ice scattering and dissipation in wave model
  • have 5 dissipation models and 2 scattering models - trying these out
  • implications on what fields are passed from CICE → WW3
  • but floe size not active in CICE6 yet - tried turning on - requires passing wave spectrum to CICE - this is happening but unsure what spectral types to transfer
  • SO: will chat to @NoahDay & @lgbennetts about this; also there was a relevant presentation from Cecilia Bitz at MOSSI2022 [but not on the MOSSI youtube channel] re. dissipation option 4; definitely want to put spectrum into CICE6 but unclear whether full spectrum is needed
  • EK: has confirmed the full spectrum (25 components) is getting through to CICE
  • SO: will confirm if @NoahDay used floe size bins the same as current CICE defaults
  • EK: for testing has set categories to 12 based on Noah's experience
  • enabled floe size distribution output
  • SO: not all WW3 options require FSD, some can work with default option (all floes the same notional size), but CICE would not interact with waves without FSD
  • EK can also read in a netcdf FSD file
  • SO: can generate internal spectrum within CICE
  • Stefan also suggested disabling interpolation in time as not suitable with coupling; also not sure about Langmuir turbulence and non-breaking-wave induced mixing; currently turned on; also 11 parameters in MOM6 for that, following Shuo Li’s config
  • AH: non-breaking wave mixing is controversial - unclear whether to use this
  • SO: would also need to look into how it’s hooked into turbulence closure in MOM - check what the US teams are doing - will find contacts to follow up

HJ:

  • Spack v0.21 failed to compile parallelio with the netcdf version we are using - is it ok to pin to the old working version or should we use a newer one?
  • MO: if versioning is truly semantic it should be fine
  • AS: CICE6 says 2.5 & 2.6 are supported so 2.5.10 should be fine for CICE6
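
The reasoning in AS's comment can be sketched as a check that a version belongs to a supported major.minor series. The helper names below are hypothetical, and the check assumes plain dotted numeric versions.

```python
def parse_version(text):
    """Turn '2.5.10' into (2, 5, 10) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def is_supported(version, supported_series):
    """True if the major.minor part of `version` is in the supported set."""
    series = {parse_version(s) for s in supported_series}
    return parse_version(version)[:2] in series


supported = ["2.5", "2.6"]  # series reported as supported for CICE6
print(is_supported("2.5.10", supported))

# Numeric tuples order 2.5.10 after 2.5.2; plain string comparison does not.
print(parse_version("2.5.10") > parse_version("2.5.2"), "2.5.10" > "2.5.2")
```

Comparing parsed tuples rather than strings matters here because patch numbers such as 10 and 2 sort differently as text.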

AS:

  • openmpi symlinks bug raised last year - a patch now done but 4.1.7 won’t be released until late Q1 2024
  • patch in mom6-cice6 config to turn on netcdf4 and parallel read/write - seems to improve performance; will be more significant with more cores
  • when updated can also turn on parallelio for other components
  • working with CICE people to improve netcdf error handling
  • looking into wave-ice interaction with EK
  • merging updates into ww3 configs
  • might be a bug in ww3 history output
  • MO: how long will it take NCI to install updated openmpi
  • AS: will request from NCI helpdesk when released

DS:

  • added coupling diagnostics - revealed unexpected things but sorted out - just misinterpretations
  • coupling side of wombat ready to have somebody look at - will also ask NCAR people to look at it
  • started porting wombat to generic tracers
  • going on 3wks leave at start of Feb
  • AH: CESM MOM6 workshop coming up - won’t be sending anyone in person but would be good to have a presentation to update people
  • AK: overlaps with AMOS - presentations could be similar - I’ll share slides and welcome any input
  • porting wombat to generic tracers - need to decide how to handle wombat versioning - eg @pearseb has updated to have ~25 tracers - will initially port the old version before updating - the old version system used include files but this isn’t working anymore - versioning a bit confusing
  • proposal: 1st port old wombat to generic tracers, tag it, then update to Pearse’s version
  • merging versions could be difficult - need to arrange a meeting
  • want to share with NCAR before going on leave

CMIP timeline

  • AH: will be a meeting at AMOS (lunchtime Tues) re. fast track v2 CMIP which might shed light on timelines
  • AH: running candidate models ready for testing ideally by end of June 2024 but maybe as late as Dec
  • AS: need to clarify what we are aiming at with OM3
  • DS: high-level planning meeting sometime soon would be useful
  • AH: broader meeting than TWG - include a few community folks
  • AH: would be good to have an overview of what needs doing and a straw-man timeline
  • MO, AS: good to have a task list and timeline, eg a Gantt chart, eg on GitHub - see this topic to discuss

MO: creation of 1 or 2 repos for helper scripts/utils

  • already have one (om3utils) but want it to be a tested, documented python package
  • also need a place to dump scripts eg for reproducibility
  • propose having both, and moving things between them as needed
  • DS: can become unclear where to find things unless scope of each repo is clearly defined
  • MO: want package to be useful and reusable
  • DS: don’t all scripts already meet that?
  • MO: maybe not
  • DS: a dump repo would still need structure (subdirs), docs (READMEs) and code review as they need to be working, and should link git commit in metadata of output
  • MO: agreed - call it om3-scripts
  • DS: include environment.yaml to record the conda env
  • DS: at some point we may want to bundle scripts into a python package
  • MO: will move om3-utils repo to COSIMA org - should be reviewed
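
DS's suggestion of linking the git commit in the metadata of output could look roughly like the sketch below. The function names, the metadata keys and the script name `make_topography.py` are hypothetical; only the `git rev-parse HEAD` command is standard git.

```python
import subprocess
from datetime import datetime, timezone


def git_commit(path="."):
    """Return the current commit hash of the repository at `path`,
    or 'unknown' if git is unavailable or `path` is not a repository."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            cwd=path, capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return "unknown"
    return result.stdout.strip()


def provenance(script_name, path="."):
    """Build a metadata dict to attach to an output file."""
    return {
        "created_by": script_name,
        "git_commit": git_commit(path),
        "created": datetime.now(timezone.utc).isoformat(),
    }


# Hypothetical script name, for illustration only.
print(provenance("make_topography.py"))
```

For netCDF output, entries like these could be written as global attributes so that each file records which version of the script produced it.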

Next meeting
1-2pm Wed 31 Jan to avoid AMOS

Summary from TWG - please correct and elaborate as needed.

Date: 2024-01-31

Attendees:

The meeting was entirely dedicated to discussing the ACCESS-OM3 development
timeline and OM3 priorities for contributing to ACCESS-CM3 and ACCESS-ESM3 in
time for CMIP7.

Current plans for OM3 configurations

AK shared some slides related to model development workflow and planned OM3
configurations:

AH:

  • OM3 is currently in the “Model Timesteps” stage (see slides above).
  • OM3 1deg configuration at the “Preliminary Optimization” stage.
  • CM3 is not there yet and some work remains to be done.

MD: OM3 configurations are still based on the corresponding OM2 ones. Need to
change the grid type to take advantage of C-grid features.

AH:

  • There are several planned OM3 configurations, plus some configurations that might or might not be developed.
  • A MOM6-CICE6 configuration will be used for CM3
  • A MOM6-CICE6-WOMBAT configuration will be used for ESM3
  • Configuration for CMIP will be 0.25deg MOM6-CICE6-WOMBAT

Scientific options and configuration development

AM: Will any of these configurations include ice-shelf cavities? Maybe a new
line is needed in the table?

SO: Ice-shelf cavities require high resolution. 0.25deg might not be enough.

MD: What scientific options do we want to use for CMIP?

AS: C-grid for sure. Too early to add landfast ice.

AH: Need to distinguish:

  • scientific options for configurations that we want to facilitate that are of interest for the research community (e.g. land-fast ice, waves)
  • scientific options to explore for CMIP (e.g. C-grid)

AH: Regarding MOM6, we might explore different vertical coordinates (isopycnal vs Z*)
AM:

  • We could do as was done for the MOM6-SIS2 global configuration: run with different coordinates and compare results.
  • We could also compare with NCAR/GFDL configurations. Good to do during the optimization step of the configuration development.

AH: At which phase to do that?
SO: At preliminary optimization/evaluation.

AM: At which resolution? Is 0.25deg worth doing? Scientific community is more interested in 0.1deg.
AH:

  • The question is rather in which order we do the work.
  • We can probably reuse a lot of the work done for 0.25deg for the 0.1deg configurations.

DS: There are lots of options to test for WOMBAT.

AS: How quickly do we want to have the more experimental ice-related options?
AM: We want them for the 0.1deg configuration for the scientific community.

MD: It’s probably not worth putting too much effort into the 1deg configurations.
SI: Keep it as a fast option, but not for CMIP7.
AK: 1deg could be very useful for tests and continuous integration.

OM3 CMIP7 Timeline
AH: Timeline? What to prioritize: 0.25deg MOM6-CICE6 → 0.25deg MOM6-CICE6-WOMBAT

AK: Cheap configuration needed for testing when updating codebase.

DS: What work is required to go from one resolution to another?

AH/AK: For OM2, most work went into updating the topography. Then remapping
weights for OASIS exchange grids and tuning.

Consensus: 1deg MOM6-CICE6 → 0.25deg MOM6-CICE6 → 0.25deg MOM6-CICE6-WOMBAT

AH: What priority for WW3 configurations?
AK: only useful for scientific community interested in waves. We can probably keep these configurations in sync with the others.
AH: Need to ask community about interest.

AH: Proposal to have CMIP7 configurations ready for full evaluation by
mid-year.

AH: Will WOMBAT be okay with this?
AM: Will need WOMBAT by mid-year or later?
MD/AH: Might come a bit later.

DS: Do we need to update the grids and topography?
AH: Yes.
MO: We have the tools and the workflow. Now just need to do it.

AK: Still missing C-grid in CICE.
AS: This looks doable
AK: There are some known drawbacks of using C-grid. Need to be aware of it.
AS: C-grid in CICE is, as a feature, considered finished.
AK: But not all features are available for C-grid.
AK: Issue with mediator/coupler as it uses A-grids internally
KR: With CMEPS, all fields need to be on the same grid.

Task assignment
Minghang: 0.25deg configuration
Micael: topography and grids, scaling and performance optimization
Anton: CICE
Dougie: WOMBAT
Ezhil: ?

Project management
AK: We will set up a project dedicated to CMIP7 on the COSIMA Github
organization. All members should try to update existing issues and add missing
issues.

Next meeting
Back to usual schedule: 11am Wed 14 Feb.


COSIMA TWG

Date: 2024-02-14

Attendees: Andrew Kiss, Anton Steketee, Micael Oliveira, Minghang Li, Martin Dix, Harshula, Angus, Ezhil Kannadasan, Aidan (Apologies Dougie, AH)

0.25 Degree Config:


AK has started issue #101 in the ACCESS-OM3 repository: develop MOM6-CICE6 025deg_jra55do_ryf based around the MOM6-CICE6 1 degree configuration 1deg_jra55do_ryf and the ACCESS-OM2 0.25° configuration.

Project Board:


AS: We could add analysis notebooks + some regular runs on 1 deg configuration

AK: For OM2 we had a figures directory in the ACCESS-OM2 report repo with the metrics of interest, intended as a living, mutually agreed set of figures. Probably need to start a new repo for that. We could also include performance metrics.

  • One notebook for each figure / very simple metrics can work well.

  • Start with manually generated intake catalogue - for om3 analysis.

Aidan: This could be an application for the MED Team (Mike Tetley’s) live diagnostics tool or Jupyter intake scripts. We could add a hook from payu to generate an intake catalogue + load this into the live diagnostics. Initial approach: use Intake with a manually generated intake catalogue, then use payu auto-catalogue when available.

  • Action for Anton : Follow up with Romain + Mike + Aidan.

Processor Layout:


MO: Writing scripts / tools to analyse parallel performance (currently in a branch of the om3-utils git repo), using the trace generated by ESMF, which includes all the things that go through the driver & mediator. The profiling separates the timing into code ‘regions’ (e.g. for coupling, timesteps, etc).

Adding cores mostly gives very poor efficiency. MO proposes shifting to running ice+atm+ocean simultaneously, with just enough cores that ice + atm run faster than ocean.
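
As a point of reference for "efficiency" here, a common definition is the baseline cost in core-seconds divided by the cost at a larger core count. The sketch below assumes that definition; the function name and timing numbers are hypothetical and are not taken from the ESMF trace.

```python
def parallel_efficiency(timings):
    """Given {cores: wall_time}, return {cores: efficiency} relative to
    the smallest core count. 1.0 means perfect scaling."""
    base_cores = min(timings)
    base_cost = base_cores * timings[base_cores]  # core-seconds at baseline
    return {n: base_cost / (n * t) for n, t in sorted(timings.items())}


# Hypothetical wall times (s) for one component; illustrative only.
timings = {24: 100.0, 48: 60.0, 96: 45.0}
for cores, eff in parallel_efficiency(timings).items():
    print(f"{cores:4d} cores: efficiency {eff:.2f}")
```

An efficiency that falls well below 1.0 as cores are added is the kind of result that would motivate giving a component only as many cores as it needs to keep pace with the ocean.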

MOM6 is approx 5x slower per model year than MOM5 in ACCESS-OM2, although the number of time steps differs and MOM6 is fairly different from MOM5. MOM6 is / would be dominating the timing if components run simultaneously, so looking at the internal timing within MOM could be the next step. Otherwise we could go straight to a 0.25 degree config, rather than work too much on optimising at 1 degree.
MD asked whether it was related to nuopc, but we don’t think so. ESMF profiling specifically isolates MOM6 timestepping as the culprit.

Angus suggested investigating compiler flags, or more comprehensive profiling in MOM.

Aidan suggested investigating the IO, but MO has checked in the profiling that the IO is mostly outside the MOM code.

Spack:


Harshula:

  • Transitioning to new spack versions (v0.20 to v0.21).
  • For OM2: Downgrading to PIO 2.5.2 from 2.5.10 (forum post). Plus removing nci-openmpi pseudo package, and using the openmpi system build directly. At this point this only impacts the ACCESS-NRI OM2 build. The ACCESS-NRI & COSIMA builds give identical data outputs.

Repro-CI


Aidan: Adding repro-CI for access-om2 to test compilation and bitwise reproducibility for output / results from OM2 runs. This will allow comparisons between versions + code changes etc and is something we will need to investigate for OM3.
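
One simple way to picture a bitwise reproducibility check is to compare file checksums between two runs. The sketch below is an assumption for illustration and is not how the repro-CI is implemented; the function names are hypothetical. Note that whole-file checksums can differ because of timestamps in file metadata even when the data agree.

```python
import hashlib
from pathlib import Path


def file_checksum(path, chunk_size=1 << 20):
    """SHA-256 of a file, read in chunks so large outputs need not fit
    in memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksums(directory):
    """Map each file's path (relative to `directory`) to its checksum."""
    root = Path(directory)
    return {
        str(p.relative_to(root)): file_checksum(p)
        for p in root.rglob("*") if p.is_file()
    }


def compare_runs(dir_a, dir_b):
    """Return the relative paths of files that differ or are missing
    between two output directories; an empty list means they match."""
    a, b = checksums(dir_a), checksums(dir_b)
    return sorted(k for k in a.keys() | b.keys() if a.get(k) != b.get(k))
```

Calling `compare_runs` on the output directories of two runs returns an empty list only if every file is byte-for-byte identical in both.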

Waves


Andrew Kiss - spoke to Alberto Meucci from Uni Melb at AMOS who is interested in our wave parameter choices and output. Meeting to be organised to gather input from wave modellers on our WW3 parameter choices.

Next Meeting: March 6