COSIMA TWG Meeting Minutes 2023

Notes from this week’s COSIMA TWG. Feel free to add whatever I’ve missed or to modify anything that I got wrong.

Date: 2023-10-11
Attendees: Micael Oliveira @micael, Andrew Kiss @aekiss, Aidan Heerdegen @Aidan, Dougie Squire @dougiesquire, Angus Gibson @angus-g, Ezhil Kannadasan @ezhilsabareesh8, Siobhan O’Farrell @sofarrell, Anton Steketee @anton, Harshula Jayasuriya @harshula

COSIMA Spack instance and ACCESS-OM3 deployment plans

  • We currently have two Spack instances at /g/data/ik11/spack, using different versions of Spack.
  • We do not delete old instances in case they are still in use.
  • There’s a GitHub repo with all the configuration.
  • Includes a script to automatically create a Python virtual environment.
  • Some Spack environments are for consumption; some (unstable) are used as a development sandbox, e.g. for updating OM3 dependencies.
  • It is still unclear how best to install the same Spack environment for multiple architectures - currently this can be done by logging in to nodes of each architecture and recompiling, but the Spack lock file, which is kept under revision control for reproducibility, needs to be changed when changing architecture.
  • Executables are currently deployed manually to /g/data/ik11/inputs/access-om3/bin.
  • Plan is to also use the Spack environments to deploy the executables:
    • a module gets created automatically
    • ensures reproducibility (currently we don’t keep information about how the executable was compiled).
    • currently we don’t have a simple way to distinguish executables compiled for different architectures.
  • Need to add an OM3 package to Spack - see the sketch after this list.
  • Current build system: each model component is built as a library, then all of them are linked to build the OM3 executable.
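
Since Spack packages are Python classes, a minimal sketch of what an OM3 package could look like is below. This is illustrative only, assuming the CMake build described above; the dependency list, version handling and options are invented and would need to be worked out (esmf, fms, parallelio and netcdf-fortran are existing Spack packages, but may not match OM3’s real requirements).

```python
# Hypothetical Spack package for ACCESS-OM3 -- a sketch, not the real package.
from spack.package import *


class AccessOm3(CMakePackage):
    """ACCESS-OM3 coupled ocean - sea ice - wave model (illustrative)."""

    homepage = "https://github.com/COSIMA/access-om3"
    git = "https://github.com/COSIMA/access-om3.git"

    # Assumes the model components are pulled in as git submodules.
    version("main", branch="main", submodules=True)

    depends_on("mpi")
    depends_on("esmf")
    depends_on("fms")
    depends_on("parallelio")
    depends_on("netcdf-fortran")

    def cmake_args(self):
        # Hypothetical option name: a flag to install the component
        # libraries separately (discussed below) does not exist yet.
        return [self.define("OM3_INSTALL_COMPONENT_LIBS", True)]
```
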

AH/HR:
• would like OM3 in Spack
• would like to build each model component separately in Spack.
MO:
• we can start by changing the OM3 CMake to install the component libraries.
AH:
• may be able to deploy OM3 from GitHub - more seamless
• need to think about how this interacts with payu configs, so the config uses the correct deployment
MO:
• simplest way is to use modules
AH:
• but currently payu doesn’t search paths for executables - they need to be hard-coded - may be better to modify payu to support this (see the sketch below).
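
As a rough illustration of that suggested payu change (not payu’s actual code; the function name is invented), resolving a bare executable name from `$PATH` - e.g. as set by a loaded module - could look like:

```python
# Sketch: resolve a model executable either from an explicit path or
# from $PATH (e.g. populated by `module load`). Not payu's actual API.
import shutil


def resolve_exe(exe: str) -> str:
    """Return a usable path for `exe`, searching $PATH if needed."""
    if "/" in exe:
        return exe  # already an explicit path: use as-is
    found = shutil.which(exe)  # search $PATH set by the loaded module
    if found is None:
        raise FileNotFoundError(f"{exe!r} not found in $PATH")
    return found
```
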
SO:
• need to think about how the build system will work with Rose/Cylc

ACCESS-OM3 update

  • Finished setting up 1deg configs using OM2-based settings, supergrid and topog, C-grid MOM6 and B-grid CICE6 for testing purposes (will improve these later, including C-grid CICE)

    • basically done for MOM6-CICE6
    • now need to do MOM6-CICE6-WW3
  • MOM6-CICE6-WW3 configurations:

EK:
• Added compilation of the WW3 helper executables to the build script (details below)
• current MOM6-CICE6-WW3 uses predefined inputs and parameters
• need to make this tripolar, matching other model components
• requires helper exes
• have done this following Shuo Li’s config
• this creates the input binary files and a restart file
• now running with same grid for all model components
• now moving to look at WW3 switches (much of the functionality is set at compile time via switches)
• using a modified version from Denisse Worthen
• we can now generate our own parameters, change grid, etc
• the MOM6-CICE6-WW3 repo now has parameters based on Shuo Li’s config
• getting help from Stefan Zieger (BoM)
SO:
• also need to look at / set up parameters to suit ice-ocean interaction
• can also see what parameters UKMO have used
AH:
• take the opportunity to move the tripole grid poles(?) away from Hudson Bay and the Gulf of Ob

  • Understanding how profiling works in CESM was not trivial (upstream is in a state of flux)
    • it is now working via ESMF - provides nice output and includes a visualisation tool.
    • Includes operations done via ESMF, i.e. everything in the run sequence.
    • Some models (CICE6) have some profiling that can be reported to ESMF.
    • MOM6 profiling is reported via FMS not ESMF.
    • Will now test number of cores per component. Big parameter space. Aim for good-enough performance, and then optimise after we have finalised configuration.

SO:
• WW3 can cost as much as the ocean - but depends on resolution and timestepping
AH:
• but both ocean and WW3 are cheap compared to atmosphere
MO:
• can run ocean and waves in parallel or one after the other, giving different options for core counts
• need to re-do optimisation work with WW3
SO:
• we were using a 1 hr wave timestep and a 20 min ocean timestep

  • More work by DS porting WOMBAT
    • after much effort, can now run OM3 with BLING without crashing (not running correctly, but not crashing - not using the coupler) - needed to add things to the NUOPC cap
    • some of the benefits of generic tracers are disappearing - they assume FMS is being used, so it will be more work than anticipated to get WOMBAT running the same way in OM2 and OM3.
    • generic tracers are tracers that run in both MOM5 and MOM6, but this is really only the case when using the FMS coupler; NUOPC requires additional changes. We were hoping to use generic tracers to run any of the other BGC packages (BLING, COBALT) as easily as WOMBAT, but this is not likely to be easy with ESMF coupling and may also require compile-time tweaks.
    • block after block after block
    • would need to hack BLING code to avoid calculating fluxes in BLING
    • NUOPC cap ignores some of the fields that need to be turned on, eg gas fluxes
    • CESM uses MARBL but this is implemented in a completely different way with a driver layer. Can run with multiple ocean models. Have made contact with software engineer; awaiting details.

AS/SO: ice testing
• test both stable and unstable
• issues with scripts breaking

Other topics

AH: payu update
• Jo is close to merging date-based restart pruning - includes checks that we’re not pruning too much (flags can be passed to override) - see the sketch after this list
• then will work on remote syncing - there will be a hook for post-processing prior to remote sync, and options to remove local copies
• then UUIDs for provenance
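
A minimal sketch of what date-based pruning with a safety check might look like (illustrative only - the actual payu implementation, option names and defaults may differ):

```python
# Sketch of date-based restart pruning with a "don't prune too much" guard.
# Not the actual payu code; names and thresholds are invented.
from datetime import datetime


def restarts_to_prune(restarts, keep_interval_years=1, max_prune_frac=0.9):
    """Return restart names safe to delete, keeping roughly one restart
    per `keep_interval_years`. `restarts` is a date-sorted list of
    (name, datetime) pairs."""
    prune, last_kept_year = [], None
    for name, date in restarts:
        if last_kept_year is None or date.year >= last_kept_year + keep_interval_years:
            last_kept_year = date.year  # keep this restart
        else:
            prune.append(name)  # candidate for deletion
    # Guard against pruning too much; an override flag could relax this.
    if restarts and len(prune) > max_prune_frac * len(restarts):
        raise RuntimeError("refusing to prune most restarts (use an override)")
    return prune


# Example: quarterly restarts over two years -> keep one per year.
quarterly = [(f"restart{i:03d}", datetime(2000 + i // 4, 1 + 3 * (i % 4), 1))
             for i in range(8)]
print(restarts_to_prune(quarterly))
# ['restart001', 'restart002', 'restart003', 'restart005', 'restart006', 'restart007']
```
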
AK:
• may want to also clone/update run repo to destination when syncing to preserve copy of run log with the data - see https://github.com/COSIMA/01deg_jra55_iaf/blob/01deg_jra55v140_iaf_cycle4_jra55v150_extension/sync_data.sh#L153-L157
• may also want to exclude some files from sync, e.g. uncollated files and *-DELETE etc from post-processing https://github.com/COSIMA/01deg_jra55_iaf/blob/01deg_jra55v140_iaf_cycle4_jra55v150_extension/sync_data.sh#L21

AK: non-reproducible run
⚠️ Inconsistent ocean and sea ice in final 7.5yr of 0.1° IAF cycle 4
AH:
• payu can do reproducible runs using manifests, but currently only checks inputs, not outputs - a rough sketch of output checking follows this block
• hard to see how to ensure reproducibility without running everything twice
• plan to re-run released experiments to detect reproducibility problems, e.g. with NCI updates
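
One conceivable building block for output checking (a sketch only; the JSON manifest format below is invented and is not payu’s):

```python
# Sketch: hash run outputs and compare against a recorded JSON manifest
# mapping relative path -> sha256 digest. Format and names are invented.
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path, chunk: int = 1 << 20) -> str:
    """Hash a file in chunks, so large model output doesn't fill memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while block := f.read(chunk):
            h.update(block)
    return h.hexdigest()


def check_outputs(output_dir: str, manifest_file: str) -> dict:
    """Return {relative_path: (expected, actual)} for any mismatches."""
    recorded = json.loads(Path(manifest_file).read_text())
    mismatches = {}
    for rel, expected in recorded.items():
        actual = sha256_of(Path(output_dir) / rel)
        if actual != expected:
            mismatches[rel] = (expected, actual)
    return mismatches
```
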
MO:
• could be changes on Gadi, e.g. kernel security patches
AH:
• also NCI has very little control over firmware, which hardware vendors update
• need a mechanism to version data and withdraw bad data
