Technical requirements for MOM6 node testing

@angus-g has put out a call for MOM6 model configurations to include in a testing suite when there is an Australian MOM6 “node”.

This is a related discussion about the technical aspects of any such testing.

Here are a couple of relevant papers discussing reproducibility and testing. The first defines four categories of reproducibility, and statistical tests to automate categorising the non-bit-for-bit cases:

Changes, additions and updates to CICE fall into four categories: (I) BFB [bit-for-bit] with no further assessment required; (II) non-BFB but unlikely to be climate changing; (III) non-BFB and climate changing; and (IV) a new model configuration option requiring separate scientific assessment. This section describes the automated methods used to flag the first three categories.
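The flagging of those first three categories can be sketched roughly as follows. This is a toy Python sketch, not the paper’s actual two-stage test: the threshold, the t-like statistic, and the natural-variability estimate are all placeholders for illustration.

```python
# Hypothetical sketch of the CICE-style classification: category I if the
# run is bit-for-bit identical to the baseline, otherwise a crude
# significance test on the field differences decides between
# "unlikely climate changing" (II) and "climate changing" (III).
import math

def classify_change(baseline, candidate, natural_variability):
    """Return 'I', 'II' or 'III' for lists of gridpoint values."""
    diffs = [c - b for b, c in zip(baseline, candidate)]
    if all(d == 0 for d in diffs):
        return "I"  # bit-for-bit reproducible
    # Crude t-like statistic of the mean difference against zero,
    # scaled by an estimate of natural variability (placeholder logic).
    n = len(diffs)
    mean = sum(diffs) / n
    t = mean / (natural_variability / math.sqrt(n))
    return "III" if abs(t) > 2.0 else "II"
```

Category IV (a new configuration option) can’t be flagged automatically like this, since it needs separate scientific assessment.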


Are payu configurations desirable? Preferred? Required?

How do other sites run their testing? Do they version control their inputs? If so, how?

Is this even the right topic to discuss this? Happy to move to another topic if not.

I suppose most of the pre-existing configurations around would probably be using payu, which is why I suggested the config.yaml (it also gives an idea of resource requirements). But this probably ends up being a question for the technical implementation of actually running the tests. There’d probably be a little modification required to produce a testing-suitable run anyway. I think a lower barrier to entry by not requiring payu is fine?
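For concreteness, a minimal payu config.yaml for a test case might look something like the fragment below. The paths, job name and resource requests are purely illustrative placeholders, not a real configuration.

```yaml
# Illustrative payu configuration for a hypothetical MOM6 test case;
# executable path, input directory and resources are placeholders.
model: mom6
jobname: mom6-test
queue: normal
ncpus: 48
walltime: 01:00:00
exe: /path/to/MOM6
input:
  - /path/to/input/dir
```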

I know that GFDL runs their tests through a pipeline on an internal Gitlab instance. I wouldn’t be surprised if there are a range of solutions from manual running, to Makefiles handed down from a supreme being (some of the developers use this for their own tests), to modern pipelines. I can try to dig around for a bit more info there.

I think the control inputs are often version controlled. GFDL has MOM6-examples, ESMG has an equivalent with their configurations, etc. There are probably private configurations, but they’d be within version control on the inside of the firewall. As for other (binary) inputs, I’m not sure! That is probably an issue we’ll have to think about too, particularly for full-chain reproducibility and provenance.
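One simple approach to the binary-input provenance question is to keep a hash manifest in version control alongside the configuration, so the exact inputs used in a test run can be verified later. A minimal sketch (illustrative only; payu’s own manifest support, where used, would supersede hand-rolled hashing):

```python
# Record a SHA256 manifest of the input files alongside a configuration,
# and later verify that the directory contents still match it.
import hashlib
from pathlib import Path

def build_manifest(input_dir):
    """Map each file's relative path to its SHA256 digest."""
    manifest = {}
    for path in sorted(Path(input_dir).rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            manifest[str(path.relative_to(input_dir))] = digest
    return manifest

def verify_manifest(input_dir, manifest):
    """True if the directory's contents still match the recorded hashes."""
    return build_manifest(input_dir) == manifest
```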

Sure 🙂 But we might want to spin out a discussion on the technical implementation (running tests, validating tests, how to organise configurations, etc.). Ideally it can all be authoritative, so we don’t get a desync between what we’re testing and what’s actually being run. But the testing is also only as valuable as the tests’ coverage of the codepaths in the model that people are actually interested in.

I was worried I had derailed your topic @angus-g, so I’ve moved the discussion to this topic. Hope you don’t mind being scooped up and moved too, @aekiss, but your post seemed to fit here quite well. I can move it back if you want me to.


Sorry, I didn’t notice the config.yaml reference.

Yes, I don’t think there is a problem with a lower barrier to entry, but if payu is the preferred way to go (and I think it is), then non-payu configs will have to be converted to run with payu in any case.

I’m wondering aloud about a few things:

  • @MartinDix was enquiring a while ago about ways to version inputs for the rose+cylc experiments. It got me thinking about IPFS.
  • Are there modifications we might usefully make to payu to facilitate running test cases programmatically like this? The ACCESS-OM2 testing writes to config.yaml files. We could make it more seamless than that, I reckon.
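The kind of programmatic override the ACCESS-OM2 tests do by rewriting config.yaml could be sketched like this. The function operates on a parsed config dict (e.g. from yaml.safe_load); the override keys shown are illustrative payu settings, not a fixed schema.

```python
# Apply test-run overrides to a parsed config.yaml dict, merging nested
# sections rather than clobbering them, and leaving the original untouched.
def apply_test_overrides(config, overrides):
    """Return a copy of the config with test settings merged in."""
    merged = dict(config)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = apply_test_overrides(merged[key], value)
        else:
            merged[key] = value
    return merged
```

The merged dict would then be written back out (e.g. with yaml.safe_dump) before launching the test run.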

True. The Intel compiler can also generate code-coverage data (via its codecov tool), which might be worth thinking about as a way to quantify coverage.

GFDL actually uses codecov on their fork, e.g.: Fix a bug in the OMP directive for plume_flux by Hallberg-NOAA · Pull Request #427 · NOAA-GFDL/MOM6 · GitHub. Although that applies to the smaller regression tests that are run through GitHub Actions. You can also see an example of the Gitlab pipeline link in that PR.
