Hi all, this is a follow-up from the discussion at last week’s COSIMA Meeting for the formation of a Major Fork of MOM6 based in Australia.
We need you to submit a test configuration that represents how you use MOM6, so that we can maintain our own development fork focused on the needs of the community.
Since the authoritative MOM6 repository was moved from NOAA-GFDL to live under the mom-ocean organisation, the development workflow has changed slightly. The main repository does not accept individual changes, which must instead be submitted to a Major Fork, e.g. NOAA-GFDL, NCAR, or soon the Australian fork. The major forks operate independently of one another, and are free to maintain whatever features of MOM6 they would like.
At some point, perhaps determined by a time period or sufficient divergence from the main repository, a fork may propose a pull request directly on the main repository. For this pull request to be accepted, one nominated delegate from each fork must cast an approving vote. This vote represents that the fork is happy that the pull request:
- does not break any configurations (functional testing);
- does not change any results without a reason or a runtime flag to regain the previous behaviour (regression testing);
- satisfies the MOM6 coding rules (code review).
While there would be a slight maintenance overhead, there are some benefits to having our own major fork. We can decouple our development from the other forks (to date we have submitted code changes through GFDL), which means we can work on features without worrying about unrelated changes coming from elsewhere. Perhaps more importantly, it gives us a say in how the code is tested. Through the first two criteria listed above, we can verify that our configurations don’t get broken or have their answers changed.
By setting up the infrastructure and gathering configurations in order to be a MOM6 major fork, we’ll also be formalising some of our own testing. Hopefully this will lead to more structure in the configurations we do use, but also in terms of compiling model executables, etc.
This is where I need your support! ACCESS-NRI is able to provide some of the technical requirements for becoming a major fork, namely testing infrastructure. However, it’s up to the COSIMA scientific community, who rely on the model results, to come up with the tests that matter. There are a couple of reasons for having different test cases that we have stewardship over:
- it’s likely that the combination of MOM_input parameters is somewhat unique, so we can ensure that they remain compatible with one another (criterion one above);
- we should be able to ensure bitwise compatibility of the results, so the configuration can be relied on for stability (criterion two above).
Once we have a suite of test configurations, they can be freely run for changes to our fork for local testing. However, the only formal requirement would be to run these on pull requests to the main repository. These are fairly infrequent (on the order of every few weeks to a month). With this frequency, the cases don’t have to be tiny, but by the same token we probably don’t want to be running a super high-resolution global case!
For a given case, I think there should be some protocol for accepting it as “unchanged” or at least “scientifically valid”. For the most part, we can verify that the results remain bitwise identical compared to a reference run. This is a pretty easy validation: if the answers don’t change at all, it’s clearly behaving as it did before a given code change! On the other hand, if the answers do change, there probably needs to be a deeper verification. I think this could probably take the form of certain physical metrics (e.g. a transport or water mass transformation).
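As a rough sketch of what that two-stage check could look like in Python: nothing here is an existing tool, and the function names, the use of SHA-256 file hashes and the 0.1% relative tolerance are all assumptions for illustration. Raw file hashes can also differ for harmless reasons (e.g. timestamps in file metadata), so a real check would probably compare array contents or model-reported statistics instead.

```python
import hashlib
import math
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_outputs(reference_dir: Path, candidate_dir: Path) -> dict:
    """Label each reference file as 'identical', 'changed' or 'missing'."""
    status = {}
    for ref_file in sorted(reference_dir.rglob("*")):
        if not ref_file.is_file():
            continue
        name = ref_file.relative_to(reference_dir).as_posix()
        new_file = candidate_dir / name
        if not new_file.is_file():
            status[name] = "missing"
        elif file_digest(ref_file) == file_digest(new_file):
            status[name] = "identical"
        else:
            status[name] = "changed"
    return status


def classify_run(bitwise_identical: bool, metrics_ref: dict,
                 metrics_new: dict, rel_tol: float = 1e-3) -> str:
    """Apply the two-stage protocol: bitwise check first, then metrics.

    metrics_ref / metrics_new map a metric name (e.g. a transport) to a
    scalar value. rel_tol is a placeholder; a real threshold would need
    to be agreed per metric.
    """
    if bitwise_identical:
        return "unchanged"
    if not metrics_ref:
        # Answers changed and there is nothing to verify them against.
        return "needs review"
    within_tolerance = all(
        math.isclose(value, metrics_new.get(name, math.nan), rel_tol=rel_tol)
        for name, value in metrics_ref.items()
    )
    return "scientifically valid" if within_tolerance else "needs review"
```

A run would count as bitwise identical when every entry returned by `compare_outputs` is `"identical"`; otherwise the physical metrics decide between "scientifically valid" and "needs review".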
To collect cases, I think we’d like:
- a git repository containing the control files (
- required forcing/input data available on Gadi;
- a way to run the case in a representative manner (e.g. sufficiently long to get relevant results, but not so long as to be excessively expensive);
- a script/notebook/description of diagnostics to use for validation;
- a set of reference results.
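To make the checklist above a bit more concrete, here is one way a submission could be recorded and checked for completeness in Python. This is only a sketch: the class, its field names and the helper function are all hypothetical, and no particular file layout or path on Gadi is implied.

```python
from dataclasses import dataclass, field, fields


@dataclass
class CaseSubmission:
    """One proposed test configuration (all field names are hypothetical)."""
    name: str
    control_repo: str = ""       # git URL of the repository with the control files
    input_paths: list = field(default_factory=list)  # forcing/input data on Gadi
    run_length: str = ""         # how long to run, as chosen by the submitter
    diagnostics: str = ""        # script, notebook or description used for validation
    reference_results: str = ""  # where the reference results are stored


def missing_items(case: CaseSubmission) -> list:
    """Return the names of the checklist fields that are still empty."""
    return [f.name for f in fields(case) if not getattr(case, f.name)]
```

For example, a case that so far only has a name and a repository would report `input_paths`, `run_length`, `diagnostics` and `reference_results` as still outstanding.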
None of this is set in stone, but at this stage we just need a little bit of momentum to get the ball rolling. Test configurations can come and go as needs dictate, so we don’t need to limit ourselves to grand multi-year projects or anything. And importantly, feedback is most definitely welcome! This is just one proposal for running tests, and I’m sure there are many different ways that I haven’t even considered.