Community Talks 1: Aidan Heerdegen (ACCESS-NRI) RRR: Reliability, Replicability, Reproducibility for Climate Models

Abstract

It is difficult to reliably build climate models, reproduce results and so replicate scientific findings. Modern software engineering coupled with the right tools can make this easier.
Some sources of complexity that make this a difficult problem:
  • Climate models are an imperfect translation of extremely complex scientific understanding into computer code. Imperfect because many assumptions are made to make the problems tractable.
  • Climate models typically consist of a number of separate models of different realms of the earth system, which run independently while exchanging information at their boundaries.
  • Multiple completely separate models and their many dependencies must all be built, each with varying standards of software engineering and architecture.
  • Computational complexity requires high performance computing (HPC) centres, which contain exotic hardware utilising specially tuned software.
ACCESS-NRI uses Spack, a build-from-source package manager that targets HPC and gives full build provenance and guaranteed build reproducibility. This makes building climate models easier and more reliable. Continuous integration testing of build correctness and reproducibility, model replicability, and scientific reproducibility eliminates a source of complexity and uncertainty. The model is guaranteed to produce the same results from the same code, and from modified code when the changes should not alter answers.
Scientists can be confident that any variation in their climate model experiments is due to factors under their control, rather than changes in software dependencies, or the tools used to build the model.
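
As a rough illustration of what a reproducibility test can look like, here is a minimal Python sketch that compares the outputs of two runs by checksum. It is not the ACCESS-NRI test suite: the function names `checksum_manifest` and `differing_files`, and the idea of hashing every file in an output directory, are assumptions made for the example.

```python
import hashlib
from pathlib import Path


def checksum_manifest(output_dir: Path) -> dict[str, str]:
    """Map each file's path (relative to output_dir) to its SHA-256 digest."""
    manifest = {}
    for path in sorted(output_dir.rglob("*")):
        if path.is_file():
            name = path.relative_to(output_dir).as_posix()
            manifest[name] = hashlib.sha256(path.read_bytes()).hexdigest()
    return manifest


def differing_files(reference: dict[str, str], candidate: dict[str, str]) -> list[str]:
    """Return files that are missing from either run or whose digests differ."""
    names = set(reference) | set(candidate)
    return sorted(n for n in names if reference.get(n) != candidate.get(n))
```

A test would then build and run the model twice (or once against a stored reference manifest) and require `differing_files(...)` to return an empty list. In practice some files, such as logs with timestamps, would probably need to be excluded from the comparison.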

Please use this thread for further discussion on this talk.

Some of the relevant repositories mentioned in the talk:

Model configurations

Model deployment repositories

Build Infrastructure

really enjoyed this talk @Aidan! Looking forward to meeting hopefully tomorrow


Thanks @mdsumner. Keen to chat!

@rmholmes asked me afterwards if changing code related to model output diagnostics would be an example of a code change that would only change the minor version, i.e. not change the reproducibility of the model.

The answer to that question is YES! Code changes that only affect diagnosed outputs, e.g. fixing the units of a field or adding an entirely new diagnostic, are not something you would ordinarily expect to change the reproducibility of a configuration.

Some examples of other changes that might also not change the reproducibility of a model configuration:

  • Some PBS run options: walltime
  • Metadata updates
  • Collation options
  • restart_freq
  • Diagnostic output options (changing diagnostic profiles)
  • Changing run time debugging/logging options
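
The versioning rule discussed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a two-part `MAJOR.MINOR` version string, with the major number bumped when a change alters model answers and the minor number bumped otherwise. The function name `next_version` is hypothetical and not part of any ACCESS-NRI tool.

```python
def next_version(current: str, answers_changed: bool) -> str:
    """Return the next configuration version under an assumed MAJOR.MINOR scheme.

    Assumption: a change that alters model answers bumps MAJOR and resets
    MINOR to 0; a change that leaves answers bit-for-bit identical
    (e.g. a new diagnostic, a longer walltime) bumps MINOR only.
    """
    major, minor = (int(part) for part in current.split("."))
    if answers_changed:
        return f"{major + 1}.0"
    return f"{major}.{minor + 1}"


# Adding a new diagnostic: answers unchanged, so only the minor version moves.
print(next_version("1.2", answers_changed=False))  # prints 1.3
# Changing a physics parameter: answers change, so the major version moves.
print(next_version("1.2", answers_changed=True))   # prints 2.0
```

Whether answers changed would itself be decided by a reproducibility test, for example by comparing output checksums against a stored reference.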