Title
A proposed architecture for weather, climate and CMIP data provenance tracking
About
Model input provenance should be available to anyone who uses model output data, since model input determines model output. Unified Model (UM) ancillary files contain model input data such as static initial conditions and periodic forcings. Their provenance includes a data source, an ancillary source and an ancillary file on a specific grid. UM ancillary files are currently not FAIR (Findable, Accessible, Interoperable and Reusable): their provenance is scattered across generation tool documentation, project wiki pages, tickets and suites.
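The provenance chain described above (data source, then ancillary source, then ancillary file on a grid) can be pictured as a linked sequence of records, each derived from the one before. The following is a minimal Python sketch under that assumption. The `Entity` class and `lineage` function are hypothetical, loosely modelled on the W3C PROV idea of one entity being derived from another; they are not Provena's data model, and all names in the usage lines are placeholders.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Entity:
    """One node in a provenance chain (hypothetical model, not Provena's)."""
    kind: str                                       # e.g. "data source", "ancillary file"
    name: str                                       # human-readable identifier
    derived_from: Optional["Entity"] = None         # the entity this one was produced from
    attributes: dict = field(default_factory=dict)  # extra metadata, e.g. the target grid


def lineage(entity: Entity) -> list:
    """Follow derived_from links from an entity back to its original data source."""
    chain = []
    current = entity
    while current is not None:
        chain.append(current)
        current = current.derived_from
    return chain


# Usage with placeholder names (hypothetical, for illustration only).
source = Entity("data source", "example-observational-dataset")
ancil_source = Entity("ancillary source", "example-regridded-source", derived_from=source)
ancil_file = Entity("ancillary file", "example.anc", derived_from=ancil_source,
                    attributes={"grid": "example-grid"})

for step in lineage(ancil_file):
    print(f"{step.kind}: {step.name}")
```

Storing the grid as an attribute of the ancillary file, rather than as a separate node, keeps the chain to the three levels named in the text.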
Provena (provena.io) is a provenance system supporting large modelling and simulation workflows. It resulted from a CSIRO project led by Jonathan Yu to support the Reef Restoration and Adaptation Program.
To add provenance capture to UM ancillary generation:
On the input data side:
- Add Provena API calls to UM ancillary preprocessing and generation scripts.
- Initially use Provena on AWS.
On the provenance side:
- Adapt Provena data models to the CF (Climate and Forecast) metadata conventions.
- Migrate Provena servers and stores from AWS to NCI.
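On the input data side, the plan amounts to emitting one provenance record per preprocessing or generation run; on the provenance side, CF alignment means that record should also be visible in the file's own metadata. A minimal Python sketch of both ideas follows, assuming a hypothetical record schema. It does not call or reproduce the Provena API (the upload step is left out), and the script, file and agent names are placeholders. It serialises the record to JSON and turns it into a line for the CF `history` global attribute, which CF describes as an audit trail of modifications to the data.

```python
import json
from datetime import datetime, timezone


def make_activity_record(script, inputs, outputs, agent, ended_at=None):
    """Build a provenance record for one preprocessing or generation step.

    Field names loosely follow W3C PROV relations (used, generated); the schema
    is hypothetical and is not the Provena API.
    """
    if ended_at is None:
        ended_at = datetime.now(timezone.utc).isoformat(timespec="seconds")
    return {
        "activity": script,
        "used": list(inputs),
        "generated": list(outputs),
        "agent": agent,
        "ended_at": ended_at,
    }


def to_cf_history(record, existing_history=""):
    """Return a CF `history` attribute value with one timestamped line appended."""
    entry = (f"{record['ended_at']}: {record['activity']} "
             f"used {', '.join(record['used'])} "
             f"generated {', '.join(record['generated'])}")
    return f"{existing_history}\n{entry}" if existing_history else entry


record = make_activity_record(
    "example_ancil_prep.py",  # hypothetical script name
    inputs=["example_input.nc"],
    outputs=["example.anc"],
    agent="example-user",
)
# In a real workflow this payload would be sent to a provenance store such as
# Provena; here it is only serialised to show its shape.
payload = json.dumps(record, indent=2)
print(payload)
print(to_cf_history(record))
```

Writing the same record to both the provenance store and the `history` attribute means a user who has only the ancillary file can still see where it came from.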
Poster
ACCESS-NRI_provenance_architecture.pdf
Note: this topic is part of the 2024 ACCESS Community Workshop Poster session