Poster: Proposed architecture for weather, climate and CMIP data provenance

Title

A proposed architecture for weather, climate and CMIP data provenance tracking

About

Model input determines model output, so model input provenance should be available to anyone who uses model output data. Unified Model (UM) ancillary files contain model input data such as static initial conditions or periodic forcings. Their provenance includes a data source, an ancillary source and an ancillary file on a specific grid. UM ancillary files are currently not FAIR (findable, accessible, interoperable, reusable): their provenance is scattered across generation tool documents, project wiki pages, tickets and suites.
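
As a rough illustration of the three provenance levels named above (data source, ancillary source, ancillary file on a grid), here is a minimal Python sketch. All class and field names are hypothetical and chosen for illustration only; they are not taken from Provena, the UM tooling or the poster.

```python
from dataclasses import dataclass, asdict
import json


@dataclass(frozen=True)
class DataSource:
    """Original dataset that the ancillary is derived from (hypothetical fields)."""
    name: str
    version: str


@dataclass(frozen=True)
class AncillarySource:
    """Intermediate product prepared from a data source (hypothetical fields)."""
    data_source: DataSource
    tool: str  # name of the preprocessing tool or script


@dataclass(frozen=True)
class AncillaryFile:
    """Final ancillary file, generated on one specific model grid."""
    ancillary_source: AncillarySource
    path: str
    grid: str


def to_json(ancil: AncillaryFile) -> str:
    """Serialise the whole provenance chain as nested JSON."""
    return json.dumps(asdict(ancil), indent=2, sort_keys=True)
```

Keeping the chain as nested records means a single ancillary file can be traced back to its data source without consulting separate documents.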

Provena (provena.io) is a provenance system supporting large modelling and simulation workflows. It resulted from a CSIRO project led by Jonathan Yu to support the Reef Restoration and Adaptation Program.

To add provenance capture to UM ancillary generation:

On the input data side:

  • Add Provena API calls to UM ancillary preprocessing and generation scripts.
  • Initially use Provena on AWS.
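
One way the input-side steps above might look in a generation script is sketched below. This is an assumption-laden illustration, not the project's design: the `submit` callable is a hypothetical stand-in for whatever Provena API client is eventually used, and the record layout is invented for the example. No real Provena call is shown.

```python
import functools
import hashlib
from datetime import datetime, timezone
from pathlib import Path
from typing import Callable


def sha256_of(path: Path) -> str:
    """Checksum used to identify a file independently of its location."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def record_provenance(submit: Callable[[dict], None], activity: str):
    """Wrap a generation step so a provenance record is submitted after it runs.

    `submit` is a hypothetical hook; in practice it would wrap a Provena API call.
    """
    def decorator(step):
        @functools.wraps(step)
        def wrapper(inputs: list[Path], output: Path):
            result = step(inputs, output)
            submit({
                "activity": activity,
                "finished_at": datetime.now(timezone.utc).isoformat(),
                "inputs": [{"path": str(p), "sha256": sha256_of(p)} for p in inputs],
                "output": {"path": str(output), "sha256": sha256_of(output)},
            })
            return result
        return wrapper
    return decorator


# Example wiring: collect records in a list instead of calling a real service.
collected: list[dict] = []


@record_provenance(submit=collected.append, activity="example-ancil-generation")
def generate_ancil(inputs: list[Path], output: Path) -> None:
    # Placeholder for a real generation step: concatenate the input bytes.
    output.write_bytes(b"".join(p.read_bytes() for p in inputs))
```

Injecting `submit` keeps the generation scripts independent of where the provenance service is hosted, which would matter if the servers later move from AWS to NCI.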

On the provenance side:

  • Adapt Provena data models to CF conventions.
  • Migrate Provena servers and stores from AWS to NCI.

Poster

ACCESS-NRI_provenance_architecture.pdf (1.1 MB)

Note: this topic is part of the 2024 ACCESS Community Workshop Poster session

Hi Paul,

Would the intent be to integrate Provena into ANTS?

How would provenance information be attached to ancillary files? Would it be something like a text file kept beside the ancil, or would you need to look up the file in a database?

If you ping @paulleopardi he will get a notification and you might get a response.

The intent would be to set up a Python library to record provenance, using the Provena API. The provenance info would be available via a web interface and API, hopefully integrated with the Intake catalogue. Details and feasibility are yet to be investigated.