Organising global driving data for CABLE

Hi all,

The different use cases of offline CABLE have different requirements and expectations of their input data - much of this, unfortunately, is hard-wired into the codebase and reflects a period when storage was more limited than it is now. Some use cases (such as single-site studies) provide meteorology at 30-minute resolution, others such as global simulations using GSWP3 use gridded meteorology at 3- or 6-hourly time steps, and yet others such as BIOS or TRENDY use gridded daily data. Different use cases also expect different input variables (though there is quite a lot of overlap).

As an example of the question around pre-processing vs runtime, consider BIOS. This configuration can operate at 5 km resolution over Australia and runs for hundreds of years. It uses daily data for 7 input variables. CABLE-BIOS does the time interpolation within the model run (it's actually weather generation, so not just linear interpolation or anything that can be done with a simple nco-type pre-processing step). If instead we pre-processed the inputs and then stored them, we would a) need to rewrite large chunks of the model code (and check that this hadn't changed answers) and b) find storage for around 24 times (maybe 48 times) the data (and you may want to trial different weather generators, so possibly more).
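To put a rough number on that storage factor, here is a minimal back-of-the-envelope sketch. The grid size, simulation length and precision are illustrative assumptions rather than actual BIOS numbers; only the 24x/48x factor for going from daily data to hourly or half-hourly time steps comes from the argument above.

```python
# Back-of-the-envelope storage estimate for pre-interpolated BIOS-style forcing.
# All sizes below are illustrative assumptions; only the 24x/48x factors
# reflect moving from daily data to hourly / half-hourly model time steps.

n_cells = 300_000        # ~5 km land cells over Australia (assumed order of magnitude)
n_vars = 7               # daily forcing variables (from the discussion above)
n_years = 300            # "hundreds of years" of simulation (assumed)
bytes_per_value = 4      # 32-bit floats (assumed)

daily_bytes = n_cells * n_vars * n_years * 365 * bytes_per_value

for steps_per_day in (1, 24, 48):
    total_tb = daily_bytes * steps_per_day / 1e12
    print(f"{steps_per_day:>2} steps/day: ~{total_tb:,.1f} TB")
```

Even with these rough assumptions, the jump from under a terabyte of daily forcing to tens of terabytes per pre-interpolated version (and per weather generator you want to trial) illustrates why the interpolation is done at runtime.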

Regardless of approach, you have to do this bespoke work for each data set (they are all different), meaning there's quite a lot of burden, especially for those data sets that are regularly updated such as the BIOS and TRENDY forcing.

There's also the element of experimental flexibility. If, for instance, I want to do a simple experiment - "What happens if the world were uniformly 1 K warmer?" - this is a lot easier to do as a one-liner in the code (with a switch) than by re-processing all the meteorology. However, equally you don't want to allow lots and lots of this kind of stuff in the code (consider the TRENDY sections of the code, where depending on the experiment you use different combinations of forcing, different switches etc. - it's all embedded in the code and distributed in multiple places).
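For comparison, the pre-processing alternative for that "+1 K" experiment would look something like the sketch below (the directory layout and the Tair variable name are assumptions for illustration). It is trivial for one variable and one experiment, but it duplicates the whole forcing set every time you change the perturbation.

```python
# Sketch of the pre-processing route for a "+1 K everywhere" experiment,
# assuming netCDF forcing readable with xarray and an air temperature
# variable named "Tair" (both assumptions for illustration).
from pathlib import Path
import xarray as xr

forcing_dir = Path("forcing/original")   # assumed layout
output_dir = Path("forcing/plus1K")      # a full second copy of the forcing
output_dir.mkdir(parents=True, exist_ok=True)

for infile in sorted(forcing_dir.glob("*.nc")):
    ds = xr.open_dataset(infile)
    if "Tair" in ds:
        ds["Tair"] = ds["Tair"] + 1.0    # uniform 1 K warming
    ds.to_netcdf(output_dir / infile.name)
    ds.close()
```

The in-code route, by contrast, is a single conditional applied as the forcing is read, guarded by a switch - but, as noted above, every such switch adds to the experiment logic scattered through the code.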

Another aspect to this is data and code provenance - it's a lot easier to say (and justify and version control) that we used external data source x (DOI) and model y (git repo) than to try to point to or publish a temporary data set.

Regarding ACCESS - I don't particularly see ACCESS-generated forcing as something that different. We still have to worry about which variables, units, naming conventions, resolution and metadata - i.e. making sure that the ACCESS forcing can be used by offline CABLE. We still have to worry about consistency with the other ancillaries. There is a bit of a thought process needed around how we generate and store the ACCESS output - do we output daily data or data on the model time step? Also, do we try to bias-correct the forcing data (for multiple sources of error)? And what do we do about data needed for the spin-up process?

Basically I can see lots of issues but haven’t landed on a firm opinion on the way to go - or at least on a way forward that doesn’t involve stopping everything for a substantive period and rewriting the code.
