New WOMBAT evaluation

pearseb · 14 March 2024 03:36

Details of our approach to evaluating the WOMBAT model.

The following datasets are used to evaluate the performance of WOMBAT-lite and WOMBAT-mid. Not all of them apply to WOMBAT-lite: surface PO4, surface SiOH4, the fraction of microphytoplankton, and depth-integrated nitrogen fixation rates are only relevant to WOMBAT-mid.

Aidan · 12 April 2024 07:21

Looks awesome @pearseb!

pearseb · 17 April 2024 01:40

Schematic representation of WOMBAT. Tracers and biomass pools are represented by circles of different colours. Components of the ecosystem model, such as nutrients or phytoplankton, are organised within the dashed outlines. WOMBAT-mid includes all tracers and biomass pools (black and red), while WOMBAT-lite only includes those pools outlined in black. (Dinitrogen gas (N2) produced by the denitrification of nitrate (NO3) is not represented.)

pearseb · 17 April 2024 02:00

To optimise WOMBAT-lite, we ran 256 sensitivity experiments. These experiments explored a range of values for 19 key input parameters of the biogeochemical model.

WOMBAT-lite was run for 10 years under JRA55 repeat-year forcing (RYF). It was initialised with observations of nutrients (WOA23), oxygen (WOA23) and carbon (GLODAP2), dissolved Fe from the PISCES BGC model, and globally uniform phytoplankton, zooplankton and detritus concentrations.
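The thread does not say how the 256 parameter combinations were chosen. As one illustration, a Latin hypercube design spreads a fixed number of runs evenly across every parameter's range. This is a minimal sketch under that assumption; the function name and the unit-interval bounds are hypothetical, not the actual WOMBAT parameter ranges.

```python
import random


def latin_hypercube(n_samples, bounds, seed=0):
    """Draw n_samples parameter sets, one value per (lo, hi) range in bounds.

    Each range is split into n_samples equal strata, one value is drawn uniformly
    from every stratum, and the strata are shuffled independently per parameter,
    so each parameter's range is covered evenly.
    """
    rng = random.Random(seed)
    columns = []
    for lo, hi in bounds:
        width = (hi - lo) / n_samples
        column = [lo + (i + rng.random()) * width for i in range(n_samples)]
        rng.shuffle(column)
        columns.append(column)
    # Transpose so that each row is the parameter set for one experiment.
    return [list(row) for row in zip(*columns)]


# Hypothetical: 19 parameters, each rescaled to the unit interval [0, 1).
designs = latin_hypercube(256, [(0.0, 1.0)] * 19)
print(len(designs), len(designs[0]))  # 256 19
```

In practice each unit-interval value would be mapped onto that parameter's physical range (possibly on a log scale for parameters spanning orders of magnitude).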

The skill of these experiments relative to 15 observation-based products is shown below:

The inter-experiment standard deviations in these fields are shown below:

The inter-experiment NORMALISED standard deviations are shown below:

Some redundancy among the observation-based products is apparent. NPP is very similar to grazing pressure and thus offers the same information, and the same is true of the chlorophyll and POC datasets. We therefore focussed on 8 of the original 15 datasets to assess model performance (see below).
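One way to make the redundancy argument quantitative is to check whether two products' skill scores rise and fall together across the ensemble. The sketch below assumes each product has one skill score per experiment and calls a pair redundant when those scores correlate above a cut-off; the 0.9 threshold, the function names and the toy numbers are all hypothetical, not results from this evaluation.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def redundant_pairs(skill, threshold=0.9):
    """skill maps product name -> list of skill scores, one per experiment.

    Returns (product_a, product_b, r) for every pair whose scores co-vary
    across experiments with |r| >= threshold.
    """
    pairs = []
    for a, b in combinations(sorted(skill), 2):
        r = pearson(skill[a], skill[b])
        if abs(r) >= threshold:
            pairs.append((a, b, r))
    return pairs


# Toy scores for four hypothetical experiments.
skill = {
    "NPP": [0.10, 0.40, 0.50, 0.80],
    "grazing": [0.12, 0.41, 0.52, 0.79],
    "iron": [0.90, 0.10, 0.60, 0.30],
}
pairs = redundant_pairs(skill)  # only the NPP-grazing pair exceeds the threshold
```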

pearseb · 17 April 2024 02:36

Based on the above results, we developed a traffic light system of model evaluation.

The approach is quantitative, using the univariate metrics of correlation coefficient, mean bias and normalised standard deviation for key variables. The results are then categorised into a simple, easy-to-understand “traffic light” framework:

Green: Indicates good performance.
Yellow: Indicates acceptable but suboptimal performance.
Red: Indicates poor performance.
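The three univariate metrics can be computed from paired model and observation values. This is a minimal sketch assuming the fields are already regridded onto common points; the optional weights (for example grid-cell area, to obtain a global mean bias) and the definition of normalised standard deviation as std(model) / std(obs), so that 1 is ideal, are assumptions rather than details confirmed in the thread.

```python
import math


def skill_metrics(model, obs, weights=None):
    """Return (correlation, mean bias, normalised std) for paired values.

    weights: optional per-point weights such as grid-cell area (hypothetical
    here); defaults to equal weights.
    """
    if weights is None:
        weights = [1.0] * len(model)
    wsum = sum(weights)

    def wmean(values):
        return sum(w * v for w, v in zip(weights, values)) / wsum

    mm, mo = wmean(model), wmean(obs)
    cov = sum(w * (m - mm) * (o - mo) for w, m, o in zip(weights, model, obs)) / wsum
    sd_model = math.sqrt(sum(w * (m - mm) ** 2 for w, m in zip(weights, model)) / wsum)
    sd_obs = math.sqrt(sum(w * (o - mo) ** 2 for w, o in zip(weights, obs)) / wsum)
    correlation = cov / (sd_model * sd_obs)
    mean_bias = mm - mo                # positive means the model is too high
    normalised_sd = sd_model / sd_obs  # 1 means the observed variability is matched
    return correlation, mean_bias, normalised_sd


# The model doubles the observed values: perfectly correlated, biased high, twice as variable.
print(skill_metrics([2, 4, 6, 8], [1, 2, 3, 4]))  # approximately (1.0, 2.5, 2.0)
```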

We first define thresholds for each metric that determine the categorisation into green, yellow or red. This requires knowing what level of model skill is reasonable given the observation-based product we are comparing to.
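Once thresholds exist, categorisation is a simple lookup per metric. The sketch below assumes two cut-offs per metric (green and yellow), with correlation scored "higher is better", mean bias scored on its absolute value, and normalised standard deviation scored on its distance from 1. The `Thresholds` class and every numeric value are hypothetical; the real per-product thresholds are not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class Thresholds:
    """Traffic-light cut-offs for one observation-based product (values are hypothetical)."""
    r_green: float      # correlation >= r_green is green
    r_yellow: float     # r_yellow <= correlation < r_green is yellow
    bias_green: float   # |mean bias| <= bias_green is green
    bias_yellow: float  # bias_green < |mean bias| <= bias_yellow is yellow
    sd_green: float     # |normalised std - 1| <= sd_green is green
    sd_yellow: float    # sd_green < |normalised std - 1| <= sd_yellow is yellow


def _light(is_green, is_yellow):
    if is_green:
        return "green"
    if is_yellow:
        return "yellow"
    return "red"


def classify(r, bias, nstd, t):
    """Return one light per metric: (correlation, mean bias, normalised std)."""
    sd_err = abs(nstd - 1.0)  # a normalised std of 1 matches the observed variability
    return (
        _light(r >= t.r_green, r >= t.r_yellow),
        _light(abs(bias) <= t.bias_green, abs(bias) <= t.bias_yellow),
        _light(sd_err <= t.sd_green, sd_err <= t.sd_yellow),
    )


# Hypothetical thresholds, not the values used in this evaluation.
t = Thresholds(r_green=0.7, r_yellow=0.4, bias_green=0.1, bias_yellow=0.3,
               sd_green=0.2, sd_yellow=0.5)
print(classify(0.75, -0.2, 1.6, t))  # ('green', 'yellow', 'red')
```

Bias thresholds carry the units of the product being compared, which is one reason each product needs its own set of cut-offs.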

Our thresholds for the 8 key observation-based products:

Surface dissolved iron
Oxygen at 250 metres depth
Surface chlorophyll
Depth-integrated chlorophyll
Depth of the chlorophyll maximum
Depth of the particulate organic carbon (POC) maximum
Depth-integrated NPP (CbPM model)
Primary limiting nutrient (Browning & Moore 2023, Nature Communications dataset)

As a first pass, we eliminated all model realisations except those that performed optimally against the nutrient limitation (LN) data.

This left 32 experiments, all with excellent agreement to Browning & Moore (2023) but with a range of good to poor performance in the other key observations.
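The first-pass filter can be sketched as a dictionary comprehension. This assumes that "performed optimally" means every light recorded for the LN product is green; the data layout (experiment id mapped to per-product lights), the function name and the toy experiments are hypothetical.

```python
def keep_optimal(results, product="LN"):
    """Keep only experiments whose lights for `product` are all green.

    results maps experiment id -> {product name: (r light, bias light, std light)}.
    """
    return {
        exp: lights
        for exp, lights in results.items()
        if all(light == "green" for light in lights[product])
    }


# Toy input with hypothetical experiment ids.
results = {
    1: {"LN": ("green", "green", "green"), "Fe": ("yellow", "red", "green")},
    2: {"LN": ("green", "yellow", "green"), "Fe": ("green", "green", "green")},
    3: {"LN": ("green", "green", "green"), "Fe": ("red", "red", "yellow")},
}
print(sorted(keep_optimal(results)))  # [1, 3]
```

Note that experiment 2 is dropped despite its excellent Fe score: filtering on one product first deliberately prioritises it over all others.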

In this figure, the first circle marker is the correlation coefficient, the second is the global mean bias, and the third is the normalised standard deviation relative to the given observation-based dataset. The star marker is the overall performance of that model run relative to that observation-based data product. Here, we are ranking the remaining model realisations from best to worst.
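The thread does not state how the three circles are combined into the star or how runs are ranked. One simple scheme, assumed here purely for illustration, scores green = 2, yellow = 1 and red = 0, averages the three scores to give each product's overall (star) score, and ranks experiments by the sum over products. The names and toy inputs are hypothetical.

```python
SCORE = {"green": 2, "yellow": 1, "red": 0}  # assumed numeric value of each light


def product_score(lights):
    """Overall (star) score for one product: mean of its light scores."""
    return sum(SCORE[light] for light in lights) / len(lights)


def rank_experiments(results):
    """Return experiment ids sorted from best to worst by summed product scores.

    results maps experiment id -> {product name: (r light, bias light, std light)}.
    """
    totals = {
        exp: sum(product_score(lights) for lights in products.values())
        for exp, products in results.items()
    }
    return sorted(totals, key=lambda exp: totals[exp], reverse=True)


# Toy input with hypothetical experiment ids.
results = {
    "a": {"Fe": ("green", "green", "yellow"), "O2": ("red", "red", "red")},
    "b": {"Fe": ("green", "green", "green"), "O2": ("yellow", "yellow", "yellow")},
}
print(rank_experiments(results))  # ['b', 'a']
```

Summing gives every product equal weight; a weighted sum would let more trusted observation-based products count for more.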

The following figure shows the best- and worst-performing experiments of WOMBAT-lite relative to the key observation-based products we are using to assess performance.

NOTE: World Ocean Atlas O2 tends to be biased high in the low-oxygen zones. For NPP we use the MODIS-based CbPM model, whose estimates are very high; all models apparently underestimate them.

AndyHoggANU · 17 April 2024 09:27

Wow, this is awesome. My takeaway is the NPP is bad. Iron is better than WOMBAT-old … other things OK??

pearseb · 17 April 2024 23:35

I think to say that the NPP is bad is to believe that the MODIS CbPM productivity model is “true”. Iron is definitely better.

aekiss · 6 May 2024 02:47

Thanks @pearseb, this looks like a great way to assess the model.

I was wondering

how the model initial condition looks under these metrics
whether you’d expect the rank order to change much in a longer model run
whether rate of drift relative to initial condition should also be a performance metric?
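If drift were adopted as a metric, one simple option (an assumption for illustration, not something proposed in this thread) is the least-squares linear trend of an annual global mean, optionally divided by the initial value to give a fractional change per year that is comparable across variables. The function names and the synthetic series are hypothetical.

```python
def drift_rate(years, values):
    """Least-squares linear trend of a time series, in units per year."""
    n = len(years)
    mean_year = sum(years) / n
    mean_value = sum(values) / n
    num = sum((y - mean_year) * (v - mean_value) for y, v in zip(years, values))
    den = sum((y - mean_year) ** 2 for y in years)
    return num / den


def relative_drift(years, values):
    """Trend normalised by the initial value: fractional change per year."""
    return drift_rate(years, values) / values[0]


# Synthetic 10-year series drifting by 0.5 units per year from 10.
years = list(range(10))
values = [10 + 0.5 * y for y in years]
print(drift_rate(years, values), relative_drift(years, values))  # approximately 0.5 0.05
```

A linear fit is the simplest choice; a run that is still equilibrating may be better described by an exponential approach to steady state.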

aekiss · 6 May 2024 04:08

I noticed some of the ACCESS-ESM1.5 values in Ziehn et al. 2020 (PI, quadratic phy mortality, prey capture efficiency) lie outside the ranges in your parameter survey - is this because the equations or units are different?

PSpence · 26 February 2025 04:52

@pearseb wow! This is super awesome. I am going thru this page with people (Hakase etc) at JAMSTEC. What a resource you’ve created here! Thank you!

PSpence · 26 February 2025 04:56

Preprint of Pearse’s paper is here: EGUsphere - Optimisation of the World Ocean Model of Biogeochemistry and Trophic-dynamics (WOMBAT) using surrogate machine learning methods
