Session 6: Breakout room 2: Model evaluation, testing framework

This topic contains discussions, questions and thoughts for Session 6: Breakout room 2: Model evaluation, testing framework
Time: 11:15 am - 12:45 pm (Arnold/Gab)

Model Evaluation/Testing breakout
@rbeucher, Emma, Ulrike, @csu, @LisaAlexander, @gab, Helen, Shayne, Arnold, @kdruken

Gab: The ESMValTool would be a really good thing to support, especially for CMIP7

Romain: We’re working with Arnold and tried this on CMIP5 last week. I talked with the Met Office and there’s work to make this the flagship tool for evaluation. I think it’s in our interest to work with them and the broader community. Very aware of the issues; the main one from feedback here in Australia is that the tool expects CMORised data, and the issues around that.

Gab: Is this more on the ACCESS output side than the observational side?

Romain: don’t have a clear answer

Gab: Can you take the ILAMB datasets straight across? There’s quite a lot in ILAMB.

Romain: Haven’t tried that with ESMValTool yet. Good point, we should try it.

Shayne: Question: you have output and put it into the ESMValTool. How do you define the ‘better’ model?

Gab: You can incorporate the different models as weights; if you get a collection of red flags, that helps highlight possible issues.

Romain: The Met Office wants to integrate ESMValTool into their Rose/Cylc workflow so diagnostics run when they run the model.

Helen: Seems to me there are the tools that are critically important, and there might be feedback between the tools and the research needed to investigate what the evaluation is telling you. At a high level, in addition to the tools this breakout covers, there need to be strong linkages between the research and model development so that those feedbacks are happening.

Gab: Would it be useful to have a hit list of known issues that people could upvote or downvote? You would probably find connections.

Shayne: would there be feedback into the working groups?

Helen: I like that. However it gets implemented, there needs to be that connection between the tool and the working groups.

Gab: Would there be tools hosted by the NRI? Some kind of tool to sort out what the issues are.

Romain: How do we compare other models to ACCESS models?

Gab: They go into the same system.

Shayne: For CMIP7 development, we have the data from CMIP6 and we can compare to that.

Gab: Some type of Jira or Trello system where you keep tickets of issues identified by the evaluation tools, tie them to different fields, and associate people with them (those who identified an issue, or those who could help resolve it). Tickets could be upvoted, and people could express interest in them. Even if I’m just using the model, I can refer to this to be aware of known issues.
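The ticket system Gab sketches (Jira/Trello-style, with fields, people and upvotes) could look something like the minimal Python sketch below; all class and attribute names here are hypothetical, not an existing NRI system:

```python
from dataclasses import dataclass, field

@dataclass
class EvalIssue:
    """A ticket for an issue identified by an evaluation tool (hypothetical)."""
    title: str
    field_tags: list = field(default_factory=list)  # e.g. ["land", "energy budget"]
    people: list = field(default_factory=list)      # reporters or potential resolvers
    votes: int = 0

    def upvote(self):
        self.votes += 1

# Rank issues so the community's top concerns surface first
issues = [EvalIssue("ENSO amplitude too weak", ["ocean"]),
          EvalIssue("Soil moisture bias", ["land"])]
issues[1].upvote()
top = max(issues, key=lambda i: i.votes)  # the most-upvoted ticket
```

The ranking step is what would feed the "top issues visible to everyone" idea raised later in the session.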

Lisa: Going back to Shayne’s point about using CMIP6 as a benchmark: we have it and already know about some of the issues with CMIP6. Good place to start.

Gab: We’ve covered off two things with no dissent: getting some evaluation packages, and the idea of getting some system to identify and prioritise known issues. That can include comparison with other models or comparison with observations.

Shayne: Question on how many observational datasets you can use in ESMValTool?

Arnold: Depends on the license; there are some restrictions on what data is able to be used.

Gab: Are there ways to host the tool and offer flexible ways to use observations, depending on what each license allows?

Helen: Are there any improvements to ESMValTool that this group wants to look at?

Gab: Probably. CMIK includes ILAMB. Just having something to start with.

Romain: I’ve seen some discussion on ESMValTool; they want to stay away from developing a core dataset. That doesn’t mean we don’t have to do it.

Helen: Suggestion: the no-brainer is to use ESMValTool, but the ambitious option is that there might be interest in improving the tools.

Gab: definitely

Helen: I often wonder if we could make better use of the surface/land flux tools.

Gab: ILAMB and ESMValTool are fundamentally about spatially gridded datasets. modelevaluation.org has focused on trying to get as much flux tower information as we can, and the in situ data and …, and that’s the only one that really does this.

Shayne: I haven’t used the ESMValTool. They generate figures; does it have dive-downs as well?

Gab: ILAMB will come up with an HTML page and then you can break it down into variables, spatial plots, etc., and specify region, time period, etc. Lots of different metrics, and if you don’t like them you can change the different weights.
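The weighted-metrics idea Gab describes can be illustrated with a small sketch; the metric names and weights below are illustrative, not ILAMB's actual defaults:

```python
def overall_score(scores, weights):
    """Combine per-metric scores (each on 0-1) into one weighted score.

    `scores` and `weights` are dicts keyed by metric name; changing the
    weights changes the overall ranking, which is the knob Gab refers to.
    """
    total_w = sum(weights[m] for m in scores)
    return sum(scores[m] * weights[m] for m in scores) / total_w

# Hypothetical scores for one model against one variable
scores = {"bias": 0.8, "rmse": 0.6, "seasonal_cycle": 0.7}
weights = {"bias": 1.0, "rmse": 2.0, "seasonal_cycle": 1.0}
s = overall_score(scores, weights)  # (0.8 + 1.2 + 0.7) / 4 = 0.675
```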

Arnold: (showing some examples on the screen) The ESMValTool example is mainly from the command line, different from ILAMB with its interactive ability.

Dougie: ILAMB and ESMValTool are quite different in the way you lay things out; ESMValTool is written so you can add your own recipes.

Gab: (showing ilamb example)

Shayne: Is there some way of looking up whether recipes already exist? Is there some library you can look through?

Gab: We could make these packages bespoke to Australian needs.

Shayne: What if you don’t want the default settings?

Gab: There’s a YouTube channel with lots of info on this; they’re super keen to engage. I haven’t had experience with the ESMValTool, so not sure if they’re the same. The difficulty with ILAMB is that it only has 2D fields for ocean and land.

Romain: The idea is to have a collection of recipes, but of differing degrees of quality. I think what is needed is to identify a core set of recipes that are well documented, etc. That’s where you need good feedback from the community to identify these.

Gab/Arnold: (discussing) how to use ILAMB for calculating ENSO

Shayne: On the regridding: multiple questions on how important it is with energy budgets; there was quite a bit of effort in the ESMValTool routines to make sure regridding was conserving.

Gab: Not sure what is in ILAMB. I suspect there’s a set of options on how the regridding works.
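On Shayne's conservation point: first-order conservative regridding preserves the integral of the field (e.g. an energy budget) by overlap-weighted averaging. A 1D Python sketch of the idea, not ILAMB's or ESMValTool's actual implementation:

```python
def conservative_regrid_1d(src_edges, src_vals, dst_edges):
    """First-order conservative remapping in 1D.

    Each destination cell's value is the overlap-weighted average of the
    source cells it covers, so the integral over the common domain is
    preserved exactly.
    """
    out = []
    for j in range(len(dst_edges) - 1):
        lo, hi = dst_edges[j], dst_edges[j + 1]
        acc = 0.0
        for i in range(len(src_edges) - 1):
            overlap = min(hi, src_edges[i + 1]) - max(lo, src_edges[i])
            if overlap > 0:
                acc += src_vals[i] * overlap
        out.append(acc / (hi - lo))
    return out

# Four unit cells coarsened to two cells of width 2; the integral
# (1 + 3 + 2 + 4 = 10) is preserved: 2*2 + 3*2 = 10.
coarse = conservative_regrid_1d([0, 1, 2, 3, 4], [1.0, 3.0, 2.0, 4.0], [0, 2, 4])
```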

Gab: There might be a visualisation tool for ESMValTool?

Romain: not sure

Shayne: Almost certain the PCMDI website has a metrics package.

Romain: There’s not one tool that solves all the issues. ILAMB is one, ESMValTool is one, modelevaluation.org exists for other reasons. There’s not just one reason.

Gab: If there is no good way to visualise what comes out of the ESMValTool, putting the work into modelevaluation.org may be a way to do it if there are no other off-the-shelf options. We’ve managed to embed ILAMB, so that is definitely feasible. I wanted to check in with CMIK; it had Java output and would embed really well.

Romain: We need some way of delivering this; an online portal is one idea.

Lisa: There are lots of evaluation tools out there. Gab, you mentioned CMIK, and it sounds like it brings multiple evaluation packages together.

Gab: They’re doing it in a DoE context, and ESMValTool might not be part of it. We might be interested in having modelevaluation.org host all of these.

Gab: The vision for modelevaluation.org would be to have many engines running and you can select what you need; it would show you your run, what the different packages show, etc. Even as a repository check-in: run this particular suite.

Kelsey: What does the simple option look like?

Romain: The idea is to make outputs available on a webpage; we need a workflow on NCI, a core set of recipes, and to maintain/document them.

Gab: I think that’s right, and the next step after that is to have modelevaluation.org for land.

Gab: The first, low-level step is to have ILAMB and ESMValTool running on NCI, along with a place for tracking issues and the ability to address them. Step 2: run those systems in an automated way, e.g., if we have a testing branch, every time things go in we run the tests.
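The automated-testing idea in step 2 amounts to comparing a branch's evaluation scores against a baseline and flagging regressions. A hypothetical sketch; a real pipeline would invoke ILAMB/ESMValTool and parse their reports rather than use hard-coded dicts:

```python
def check_regressions(baseline, current, tolerance=0.02):
    """Return the metrics whose score dropped by more than `tolerance`.

    `baseline` and `current` map metric names to scores on 0-1; a
    missing metric in `current` counts as a regression.
    """
    return {m: (baseline[m], current.get(m, 0.0))
            for m in baseline
            if current.get(m, 0.0) < baseline[m] - tolerance}

# Illustrative scores: the shortwave-down score has dropped noticeably
baseline = {"precip_bias": 0.71, "sw_down": 0.80}
current = {"precip_bias": 0.72, "sw_down": 0.74}
failed = check_regressions(baseline, current)  # only sw_down is flagged
```

Running this on every merge to a testing branch gives developers the fast feedback Gab mentions later in the session.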

Shayne: We have this workflow, but could we have five, one for each WG, and each one could have a different workflow?

Gab: Step 3: have working-group-specific evaluation pathways.

Shayne: A great way to get groups engaged.

Gab: And Romain, this stage is where we could do modelevaluation.org on the land side and maybe bring in the flux tower data.

Gab: The 4th stage would be having a central home/portal (modelevaluation.org) for all the tools.

Emma: Just wanted to flag that ACS is doing some work on using ILAMB in their atmospheric work, and would have some interest. I think it’s also a topic relevant for step 2: not sure how much of ESMValTool is specific to Australia, and we should think about what isn’t evaluated in these two packages that needs to be incorporated.

Romain: there’s a lot of work that needs to be done there

Kelsey: Is this a subset/note for step 3?

Gab: Yup, we should add this to step 3; it needs to be part of it.

Helen: An important point is that we’re not trying to do everything; the research community has to make these connections. So important.

Gab: Not sure if there’s an evaluation working group, but Romain should make sure he leans on the community.

Chun-Hsu: Wanted to raise a point from a modeller’s perspective: the Met Office spent years developing GA7, had a problem and spent years fixing it. We need good support for CI testing.

Dougie: This came up very briefly and is worth emphasising: both ILAMB and ESMValTool expect some DRS or predefined structure when using the tools. If it’s not made easy to convert your data, people won’t use them. Whose job is it to do this?

Gab: In the case of ILAMB at the global scale this is done, but in the Australian context this is a good question.

Dougie: Can I use my model output with ILAMB?

Gab: No, it needs to be CMORised. Is this on the NRI agenda?

Romain: It’s not a problem for CMIP contributions, but a lot of people don’t CMORise their own data. We need to be able to leverage existing data even if it’s not CMORised.

Dougie: We could take a step back and CMORise everything.
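For context on what CMORising involves: at minimum, raw model variable names must be mapped to CMIP standard names. The sketch below is purely illustrative; real CMORisation uses the CMOR library and data-request tables, and also rewrites units, dimensions and global attributes. The raw field names shown are hypothetical:

```python
# Hypothetical mapping from raw model output names to CMIP variable names
RAW_TO_CMIP = {
    "raw_temp_screen": "tas",  # near-surface air temperature
    "raw_precip": "pr",        # precipitation flux
}

def rename_variables(raw_fields):
    """Relabel raw model fields with CMIP names, keeping unknowns as-is.

    A real pipeline would reject or log unknown fields and convert units;
    this sketch only shows the name-mapping step.
    """
    return {RAW_TO_CMIP.get(name, name): data for name, data in raw_fields.items()}

cmip = rename_variables({"raw_temp_screen": [285.0], "custom_diag": [1.0]})
```

Dougie's agility point maps directly onto the table: if the schema changes, only `RAW_TO_CMIP` (and its unit/metadata equivalents) should need updating, not every downstream workflow.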

Ulrike: This discussion shows evaluation is quite tricky and has many aspects; I’m going to open another can of worms. I’ve come across, with ACS and many other projects, the question of what evaluation means for their applications. It would be useful to think about how these evaluation results influence user applications. Might be step 7, 8, even 10, but important to think about.

Romain: I see this as part of training; it needs to go along with it.

Ulrike: It’s a step in providing confidence in the modelling.

Helen: Such a good point to raise. If we had the resources, we should have a parallel activity to develop some way of reflecting these evaluations for users of the tools, or even universities and students. Sooner than in 10 years’ time would be really great.

Gab: Can we go back? Perhaps ILAMB is actually the zero-order step, and we should make sure we have the right datasets in there.

Shayne: I’d raise that there’s been some literature on how many years of data you need for robust results. I raise it because you come back to ILAMB and there are numbers in that chart.

Gab: everything is below that.

Gab: There might be cases where you don’t want an off-the-shelf tool; this comes into that category of what particular communities might want.

Shayne: Just knowing there’s so much variability in the system would be useful.

Gab: We could create our own artificial datasets or xx

Helen: We’ve been focused on the domains, but is there a process lens here? ENSO is a process. I wonder if there’s a process way to look at it.

Gab: This should come up in the issue tracking.

Helen: need some oversight maybe?

Gab: It would be really cool if the top issues could be really visible so that people can see them.

Shayne: Could there be these high-level agendas, and could we use this, especially for issues that are becoming troublesome?

Helen: You need the right group of people to be aware of these top issues, making sure from a top-down perspective that we’re funding the right priorities.

Shayne: Can we have ACCESS-NRI grand challenges?

Helen: what you’re doing is using the NRI tools to identify these challenges

Gab: What the NRI could do is hold an annual workshop with breakouts to address these.

Kelsey: Can we come back to Dougie’s point about CMORising data? Where does this sit?

Gab/others: We need to flag it.

Dougie: It’s the major thing that’s prevented me from using ESMValTool in the past.

Everyone: Agree tools have to be easy to use if they are to have uptake.

Dougie: There’s also a tool from Ben Schroder being used for ACS; there are options.

Shayne: Dougie, was your point that we shouldn’t be CMORising data?

Dougie: Not sure that was my point, but I’ve heard there might be changes to how things are CMORised; if schemas change, you’d have to undergo the whole process again. We should keep this process as agile as possible. In a research context, a lot of people won’t do it.

Gab: In land surface, we moved to doing this automatically in a standardised format. 1999 was the first netCDF format that land surface came up with for MIPs, and it has evolved, but we already have this. If people haven’t done this, it’s a massive issue.

Dougie: It would be a massive job to CMORise COSIMA output, for example; very little is CMORised, only the OMIP contributions.

Dougie: You can get some of the way there with MOM configurations, but you can’t get all the way there.

Gab: and it’s not something you do on the fly?

Dougie: It would temporarily duplicate data, but that’s not too much of an overhead.

Romain: But a high-level decision is needed.

Arnold: Question: does this mean we need to release data early in the model development process?

Gab: One of the ideas was that anybody could share all their ILAMB results, and you could sort by different xx and learn from others without needing the data.

Romain: A question from Chun-Hsu online: there’s a cost to doing evaluation as part of the workflow.

Gab: The primary need is for the developer to know if something broke, so prioritising the speed of that feedback is a good idea. We tried to ensure this for the flux tower data. If you get feedback quickly, it’s so much better.

Romain: The point is you need time for all these testing pipelines, and that’s a resource aspect not often taken into account.

Summary:

Benefits: credibility, the ability to catch things early, feedback to and between researchers and model development, avoiding frustration, the trust factor. Also confidence in passing information to downstream applications.

For all the lessons that have been learnt, put in place something to address them. Strategic benefits, Australia-wide, quantified.

Pathways/options/priorities →

  • Step 0: Make ILAMB available and supported at NCI, and establish an open science issue-tracking system for the community (e.g., for ENSO issues, etc.)
  • Step 1: Add ESMValTool
  • Step 1b: Community building and engagement → issues at the top of the issue tracker are brought up at regular WG meetings and workshops
  • Step 2: Add specific development workflows for each WG, including adding Australian-specific datasets to each of the development workflows
  • Step 3: Build/enable CI/CD on these workflows
  • Step 4: Bring all the tools into a single API or home for the evaluation tools

Caveat → need for CMORised data
