This topic is to collect feedback on the analysis plots for benchcab in one place. (@gab)
First feedback from @aukkola (1 May):
- add summary table like ILAMB dashboard.
- more variables: see if we can have the same variables as ILAMB (radiation, hydrology and carbon).
- include some comparisons between science configurations, not just between source codes as we have now. This could be done via summary tables.
- better legends and titles on the plots, to make it easier to understand what they represent.
Feedback from Mengyuan (21 June):
For the 42-site experiment, it would be good to also have plots at individual sites: diurnal cycle, annual cycle and PDF plots, to check nothing has gone haywire anywhere.
@Mengyuan_Mu For the 42-site experiment, we have plans to add more plots, but they would still be summary plots.
We also plan to add more variables and more science configurations to the analysis. This makes it tricky to provide plots for individual sites for this experiment, as we would very quickly end up with a thousand plots, at which point we wouldn't be able to tell anymore whether anything is wrong.
At the same time, I see what you are worried about, and I suspect we will need some discussions to find the best way to get something useful.
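To make the per-site diagnostics under discussion concrete, here is a minimal sketch of diurnal- and annual-cycle summaries. This is in Python purely for illustration (the actual analysis scripts are in R), and all function and variable names are my own, not benchcab's:

```python
import numpy as np

def diurnal_cycle(values, hours):
    """Mean value for each hour of day (0-23), e.g. from half-hourly fluxes."""
    values, hours = np.asarray(values, float), np.asarray(hours)
    return np.array([np.nanmean(values[hours == h]) for h in range(24)])

def annual_cycle(values, months):
    """Mean value for each calendar month (1-12)."""
    values, months = np.asarray(values, float), np.asarray(months)
    return np.array([np.nanmean(values[months == m]) for m in range(1, 13)])
```

Each function reduces a site's timeseries to 24 or 12 points, which is the kind of compact per-site plot that would make a broken run stand out without generating thousands of panels.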
List of variables we want to have for analyses in modelevaluation.org for benchcab:
- GPP (not observed directly; derived from NEE + modelled respiration)
- LAI (available at some sites only)
- Ecosystem Respiration (available at some sites only)
- Soil Carbon (available at some sites only)
- Evaporative fraction
- Latent Heat
- Sensible heat
- Surface Soil Moisture (available at some sites only)
- Upward SW
- Net SW
- Upward LW
- Net LW
- Net Radiation
- Ground heat flux
Thanks, Claire. It is totally understandable.
Another idea came up. At the moment, we have 5 experiments for specific individual sites. We are wondering whether it would be possible to adapt the analysis script so we could have 1 experiment holding the datasets for all the sites. It would expect a model output from a single observation site, get the site information from that model output, and plot the detailed analysis for that site.
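A minimal sketch of the "get the site information from the model output" step. Here the site code is inferred from the output filename; this naming convention is an assumption for illustration, not benchcab's actual one (in practice the site could equally be read from the file's metadata):

```python
import re

def infer_site(model_output_filename):
    """Guess a FLUXNET-style site code (e.g. 'AU-Tum') from an output filename.

    Hypothetical convention: the filename contains the site code somewhere,
    as in 'CABLE_AU-Tum_out.nc'.
    """
    m = re.search(r"[A-Z]{2}-[A-Za-z0-9]{3}", model_output_filename)
    if m is None:
        raise ValueError(f"no site code found in {model_output_filename!r}")
    return m.group(0)
```

With something like this, one experiment could dispatch the detailed single-site analysis for whichever site's output is uploaded.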
Thanks for setting this up Claire - great idea. Most of these things will be relatively straightforward. Once I'm back from the US (mid August) I'll dedicate time to knocking some of these off. I'm also beginning to indoctrinate Jon into this R codebase (he'll submit his PhD within 8 weeks and stick around for another 18 months), and he'll be able to help as well.
Great to know we’ll have more help on the R scripts. I was thinking about how to find a way to achieve that.
After seeing a talk tonight, I came to the conclusion that it would be great if we could present the information along some key modes of variability people care about. For example: what is the performance for the seasonal cycle in terms of magnitude, timing, variability, etc.? The same goes for inter-annual variability.
I might be getting a little ahead of myself here/asking stupid questions:
- will benchcab be able to run CABLE-POP?
- can I benchmark at a specific site? I.e. using a custom met file?
CABLE-POP: eventually, but it will take time. The main issue is what to do about the spin-up: it is too expensive and too long to run the spin-up every time during development, for example.
Specific site: not currently, for several reasons. First, there is no point evaluating CABLE at irrigated sites for now; we know it will be bad. Also, sites with too much missing data or too short a timeseries are poor choices for running statistics-based diagnostics.
The last reason comes from how me.org organises the analysis. At the moment, the analysis is linked to a given set of observations and expects to have data for all of these sites. I have floated the idea that it would be great to develop an analysis script that has access to “all site data” (except the “problematic” sites identified above) and only picks the sites we provide model outputs for. That way we could get benchcab to run any subset of site simulations; the change in benchcab to allow this is really minimal.
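The site-picking step of that idea is simple to sketch. Everything here is hypothetical: the excluded-site list is a placeholder, and the one-NetCDF-file-per-site naming is an assumption, not how me.org actually stores things:

```python
from pathlib import Path

# "Problematic" sites to always skip (irrigated, too much missing data, ...).
# The site code below is a placeholder, not a real benchcab exclusion list.
EXCLUDED = {"US-Irr"}

def sites_to_analyse(obs_sites, model_output_dir):
    """Keep only the sites that have observations AND a provided model output.

    Assumes (hypothetically) one model-output file named <site>.nc per site.
    """
    provided = {p.stem for p in Path(model_output_dir).glob("*.nc")}
    return sorted((set(obs_sites) - EXCLUDED) & provided)
```

The analysis would then loop over this intersection, so uploading outputs for any subset of sites would just work.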
One additional reason: benchcab's main purpose is to give us a reproducible, standard evaluation so we can easily compare results between versions. This means adding flexibility to the tool is not a priority at this point, especially because additional flexibility can make it harder to determine whether evaluations are comparable.