There are several devils in the detail, to be sure. It isn't worth doing unless it can be an automatic part of the model-running process, IMO. In the first instance I'd imagine it just being run for ACCESS-NRI released models, but if we bake the information gathering into the tools we use, then anyone running the models at supporting centres (like NCI) could benefit from it, I should think.
You're right. And I shouldn't be such a miser. I just came back to delete my comment because, after thinking about it for a bit longer, I noticed that "It's really hard to do" doesn't contribute all that much to the idea, but by that time you'd already answered, so I'll leave the evidence of my small-mindedness here.
I was thinking about something along these lines, but from the other direction: parameter discoverability in MOM6 is painful. The only real way to do it is to run the model and inspect the MOM_parameter_doc.all file, which gives the docstring from the first/last (I can't remember which) place a given parameter was read, along with its value. If a parameter isn't read, e.g. because it's hierarchical (only read once a parent option is enabled), there's no real way to know about it other than grepping through the source.
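For illustration, here's a rough, stdlib-only sketch of pulling name/value pairs out of a MOM_parameter_doc-style file. The line format and the sample entries are simplified assumptions; the real file also carries units, defaults and multi-line docstrings in the trailing comments:

```python
def parse_param_doc(text):
    """Parse a (simplified) MOM_parameter_doc-style file into name/value pairs.

    Assumes entries of the form ``NAME = value  ! comment``; pure-comment
    and blank lines are skipped, and values are kept as strings."""
    params = {}
    for line in text.splitlines():
        line = line.split("!", 1)[0].strip()  # drop the trailing comment
        if "=" not in line:
            continue
        name, _, value = line.partition("=")
        params[name.strip()] = value.strip()
    return params

doc = """\
DT = 3600.0                   !   [s] default = 1800.0
                              ! The (baroclinic) dynamics time step.
USE_REGRIDDING = True         !   [Boolean] default = False
"""
print(parse_param_doc(doc))  # {'DT': '3600.0', 'USE_REGRIDDING': 'True'}
```

Something this simple already gets you a queryable record of what a run actually used, which is the kind of thing that could feed a DB.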
Recently, in my Python wrapping of ALE, I swapped out the pure-Fortran parameter file parsing and based it on a plain Python dictionary instead. This way I can store the parameters in YAML/TOML and change them easily at runtime (maybe the clearest example is in the regridding test).
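To make the idea concrete, a toy dict-backed store along those lines might look like this (the class and parameter names are made up for illustration, not the actual ALE wrapper code):

```python
class ParamStore:
    """Toy dict-backed parameter store; the dict could equally be
    loaded from a YAML/TOML file."""

    def __init__(self, params=None):
        self.params = dict(params or {})

    def get(self, name, default=None):
        # Mirrors a get_param-style lookup: falling back to a default
        # here also makes the full default set trivial to enumerate.
        return self.params.get(name, default)

    def override(self, **updates):
        # Runtime overrides, e.g. for parameter sweeps in tests.
        self.params.update(updates)

store = ParamStore({"REGRIDDING_COORDINATE_MODE": "Z*"})
store.override(REGRIDDING_COORDINATE_MODE="SIGMA")
print(store.get("REGRIDDING_COORDINATE_MODE"))  # the override wins: SIGMA
```

The nice property is that "what parameters exist" and "what was set" are both just dictionary queries, with no file parsing in the loop.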
I believe there's a desire to keep MOM6 a purely Fortran model – it used to have a C source file, but that was re-implemented in Fortran – and its only external dependencies are MPI, NetCDF and FMS. Regardless, I was interested in seeing what other solutions are out there for specifying model parameters, particularly with discoverability in mind. One (significantly more complex) system I've used is spud, which defines the parameters in the RELAX NG compact syntax (the .rnc files under fluidity/schemas in the FluidityProject/fluidity repository on GitHub). A graphical tool called Diamond parses the schema and provides a graphical interface for editing the options. This is probably the other end of the complexity spectrum from what we'd need for FV ocean models (we don't have to define Python functions to initialise fields in the model at runtime, for example)!
At the very least, being able to see all of the parameters available in the model up-front, without having to browse the source code directly, would be a huge help toward discoverability, and might help prevent misconfigurations.
Good point. We want both discoverability and comparability.
I buried it a bit, but did mention
Generate model inputs from DB
which is part of what you’d like to be able to do. So if we added discoverability to the list of use cases then that would cover what you’re after too, am I right?
Then it's a technical problem of how to do this: passively sucking parameter values out of runs and adding them to a DB, or actively hunting for them in the code.
The former would definitely not be complete for MOM6 until all parameter uses and combinations are covered (ever?).
The latter relies on static code analysis (I believe grep counts if you're skilled with `-E`). Could you easily determine all the parameters and their default values with a proper Fortran linter (like flint)?
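To give a feel for how far plain pattern-matching gets you, here's a toy Python scan for MOM6-style get_param calls. The regex and the sample source are simplified assumptions: real calls span Fortran continuation lines (`&`) and pass optional arguments in any order, which this ignores, so a real linter would need to parse properly:

```python
import re

# Toy pattern for single-line get_param calls: capture the quoted
# parameter name and any default= argument before the closing paren.
GET_PARAM = re.compile(
    r'call\s+get_param\s*\([^)]*?"(?P<name>\w+)"[^)]*?'
    r'(?:default\s*=\s*(?P<default>[^,)]+))?\)',
    re.IGNORECASE,
)

src = '''
call get_param(param_file, mdl, "DT", dt, "Time step.", units="s", default=1800.)
call get_param(param_file, mdl, "USE_REGRIDDING", use_ale, "Enable ALE.")
'''

found = [(m.group("name"), m.group("default")) for m in GET_PARAM.finditer(src)]
print(found)  # [('DT', '1800.'), ('USE_REGRIDDING', None)]
```

Parameters with no explicit `default=` show up as `None`, which is itself useful to flag: those are the ones whose defaults you can only learn from the source or a run.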
I waved my hands a bit when I said
Make model output default values, and include these as a special experiment
which it sounds like wouldn't work for MOM6 as it currently stands, but it is also a laudable goal, as it is super useful to be able to determine when parameters deviate from their defaults.
Having a system where user-settable parameter defaults are defined in a file which is either read at run-time, or used to generate code at compile time (like spud?), would make discoverability a lot easier.
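As a sketch of the generate-code-at-compile-time half of that idea (all names and the spec format here are hypothetical): a declarative parameter spec, loadable from YAML/TOML, could be turned into the corresponding get_param calls at build time, so the full parameter set and its defaults live in one discoverable file:

```python
# Hypothetical declarative spec; in practice this would be a YAML/TOML file.
SPEC = {
    "DT": {"type": "real", "default": "1800.0",
           "doc": "Dynamics time step.", "units": "s"},
    "USE_REGRIDDING": {"type": "logical", "default": ".false.",
                       "doc": "Enable ALE regridding."},
}

def emit_get_param(name, info):
    # Emit a Fortran get_param call for one entry of the spec.
    args = [f'"{name}"', name.lower(), f'"{info["doc"]}"',
            f'default={info["default"]}']
    if "units" in info:
        args.insert(3, f'units="{info["units"]}"')
    return f"call get_param(param_file, mdl, {', '.join(args)})"

for name, info in SPEC.items():
    print(emit_get_param(name, info))
```

The same spec could then also drive documentation generation and a "defaults" pseudo-experiment in the DB, without a second source of truth.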
That's right. Being able to go back from a database to a configuration is probably good for most cases, but knowing the full set of available parameters is a slightly different problem. I guess part of this is also knowing what the parameters do. I suppose for MOM5 it's probably possible to point to some section of the book, but at the very least a documentation string for a parameter would be more useful than just its configuration key in the model.
Possibly a combination of both? Clearly for the majority of your proposed uses, you want to see what was used for the actual run. Perhaps my point was that given we’d be resorting to hunting for parameters in the code, why not change the code to be friendlier for discoverability, while not necessarily affecting existing workflows. This could also help to avoid possible issues: currently, the docstring for a parameter is provided when you request that parameter. If you request the same parameter from different modules, you should make sure you’re documenting it in the same way, or that updates to this documentation are synchronised.
Do you mean to use this as a mechanism for querying the default values? I guess at least MOM6 has the advantage of including these defaults in the MOM_parameter_doc.all (as opposed to the MOM_parameter_doc.short, which only includes those parameters which deviate from the default). I feel like something along these lines should be included as a baseline in a model: you want to know what parameters were actually used in all cases!
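For example, assuming entries of the (simplified) form `NAME = value ! ... default = X`, flagging deviations from the defaults recorded in MOM_parameter_doc.all is nearly a one-liner:

```python
import re

# One regex per entry: name, set value, and the default recorded in the
# trailing comment (line format simplified from the real file).
ENTRY = re.compile(r'^(\w+)\s*=\s*(\S+).*?default\s*=\s*(\S+)', re.MULTILINE)

doc_all = """\
DT = 3600.0              ! [s] default = 1800.0
USE_REGRIDDING = True    ! [Boolean] default = False
NIGLOBAL = 360           ! default = 360
"""

# Keep only parameters whose set value differs from the default
# (string comparison only, so 1800. vs 1800.0 would be a false positive).
deviations = {name: (value, default)
              for name, value, default in ENTRY.findall(doc_all)
              if value != default}
print(deviations)
```

In effect this reconstructs MOM_parameter_doc.short from MOM_parameter_doc.all, which is exactly the "what deviates from default" question a DB would want to answer.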
Definitely. At the very least reverse-engineering the parameters from the run outputs will be required to document a run unless the engineering to set the parameters also spits out some nice machine-readable parameter file (entirely possible and desirable as you say).
This is also model dependent. New models that are still being actively developed would definitely be a target for modifications such as this (MOM6, CICE6), but the case for doing this for mature models in maintenance mode (MOM5, CICE5) is not so strong. So a reverse-engineered solution might be the right one in that case.
Absolutely. So what I was suggesting is redundant for MOM6, but the older models might benefit from something like that.
Agreed that this would be a useful capability to have, if it can be done without inordinate effort. Model inputs (e.g. hashes from manifests) could also be useful to have in a DB, as would information on run resource use.
I’ve made attempts in this direction (detailed below), but a DB would make this sort of info easier to make use of.
However, these give an incomplete picture, as they're based on the nml input files and don't include default values. Also, some parameters are subject to a master switch that can make their values irrelevant.
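For what it's worth, here's a minimal stdlib sketch of reading simple namelist input (the real f90nml package handles the full grammar; this toy version only recognises flat `key = value` lines, and, as noted above, can't know about defaults or master switches):

```python
def read_nml(text):
    """Parse simple Fortran namelist text into {group: {key: value}}.

    Handles only &group ... / blocks with one "key = value" per line;
    values are kept as strings, arrays and derived types are ignored."""
    groups, current = {}, None
    for line in text.splitlines():
        line = line.split("!", 1)[0].strip()  # drop trailing comments
        if line.startswith("&"):
            current = line[1:].lower()
            groups[current] = {}
        elif line == "/":
            current = None
        elif current and "=" in line:
            key, _, value = line.partition("=")
            groups[current][key.strip().lower()] = value.strip()
    return groups

nml = """\
&ocean_model_nml
    dt_ocean = 3600        ! seconds
    use_frazil = .true.
/
"""
print(read_nml(nml))
```

A DB ingest step along these lines would record what the nml sets, but would still need the model (or its source) to supply the defaults and the switch hierarchy.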