Following discussion during the CABLE4 meeting and Rachel’s work around matching ACCESS-ESM vegetation maps to the LUH3 dataset, there is a need for a tool for making modifications to ACCESS-ESM restart files. I want to scope out exactly what people are looking for in this tool.
Purpose
Modify ACCESS-ESM restart files (for the land surface) to accommodate modified vegetation maps. This is non-trivial as the existing restart file doesn’t necessarily contain sensible values in unused vegetation tiles.
Should we also accommodate other changes? Say improved soil property maps are produced, should the tool also be able to make these changes? Are other ancillary updates likely to require derived changes to other parameters?
Requirements
Take a user specified vegetation map and reference ACCESS-ESM restart file, and create a new restart file, in which the state properties within all tiles (or just the required vegetation tiles?) have physically sensible values.
It should be possible for the user to specify the remapping method for each property, BUT there should be good defaults to fall back on if no method is specified. For example, area averaging would be applied by default to properties relating to the short time scale physics, like soil moisture or temperature, while some nearest neighbour process would be applied to properties relating to the carbon cycle.
What does nearest neighbour actually mean in this scenario? Say there is a grid cell which was originally 100% Evergreen Broadleaf, but the new vegetation map specifies 50% Evergreen Broadleaf and 50% C3 grasses. How do we fill the phenology variables for the C3 grasses?
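As a rough sketch of how the defaults-with-override behaviour could look. The property names and method labels here are made up for illustration; they are not actual restart field names or a settled interface.

```python
# Hypothetical default remapping method per property. The names and
# labels are illustrative assumptions, not real restart field names.
DEFAULT_METHODS = {
    "soil_moisture": "area_average",       # short time scale physics
    "soil_temperature": "area_average",    # short time scale physics
    "plant_carbon": "nearest_neighbour",   # carbon cycle
    "soil_carbon": "nearest_neighbour",    # carbon cycle
}


def resolve_methods(properties, user_methods=None):
    """Return the remapping method to use for each property.

    A method given by the user wins; anything left unspecified falls
    back to DEFAULT_METHODS. Unknown properties with no user method
    raise KeyError rather than being guessed.
    """
    user_methods = user_methods or {}
    resolved = {}
    for name in properties:
        if name in user_methods:
            resolved[name] = user_methods[name]
        else:
            resolved[name] = DEFAULT_METHODS[name]
    return resolved


def area_average(values, fractions):
    """Area-weighted mean of per-tile values within one grid cell."""
    total = sum(fractions)
    if total == 0:
        raise ValueError("no tile has non-zero area")
    return sum(v * f for v, f in zip(values, fractions)) / total
```

With this layout, a user who only cares about one property overrides just that entry, e.g. `resolve_methods(["soil_moisture", "plant_carbon"], {"plant_carbon": "area_average"})`, and everything else keeps its default.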
Comments/Feedback
I plan to build a bit of a prototype to demonstrate over the next few weeks. If people involved in this effort (@RachelLaw, @tiloz, @inh599, @tammasloughran, @clairecarouge) have clearer ideas of what this tool must achieve, or comments about the details that I might be missing, please let me know.
There may be useful ideas in ANTS, which is the tool used by the Met Office to pre-process external datasets for use by the model.
There is a contrib repository with a variety of scripts using the tool.
It’s probably useful to distinguish between those fields in the restart file that are initial conditions and those that are brought in from ancillary (forcing) data. In the case of the ancillary information, it may be better that we write new ancillary files (this would presumably include the vegetation distribution) and then use the ‘reconfiguration’ step of the UM to bring these into the restart file.
For the new ESM1.6 vegetation distribution I’ve been working on, my ‘nearest neighbour’ for the C/N/P pools had a number of options. Except for veg type 10 (C4 crops), which was new, it only used tiles of the same PFT as the one we needed to fill. It did a local fill if it could: I defined this as an average of any tile of that type within +/-2 grid-cells, ignoring area-weighting and not handling the boundary of the dataset (i.e. it ignored that 0=360 in longitude). If no tiles existed within this region, I averaged tiles at all longitudes within +/-10 degrees of latitude (+/-8 grid-cells). This covered every case I needed, but I wrote in a global average in case the local and regional fills didn’t work. If useful, my Fortran code is /g/data/p66/rml599/luh2/luh3/restart-fields/modifyCpools.F90.
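For anyone who wants the gist without reading the Fortran, here is a rough Python sketch of the fill order described above. It is not a translation of modifyCpools.F90: the function names, the list-of-lists layout (one 2D field per PFT, `None` for unused tiles) and the returned stage label are all assumptions made for illustration.

```python
def mean_or_none(values):
    """Unweighted mean of the non-missing values, or None if there are none."""
    valid = [v for v in values if v is not None]
    return sum(valid) / len(valid) if valid else None


def window(field, j, i, dj, di=None):
    """Values within +/-dj rows and +/-di columns of cell (j, i).

    The window is clipped at the dataset edges, so it does not wrap
    across the 0/360 longitude boundary. di=None means all longitudes.
    """
    nlat, nlon = len(field), len(field[0])
    rows = range(max(0, j - dj), min(nlat, j + dj + 1))
    if di is None:
        cols = range(nlon)
    else:
        cols = range(max(0, i - di), min(nlon, i + di + 1))
    return [field[r][c] for r in rows for c in cols]


def fill_tile(field, j, i, local=2, regional=8):
    """Fill one tile from same-PFT donors; returns (value, stage).

    Stages are tried in order: local box, latitude band at all
    longitudes, then a global average. No area weighting is applied.
    """
    stages = [
        ("local", lambda: window(field, j, i, local, local)),
        ("regional", lambda: window(field, j, i, regional)),
        ("global", lambda: [v for row in field for v in row]),
    ]
    for name, candidates in stages:
        value = mean_or_none(candidates())
        if value is not None:
            return value, name
    raise ValueError("no donor tiles of this PFT anywhere")
```

Returning the stage alongside the value makes it easy to count how many tiles are solved at each level, which is the check used to pick the averaging regions.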
I guess the main challenge with a tool is whether there is a generic enough solution or whether there are always/often going to be special cases that need to be accommodated. I picked my averaging regions based on a check of how many tiles would be solved at various levels of local or regional averaging.
I think a tool like this would be really valuable for researchers in the paleo community, who often create vegetation distributions for different time periods.
It might be outside of the scope of the initial version for CMIP7, but I think a popular use case would be working with modified land sea masks. For example, if you have a vegetation map defined on a modified land sea mask, and a reference restart file on the original mask, a method to fill the CABLE state variables with sensible values on the new land points based on the new vegetation map and original restart data could be really useful.
Tagging @dkhutch who’s done this previously and might be able to clarify!
I think it’s possible to write such a tool that is generic enough to allow tuning of the search → average process, that works for both scenarios (remapping vegetation on either the same or different land grids). The cascading search → average process would work in both instances, which is:
Attempt to retrieve from the same grid tile
Attempt to retrieve from a specified radius around the grid tile
Attempt to retrieve within a specified range of latitudes around the grid tile
Retrieve globally
It could easily accommodate defined mappings to new vegetation types as well, as was done with the C4 crops in @RachelLaw’s script. This would simplify the process of adding new vegetation types in the future.
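To make the idea concrete, here is a minimal sketch of the cascade working on a plain list of donor tiles, so it does not care whether the target and reference share a land grid. Everything named here (`Tile`, `cascade_fill`, the default radius and band widths, the `donor_pft` mapping) is hypothetical and only meant to show the shape of a tunable interface.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Tile:
    lat: float    # degrees north
    lon: float    # degrees east
    pft: int      # vegetation type index
    value: float  # state variable to copy from


def separation_deg(lat1, lon1, lat2, lon2):
    """Great-circle separation between two points, in degrees (haversine)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    a = (math.sin((p2 - p1) / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(a))))


def cascade_fill(lat, lon, pft, donors,
                 radius_deg=5.0, band_deg=10.0, donor_pft=None):
    """Return (value, stage) from the first stage that finds any donors.

    donor_pft optionally maps a vegetation type with no reference data
    to the type it should borrow from (an assumed mechanism).
    """
    source_pft = (donor_pft or {}).get(pft, pft)
    same_type = [t for t in donors if t.pft == source_pft]
    stages = [
        # Exact coordinate match; only useful when the grids coincide.
        ("same_cell", lambda t: t.lat == lat and t.lon == lon),
        ("radius", lambda t: separation_deg(lat, lon, t.lat, t.lon) <= radius_deg),
        # Latitude band, agnostic to longitude.
        ("lat_band", lambda t: abs(t.lat - lat) <= band_deg),
        ("global", lambda t: True),
    ]
    for name, accept in stages:
        found = [t.value for t in same_type if accept(t)]
        if found:
            return sum(found) / len(found), name
    raise ValueError(f"no donor tiles for vegetation type {source_pft}")
```

The averages here are unweighted; area weighting, or swapping the mean for a true nearest-point pick, could be slotted in per stage without changing the cascade itself.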
clairecarouge
(Claire Carouge, ACCESS-NRI Land Modelling Team Lead)
Just to note that @RachelLaw took an average for a range of longitudes not latitudes. This is because vegetation tends to vary more with latitude than longitude. Think, tundra up in the north, then evergreen needleleaf south of it, then deciduous broadleaf trees for example.
Yeah, this is what I mean: average over everything within a given range of latitudes, e.g. within ±5 degrees of the original point, and be agnostic to the longitude.
Just flagging here that we do modification of restart files in the regional nesting suite. Not recalculation of fields but replacement of data.
It might be simpler to keep the ESM and RNS tools completely separate but the “modify restart files” capability is relevant to both suites at the same time (most likely for different purposes).
Is this done via ANTS for the regional nesting suite? How powerful/easy to use is the existing framework you have? Something non-trivial which is relevant to this use case is creating a restart that contains a new vegetation type; could it handle that?
It was to modify fields existing in the start dump not add new ones - sorry.
It is quite simple.
In our use-case ANTS would be used to prepare a field with the corrected field written to an ancillary file. The corrected field would then be added to the start-dump/restart file in a two-step process. i.e. create proper ancillary then modify the start dump using ancillary (read in via python/mule). We don’t currently need to do that though because the data exists in a suitable form to use it in our replacement scripts directly.
I don’t think we’d be modifying the restart file in place; it’s more that we’d use it as a reference to create a new restart file. Thinking about it, I might be wrong in saying that it’s a “new” vegetation type (and therefore new field); I think it was that a particular vegetation type was unused in the ESM1.5 runs, so therefore had no immediately obvious reference data to initialise from.
From what I have seen so far, I would be in favour of using Python scripts in ANTS Contrib to generate new ancillary files and using these to create the restart file, as @RachelLaw and @Scott have described. I think that this may be cleaner and better documented in the long run, as well as preventing a proliferation of possibly redundant tools. The exact situation here may be different and may require an extra tool, but I think that it is at least worth investigating.
There are too many details to go into here but there are parallels for what you are doing in the ESM in the RNS. In the RNS the heavy lifting is done via the UM. Please reach out if you want to know what they do.
It’s entirely unclear how that tool works, and we definitely don’t want to be adding more legacy Fortran code to our workflow; there’s enough of that as is, I think. I would be surprised if that tool achieves what we want it to achieve.
I’m at the stage where I now have corrected fields contained in a NetCDF file, with variable names matching the names of the relevant fields in the original dump file (with the caveat that any “/” characters are replaced, as they are not allowed in NetCDF names). I would like to place these fields back into the original dump file, and write the results to a new fields file. @cbengel Do the tools you use cover this use case?
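For the name matching step only, here is a small sketch of how the NetCDF variable names could be mapped back to the original dump field names. It does not read or write the dump file itself. The replacement character (`_`) and the example field names are assumptions for illustration, not necessarily what the NetCDF file uses.

```python
def sanitise(name, replacement="_"):
    """NetCDF-safe version of a dump field name ("/" is not allowed)."""
    return name.replace("/", replacement)


def match_variables(dump_field_names, netcdf_variable_names, replacement="_"):
    """Map each NetCDF variable name back to the dump field it came from.

    Raises ValueError if two different dump names collapse to the same
    sanitised name, and KeyError if a NetCDF variable has no dump field.
    """
    lookup = {}
    for name in dump_field_names:
        key = sanitise(name, replacement)
        if key in lookup and lookup[key] != name:
            raise ValueError(
                f"{lookup[key]!r} and {name!r} both sanitise to {key!r}"
            )
        lookup[key] = name
    unmatched = [v for v in netcdf_variable_names if v not in lookup]
    if unmatched:
        raise KeyError(f"no dump field for NetCDF variables: {unmatched}")
    return {v: lookup[v] for v in netcdf_variable_names}
```

Checking for collisions up front matters because the replacement is lossy: once “/” has become another character, two distinct field names can end up identical.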