CCI Land Cover and associated ancillary suite

Figured it was past time to make a post on the CCI ancillary suite, which will be the supported way of creating AM3 ancillaries, to describe the decisions made and issues addressed.

The CCI ancillary suite was originally developed by the UKMO to create arbitrary resolution land ancillaries using the ANTS tool. It is based on the Regional Ancillary Suite (suite u-bu503), with some additions from Siyuan (@mlipson please correct any details I got wrong there).

There are a number of problems which require modifications to the ANTS code. We are going to try to do this in a way that allows our work to be merged back into the source, which means keeping the current behaviour as default but allowing different behaviour via configuration options.

So the issues with the suite (and our plans for handling them, where applicable) are:

  • The land cover ancillary used by the suite was the dataset at /g/data/access/TIDS/UM/ancil/atmos/master/vegetation/cover/cci/v3/vegetation_fraction.nc. However, this dataset had already been preprocessed in a way specific to the UKMO’s operations. The preprocessing steps applied were:
    1. The source CCI data was overwritten by the IGBP dataset for the region latitude < -60.0, i.e. Antarctica. It’s not clear which dataset is better to use for the bounds of Antarctica; see a comparison below. Both datasets have issues at the south pole: the first row of cells is classified as water. That misclassified row will be set to ice.

    2. The CCI water bodies classification was split into sea_ocean_water and water_bodies, representing regions handled by the ocean model/ancillaries and the land model respectively. The UKMO treated the Black Sea, the Caspian Sea, the North and South Aral seas, and lakes Victoria, Superior, Huron, Michigan, Ontario and Erie (i.e. the Great Lakes) as ocean, with a long list of smaller lakes as water bodies to be handled by the land model. The current choice for AM3 is to include only the Black Sea as ocean, treating everything else inland as lakes to be handled by the land model.
  • There are numerous locations in the ANTS and ANTS contrib code that contain hard-coded indices based on JULES tile indices, e.g. ice being tile 9 (CABLE is 17) and bare soil being tile 8 (CABLE is 14). These indices are being made settable via the CLI, defaulting to the JULES values when not specified (see the sketch after this list).

  • A crosswalking table maps the CCI classifications to the 9 JULES tiles. Mat Lipson has built a new crosswalking table to map the CCI classifications to the 17 CABLE tiles (see the illustrative snippet after this list). This was a quick job done to get something that would run; it should be one of the first things we start tuning on the land side.

  • The original CCI dataset was taken from v1.6, which has been superseded by v2.0. The intention is to re-publish the v2.0 data for each year on Gadi. The new version is aggregated over single-year epochs, as opposed to v1.6, which was aggregated over 5-year epochs. This may make individual years more sensitive to wet/dry periods, but the ancillary suite will allow specification of a target year to use for the land cover, so it is possible to choose one that suits our purposes.
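
As a minimal sketch of what “settable via the CLI” could look like (the flag names here are hypothetical, not the actual ANTS options):

import argparse

# Sketch of CLI-settable tile indices, defaulting to the JULES values.
# The option names are illustrative, not the real ANTS flags.
parser = argparse.ArgumentParser(description="Land cover ancillary options")
parser.add_argument("--ice-tile-index", type=int, default=9,
                    help="Tile index for ice (JULES: 9, CABLE: 17)")
parser.add_argument("--bare-soil-tile-index", type=int, default=8,
                    help="Tile index for bare soil (JULES: 8, CABLE: 14)")
args = parser.parse_args()

A CABLE run would then pass e.g. --ice-tile-index 17 --bare-soil-tile-index 14, while omitting the flags keeps the JULES defaults.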

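For a sense of the shape of such a table: it maps each CCI class id to fractional contributions over the target tiles. The snippet below is purely illustrative; the values are placeholders, not Mat’s actual table.

# Illustrative crosswalk: CCI-LC class id -> {CABLE tile index: fraction},
# with fractions summing to 1 per class. All values are placeholders.
# Tile indices follow the 17-tile CABLE convention mentioned above
# (e.g. 14 = bare soil, 17 = ice).
CCI_TO_CABLE = {
    70:  {1: 1.0},           # needleleaved evergreen tree cover -> tile 1
    130: {6: 0.8, 14: 0.2},  # grassland -> mostly grass, some bare soil
    220: {17: 1.0},          # permanent snow and ice -> ice tile
}

Crosswalking then amounts to turning each cell’s CCI class fractions into tile fractions by summing these contributions weighted by the class fractions.
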
I’ll update this as issues come up and decisions have to be made.

All looks good, thanks for documenting this @lachlanswhyborn

The only thing I’ll add is that we’re intending this workflow for use in both global (i.e. AM3) and regional (rAM3-CABLE, in development) applications.

It seems to me that:

  1. The Antarctica map is a low-priority problem. It is unlikely we will run any high-resolution model over Antarctica where the land cover matters (the ocean and ice-sheet models won’t care about this work).

  2. For the new CCI dataset, how difficult would it be to create an option for a 5-year aggregation of the v2.0 data? Not a high-priority task, just wondering if that could be an option to limit sensitivity to wet/dry periods.

I think the Antarctica map could be relevant for how the land ancillaries integrate with the ocean ancillaries. I think it’s likely that the ocean ancillaries will handle it just fine, but I’m not sure. I personally would like to remove the IGBP merge step altogether, so I want others’ thoughts on whether that step has value.

The row of points around the south pole should be corrected; I seem to remember that CABLE no longer requires a tile to be either all ice or ice-free, so without correction it could lead to some spurious ocean water.

On aggregating the v2.0 data over 5-year epochs: this is impossible to do properly with the data provided, as they don’t record the counts for each classification at each cell. I think the best you could do would be to take a period of 5 individual years and assign each cell its most common classification over those 5 years.
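
As a sketch of that fallback, assuming integer class grids on a common grid (the function is mine, not part of the suite):

import numpy as np

def majority_classification(yearly_maps):
    """Most common class per cell across a stack of yearly class maps.

    Assumes non-negative integer class codes and input of shape
    (n_years, nlat, nlon). Ties resolve to the smallest class code.
    """
    stack = np.asarray(yearly_maps)
    flat = stack.reshape(stack.shape[0], -1)
    # bincount + argmax gives the modal class code per cell
    mode = np.apply_along_axis(lambda col: np.bincount(col).argmax(), 0, flat)
    return mode.reshape(stack.shape[1:])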

About Antarctica: for using the workflow with the coupled model (i.e. with the ocean), the ocean land-sea mask needs to be used, no other! I don’t know the workflow well enough to know where the ocean-provided land-sea mask would come in, so maybe having a high-res CCI-based coastline in the first instance makes sense.

So the question here is only relevant for AMIP runs, in which case Antarctica’s borders are very low priority. So low that I don’t really care whether IGBP or CCI defines the coastlines. But I may be forgetting some details.

Agree that there should not be ocean at the south pole.

Yeah, this is true; it may even be the case that for the final release we want the workflow to use a provided land-sea mask from the outset, rather than allowing CCI to determine the mask. It’s just a question of whether there are ocean ancillaries, e.g. SSTs, for all points defined as ocean.

It makes sense to me to have only one source of data, rather than a bespoke mix of IGBP and CCI.

An important update on the CCI suite and the handling of water. Martin located a companion dataset to complement the CCI land cover, from https://doi.org/10.3390/rs9010036, which separates ocean water and permanent inland water. The steps I’ve taken to integrate this with the CCI land cover are (a sketch of the masking and fill logic follows the list):

  1. Regrid the 150m resolution water classification dataset to the 300m resolution of the CCI land cover.
  2. Separate the water classification in the CCI land cover into sea_ocean_water and inland_water by:
    • Setting all cells that are both water in the CCI land cover and open_water in the water classification to sea_ocean_water.
    • Setting all cells that are both water in the CCI land cover and inland_water in the water classification to inland_water.
  3. Fill the remaining water points (i.e. cells identified as water in the CCI land cover, but not classified as permanent water in the companion dataset) using a nearest-neighbour search.
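
A minimal sketch of the splitting and fill logic, assuming numpy/scipy arrays on a common grid; every class code other than the CCI-LC water class (210) is a placeholder:

import numpy as np
from scipy import ndimage

CCI_WATER = 210                          # CCI-LC water bodies class
OPEN_WATER, INLAND_WATER = 1, 2          # companion dataset codes (assumed)
SEA_OCEAN_WATER, LAKE_WATER = 220, 221   # output codes (hypothetical)

def split_water(cci, water):
    """Step 2: split CCI water cells into sea/ocean vs inland water using
    the companion classification regridded onto the CCI grid."""
    out = cci.copy()
    is_water = cci == CCI_WATER
    out[is_water & (water == OPEN_WATER)] = SEA_OCEAN_WATER
    out[is_water & (water == INLAND_WATER)] = LAKE_WATER
    return out

def fill_nonpermanent(classes, unresolved):
    """Step 3: replace cells still flagged as unresolved (non-permanent
    water) with the class of the nearest resolved cell."""
    # For each True cell, distance_transform_edt can return the indices of
    # the nearest False cell; indexing with those fills by nearest neighbour.
    idx = ndimage.distance_transform_edt(
        unresolved, return_distances=False, return_indices=True)
    return classes[tuple(idx)]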

The resulting lake fractions after passing through the first stage of the CCI workflow at N96 are shown below.

[Figure: lake fraction at N96 after the first stage of the CCI workflow]

Note that there are some non-zero lake fractions in Australia and South America; they are just close to zero.

I now have the new ancillary suite creating the full set of ancillaries required for the baseline AM3 configuration. I’m trying to set up an n96e suite with these ancillaries on a branch of the configs. I’ll update when I have the configuration running.

I’ve managed to get the new ancillaries through the reconfiguration stage, with a fair bit of massaging of the vegetation ancillary (LAI and canopy heights). The massaging involved:

  • The reconfiguration stage expects the time period of the vegetation ancillary to be year 1 (it’s a periodic monthly ancillary). Other periodic monthly ancillaries don’t have this requirement, so it may be something done specifically to handle CABLE; I haven’t been able to find the corresponding code that might have done this, though. It also requires the calendar to be correctly set to the same calendar as the run.
  • The reconfiguration reported the PP header of the file to be incorrect. I couldn’t interpret the header checking code, so I copied the header across from the default vegetation file.

The UM run is reporting:

????????????????????????????????????????????????????????????????????????????????
???!!!???!!!???!!!???!!!???!!!       ERROR        ???!!!???!!!???!!!???!!!???!!!
?  Error code: 67
?  Error from routine: INANCILA
?  Error message:  INANCILA: update requested for STASHcode 132
?        but prognostic address not set.
?        As prognostic is not present in UM run, please turn off ancil updating
?        via the items namelist.
?  Error from processor: 190
?  Error number: 58
????????????????????????????????????????????????????????????????????????????????

STASH code 132 is the Dimethyl Sulphide concentration in sea water, which the suite creates an ancillary for. Anyone have ideas as to the cause of this? I’m going to compare the old working ancillary to the new one to see if there’s anything different there. There shouldn’t be much; the same ANTS app was used for both.

It sounds like the model is configured so that it doesn’t need field 132 (e.g. the relevant science section is disabled), but you’ve also set this field to be updated from an ancillary. It wants you to remove this unnecessary field from the ancillary settings.

Hmm, I must’ve unintentionally changed a science option in the Rose configuration, since I thought I was still using the same science setup as the original. Thanks for the insight.

Positive update: I’ve got AM3 running with the CCI ancillaries. It’s completed 3 months without crashing at n96 resolution. There are a number of things I had to do in quite an ad-hoc fashion between running the ancillary suite and running the model to get it to work:

  1. Shift the ancillaries from the [-180, 180) degree longitude domain to [0, 360), and flip the latitudes so that they’re descending rather than ascending (see the sketch after this list). Trying to do this from the outset in the grid definition caused some grid-mismatch ancillary errors.
  2. Adjust the time coordinate on the qrparm.veg.func ancillary (LAI and canopy heights). These ancillaries are seasonal (monthly periodic), and for some reason the reconfiguration expects the year to be year 0. Other seasonal ancillaries do not have this requirement. It also requires the calendar to be set to 360-day.
  3. The header on qrparm.veg.func was also wrong for reasons I didn’t understand. To temporarily address this and the previously mentioned time issue, I simply copied the time coordinate and header from the default AM3 vegfunc ancillary to the CCI version.
  4. For some reason, Martin’s post-processing script, which fills in the CABLE state variables for soil temperature, moisture and snow quantities, was not working correctly, so I’ve written a new script to do it. It’s not part of the suite yet, but I think we should replace Martin’s existing script, as it has dependencies we don’t include in our conda environments.
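
For reference, a minimal sketch of steps 1 and 2, assuming xarray Datasets with 1-D coordinates and the cftime package (function names are mine):

import cftime

def to_um_domain(ds, lon="longitude", lat="latitude"):
    """Step 1: shift [-180, 180) longitudes onto [0, 360) and make the
    latitudes descending (assumes an xarray Dataset with 1-D coords)."""
    ds = ds.assign_coords({lon: ds[lon] % 360}).sortby(lon)
    return ds.sortby(lat, ascending=False)

def periodic_monthly_times(year=0):
    """Step 2: mid-month time points on the 360-day calendar, pinned to
    the year the reconfiguration expects (year 0 here)."""
    return [cftime.Datetime360Day(year, month, 16) for month in range(1, 13)]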

The run is currently at /scratch/tm70/lw5085/cylc-run/access-am3-cci-ancils.

A minor update: I fixed the grid issue by setting the target grid definition to those under /g/data/access/TIDS/..., but I’m still having issues with the LAI/canopy height ancillary. I’ve got n216 and n512 ancillaries through the reconfiguration stage up to this point, and the error I can’t get past is:

????????????????????????????????????????????????????????????????????????????????
???!!!???!!!???!!!???!!!???!!!       ERROR        ???!!!???!!!???!!!???!!!???!!!
?  Error code: 417
?  Error from routine: RCF_ANCIL_ATMOS
?  Error message: replanca_rcf_replanca: PP HEADERS ON ANCILLARY FILE DO NOT MATCH
?  Error from processor: 0
?  Error number: 1
????????????????????????????????????????????????????????????????????????????????

This is triggered by this section of the UM source code. I’ve compared the fixed-length header to the equivalent n96 ancillary, which we use by default, and can’t work out which header entry would be triggering this. It’s possible to simply copy over the old header and adjust the resolution constants, but I don’t think this is a good long-term solution.

If any UM experts have any advice on this, that would be appreciated.

@lachlanswhyborn Not an expert here, but this looks like the number of entries in the ancillary as detected by inancilla_mod isn’t matching the number of entries as checked by replanca, which in turn suggests that something’s not quite correct in the ancillary header information. Changing from N96 to anything else would need the size attributes updating, and we should note that there’s a link/dependence on npft in that resizing.**

**The UM13-JULES7 ancillary reconfiguration steps may be assuming 9 tiles/5 PFTs (for NWP applications), 27 tiles/13 PFTs (the UKESM2 configuration), or something similar.

Yeah, the issue is definitely in the header. I got around this temporarily by copying the header from the default AM3 vegfunc ancillary file; this allowed AM3 to run a year successfully with the CCI ancillaries. There were quite a few differences between those two headers, but the resolution-specific ones were the same. Unfortunately, it’s pretty hard to find documentation about what precisely is in these headers.

The key thing will likely be the number of entries for that field in the ancillary, which would be something like an [nlat, nlon, npft] triple, or the total number of entries ngridcells * npft (or possibly n_cells_with_land * npft). Also, this would possibly need to be ntile rather than npft

… and/or number of bytes for each entry.

Since this information goes in via the general ancillary read section, I would be surprised if there are any conditions/expectations around latitude (as happens with CABLE offline).

In the first instance I’d have a look at what changes to the header information were necessary for the soil temperature/humidity section in the N96 and N216/N512 reconfigurations, and/or reach out to the W21C folks for insight.

I think it should be (nlon, nlat, npft); that’s what the default AM3 vegfunc uses, and we do the same in ESM1.5/6. The output log says that this i1 is 0, which should be the same as the LAI STASH code 217. The respective field headers in the ancillary had the correct STASH code attached, but maybe there’s something in the file header that also encodes this information? I’ll do more digging.

The soil state variables aren’t read in via an ancillary for AM3; they’re handled by the reconfiguration in combination with a script Martin wrote to spread the average soil state across all tiles.

I think I’ve identified the problem. ANTS sets the num_field_types header entry to 2, which to me seems the more logical choice, but the UM expects it to be 26 (13 each for LAI and canopy heights). It now successfully passes the reconfiguration stage.
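
For anyone who hits the same error, a minimal sketch of the header patch using the mule package; the attribute path integer_constants.num_field_types is my assumption about where this entry lives, so check it against UMDP F03 before relying on it:

import mule

# Hedged sketch: patch the num_field_types entry on a vegetation ancillary.
# The attribute path below is an assumption about mule's AncilFile layout.
ancil = mule.AncilFile.from_file("qrparm.veg.func")

# ANTS wrote 2 (two field types: LAI and canopy height); the UM instead
# expects one entry per pseudo-level, i.e. 13 PFTs x 2 fields = 26.
ancil.integer_constants.num_field_types = 26

# mule validates on write; if validation objects to the edited header,
# consult UMDP F03 for the ancillary file specification.
ancil.to_file("qrparm.veg.func.patched")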

Hi team, I am slowly catching up on NRI news, but thought I would drop a line here: my team and I are running the RNS over Antarctica, and this is a problem for us. Right now we are just doing something very hacky where we turn off a few of the land surface processes so that they don’t cause problems at run time. Not an elegant solution, but it seems to work. If we can ever get to a stage of having better land surface representation for Antarctica, I would be interested in a few ideas 🙂