LUH processing

This topic is to document work involved in processing LUH data for ACCESS / CABLE-POP.

Issues include

  • use of gross or net transitions
  • mapping to CABLE pfts
  • how to use crop data and other supplementary information
  • understanding how processing was done previously, what code is still available
  • ensuring processes are better documented in future

From @clairecarouge

It appears the LUH2 dataset should already be available as a published dataset at NCI.

It is part of input4mips and this data collection is under here: /g/data/qv56/replicas/input4MIPs

The dataset appears to be spread across several directories because it is organised per MIP, so, for example, scenario and historical data sit in different places.

@RachelLaw additional notes:

Historical data
/g/data/qv56/replicas/input4MIPs/CMIP6/CMIP/UofMD
UofMD-landState-2-1-h
UofMD-landState-high-2-1-h
UofMD-landState-low-2-1-h
High and low cases are described (Land Use Harmonization, umd.edu) as high and low data-driven reconstructions of land use from HYDE and accompanying wood harvest, designed for model sensitivity studies.

The CABLE-POP team has a set of LUH2 processing scripts that could conceivably be used as the basis for creating land cover ancillaries (ACCESS-CM3), and the associated net transitions (ACCESS-ESM-1.5, ESM-1.6), and gross transitions (ESM-1.6, ESM3) irrespective of the choice of CABLE3 or CABLE4 (i.e. POP).

These scripts allow for compositing LUH2 states into CABLE land cover units (tiles) and spatial aggregation. There is also some level of internal sanity checking (e.g. tiles sum to 1).
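As a rough illustration of the compositing and the "tiles sum to 1" check (not the actual CABLE-POP scripts), here is a minimal Python sketch. The grouping of LUH2 states into three classes is an assumption made for the example, as are the function names.

```python
# Assumed grouping of LUH2 state variables into three POPLUC-style classes.
# The real processing scripts may group states differently.
STATE_GROUPS = {
    "primary": ("primf", "primn"),
    "secondary": ("secdf", "secdn"),
    "grass": ("pastr", "range", "c3ann", "c4ann",
              "c3per", "c4per", "c3nfx", "urban"),
}


def composite_states(luh2_states, groups=STATE_GROUPS):
    """Sum LUH2 state fractions for one grid cell into coarser classes."""
    return {
        name: sum(luh2_states.get(var, 0.0) for var in members)
        for name, members in groups.items()
    }


def tiles_sum_to_one(tiles, tol=1e-6):
    """Sanity check: the composited fractions should add up to 1."""
    return abs(sum(tiles.values()) - 1.0) <= tol


# Made-up fractions for a single grid cell.
cell = {"primf": 0.5, "primn": 0.2, "secdf": 0.1, "range": 0.15, "c3ann": 0.05}
tiles = composite_states(cell)
print(tiles, tiles_sum_to_one(tiles))
```

A cell that fails the check points either at a state missing from the grouping or at a problem in the input, which is the kind of thing worth flagging before the ancillaries are written.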

Notes from @inh599 email (16/4/24):

I/we have made a little further progress on the topic of understanding what is going on with LUC in the CABLE-POP runs. In short – a better understanding is emerging as to why some of the grid cells show different qualitative behaviour between CABLE-POP and ESM. This is emerging as a topic that is going to need careful consideration as the configuration of CABLE4 emerges. Jurgen and I will keep thinking/discussing as this will play an important role in TRENDY for 2024 and some of the current BIOS3 work.

More detail - In essence there is ambiguity around how to deal with transitions in/out of rangelands in the POPLUC context since POPLUC does not use the potential vegetation distribution that LUH2 assumes. From some emails that Vanessa, Peter B and Stephen Sitch had back in 2018:

  • Following LUH2 simple guidelines (on their website): "all natural vegetation should be cleared for managed pasture, and only cleared for rangeland if it is forested".

This guidance resulted in a decision to discount (set to zero) any transition from primary (and secondary) non-forest vegetation to rangeland in the POP simulations.
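To make the effect of that decision concrete, here is a small Python sketch of how an aggregated primary-to-grass transition (ptog) might be built with and without the non-forest-to-rangeland term. The `<from>_to_<to>` variable names, the class lists and the function name are assumptions for illustration and should be checked against the actual LUH2 files and scripts.

```python
# Assumed class membership; labelled here only for the example.
PRIMARY = ("primf", "primn")
GRASS = ("pastr", "range", "c3ann", "c4ann", "c3per", "c4per", "c3nfx", "urban")


def aggregate_ptog(transitions, discount_nonforest_to_rangeland=True):
    """Sum primary -> grass transitions for one grid cell and one year.

    With the flag set, primn -> range is set to zero, following the
    decision described above.
    """
    total = 0.0
    for name, frac in transitions.items():
        src, dst = name.split("_to_")
        if src not in PRIMARY or dst not in GRASS:
            continue  # not a primary -> grass transition
        if discount_nonforest_to_rangeland and (src, dst) == ("primn", "range"):
            continue  # discounted transition
        total += frac
    return total


# Made-up transition fractions for one grid cell.
cell = {
    "primf_to_range": 0.02,
    "primn_to_range": 0.05,
    "primn_to_pastr": 0.01,
    "secdf_to_range": 0.04,  # secondary source, so never part of ptog
}
print(aggregate_ptog(cell))                                         # about 0.03
print(aggregate_ptog(cell, discount_nonforest_to_rangeland=False))  # about 0.08
```

The same pattern would apply to stog with secdn -> range.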

The ambiguity arises because POPLUC assigns each grid cell an amount of primary forest (a hard-wired fraction of primary forest+primary non-forest vegetation) independently of LUH2. Indeed POPLUC can assign a grid cell some primary forest even if LUH2 says cover was all primary non-forest. What do you then do about the transitions from primary vegetation to rangeland? How do you keep your land cover states in step with the transitions?

The way it has been done at the moment leaves any primary forest in POP that was on land assigned as primary non-forest in LUH2 untouched – so is (possibly) unrealistic.

Since the ESM1.5 is based solely on the states (and changes in them over time) this issue around co-handling the mapping between land use states/transitions and the land cover states/transitions doesn’t apply in the same way. However there is the equivalent (parallel) problem of determining the initial (pre-industrial) land cover distribution in a way that is consistent with both LUH2 and the present day (MODIS informed) land cover.

This is the presentation I gave in April/May 2024 on the comparison of LUH2 and CABLE-POP

comparison_CABLE-POP_LUH2.pptx (7.6 MB)

Part 2 revealed a bug in the CABLE-POP code which prevents transitions to croplands and which leads to an almost constant crop fraction through time. This bug affected TRENDY versions 9-12, but will be fixed for version 13.


Adding some slightly reworked slides which hopefully explain some of the quirks that can emerge due to how we are processing LUH2 and rangelands for POPLUC (in TRENDY and BIOS).




Thanks for the updated slides @inh599. I am not clear on your conclusions. Are you suggesting we use a different POPLUC code for offline and coupled? Or are you saying that, considering the requirements for coupled, we should change the pre-processing in all cases?

Current thinking (31/5/2024) is that many of the concerns noted are a distraction - @juergen please take note.

Most of this originated from an analysis over the Kimberley region based on recent land-use enabled runs (S3) completed for RECCAP2. These show some cells where the total land fraction (the sum of all 3 POP land cover types) doesn’t equal 1 when it should, e.g. for 2010:
[Figure: Kimberley_S3_2010]
I have therefore been looking for an explanation/potential fix.

It is certainly the case that there is inconsistency between how the states and transitions have been aggregated - but that is not the explanation for the above. This is because

  • in this run/for these cells the total land fraction is constant through time (but /=1).
  • more generally I (re)discovered subroutine execute_luc_event in POPLUC which adjusts the states and transitions during a run to ensure conservation of land (i.e. even if you give POPLUC unphysical transitions it will retain plausibility at least).
  • my own analysis of the LUH2 post-processed data that I understand was used in the RECCAP2-S3 run does not show the same issue.

I’m now thinking there’s actually something weird in how this particular RECCAP2-S3 run was set up. The partner RECCAP2-S2 run (no land use change) does not have this problem.

Nevertheless - more generally - there remains the issue of inconsistency between the states and transitions as aggregated for POPLUC which will impact how runs that start at different years relate to each other. For example - primn → rangeland → secdf could conceivably lead to a mismatch in the grass fraction (The second transition is expecting to operate on land transferred from primn originally. However primn → rangeland is neglected when evaluating ptog and so inside POPLUC gtos which includes all of rangeland → secdf may get limited. The aggregated states wouldn’t reflect this limitation).
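The primn → rangeland → secdf example can be worked through with a toy bookkeeping model. This is a sketch under simplifying assumptions (three land classes, transitions limited to the area available, made-up numbers); it is not the POPLUC code.

```python
def step(state, ptog=0.0, gtos=0.0):
    """Advance (primary, secondary, grass) fractions by one year.

    Each transition is limited to the area available in its source class,
    which is the assumed form of the limiting described above.
    """
    p, s, g = state
    ptog = min(ptog, p)
    p, g = p - ptog, g + ptog
    gtos = min(gtos, g)
    g, s = g - gtos, s + gtos
    return (p, s, g)


# One cell that LUH2 starts as entirely primary non-forest.
start = (1.0, 0.0, 0.0)  # (primary, secondary, grass)

# LUH2: year 1 has primn -> rangeland of 0.2, year 2 has rangeland -> secdf of 0.1.
# Case A: primn -> rangeland is neglected when evaluating ptog.
neglected = step(step(start, ptog=0.0), gtos=0.1)
# Case B: primn -> rangeland is included in ptog.
included = step(step(start, ptog=0.2), gtos=0.1)

# What the aggregated LUH2 states say after year 2.
luh2_states = (0.8, 0.1, 0.1)

print(neglected)  # (1.0, 0.0, 0.0): gtos was limited to zero, no grass to convert
print(included)   # close to the aggregated LUH2 states
```

In case A the time-stepped state stays all primary while the aggregated states show 0.1 secondary and 0.1 grass, so a run initialised from the states in year 2 would not match a run that started in year 0 and stepped the transitions.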

Overall (at the moment):

  • we should be using the same POPLUC science code for offline/coupled - but the pre-processing of the inputs may need to be different
  • we will need to build into the interface between POPLUC and the rest of CABLE (in CABLE4) new code that connects the states/transitions used in the rest of CABLE to those used by POPLUC. We were going to have to do this anyway.
  • It would be good to revise the pre-processing scripts (for offline) to ensure that the states/transitions used are fully consistent with each other.
  • we (perhaps) should consider running our offline runs (RECCAP/TRENDY) with a small amount of all 3 land-use types regardless of LUH2. This would address the problem that different patch indexes are used for the same land cover type in the output (which creates confusion when presenting stuff later - though this would be at the expense of more compute).
  • we need to follow through how %prim_only is being set and used.
  • we need to ensure that BIOS-RECCAP2 is interpreting the transitions correctly.

on the prim_only variable:

when prim_only is true, we assume that there is primary vegetation throughout the whole simulation period. So LU transitions are not simulated and there is no secondary forest at all (relevant for POP).

we have 2 options on how to obtain this variable: the first one is to provide a ‘PrimOnly’ file as input that tells us how much primary vegetation has been lost. The code then sets prim_only according to this information. The second option is to let CABLE calculate the same thing from the LUH2 input. It sets prim_only to false if there is any transition from primary vegetation (ptos or ptog). I think the second option is preferable as it would avoid having another input file.

But I also think that the way prim_only is calculated in the code is not quite right. It would ignore cases where primary vegetation has been cleared before the first year of LUH2 files (currently year 1580) as well as cases where some primary vegetation has been cleared before the first year but where there is no further loss of primary vegetation after that.

There is a line in the code that has been commented out, but I think it would be more correct to leave it in there:

IF (sum(tmpvec) .gt. 1e-3 .OR. LUC_EXPT%primaryf(k) .lt. 0.99) LUC_EXPT%prim_only(k) = .FALSE.

so it additionally checks if primaryf < 0.99 (it probably should be 0.999)
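For anyone who wants to play with the thresholds, here is the same test as a Python sketch. It assumes that tmpvec holds the transitions out of primary vegetation (ptos, ptog) over the record and that primaryf is the primary fraction of the cell; both are my reading of the Fortran line, not confirmed from the code. The function and argument names are made up.

```python
def prim_only_flag(primary_transitions, primaryf,
                   trans_tol=1e-3, primaryf_min=0.99):
    """Return True only if the cell keeps its primary vegetation throughout.

    primary_transitions: fractions leaving primary vegetation over the record
                         (assumed content of tmpvec).
    primaryf:            primary vegetation fraction of the cell (assumed).
    """
    lost_during_record = sum(primary_transitions) > trans_tol
    lost_before_record = primaryf < primaryf_min
    return not (lost_during_record or lost_before_record)


# No loss during the record, but 40% cleared before it starts.
print(prim_only_flag([0.0, 0.0], primaryf=0.6))   # False
# Untouched cell.
print(prim_only_flag([0.0, 0.0], primaryf=1.0))   # True
# Borderline cell: the outcome depends on the threshold chosen.
print(prim_only_flag([0.0, 0.0], primaryf=0.995))                      # True
print(prim_only_flag([0.0, 0.0], primaryf=0.995, primaryf_min=0.999))  # False
```

The first case is the one the commented-out check is meant to catch: without the primaryf test it would be flagged as prim_only.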

@Juergen good news and bad news

The good news is that the LUH2 data that we use for Australia (v3h) does not have a conservation problem - manual checking indicates that all is okay.

The bad news - this is because the scripts used to generate the v3h data (on petrichor) included the transitions primn_to_rang and secdn_to_rang in ptog and stog respectively. It is the exclusion of these transitions that is the root cause of the potential problems.

The current (11/6/2024) position:

  1. We still have a potential for lack of conservation of land and a mismatch between the exogenous LUH2 states and the effective states in POPLUC generated by time-stepping the transitions. *
  2. Something else is going on to lead to the conservation problem in the RECCAP2 S3 runs.

*I will next look to check using the GCB2023 files.

My preference would be to avoid having another input file that we have to keep consistent with all the others - for ACCESS I expect that we will end up setting the %prim_only variable by noting whether the input ancillary specifies that some secondary forest will exist at some point at that grid cell (in any of the simulations) and, if so, allocating a small fraction from the start. This method is then consistent with that requirement - though we will need to choose the threshold (0.99, 0.999 or 0.9999) accordingly.

My bigger concern with this (other than the land conservation due to rangelands) is the setting of %prim_only = .TRUE. in READ_ClimateFile. This subroutine reads in the Climate (previously evaluated) and sets the PFT and biome distribution. It is also setting %prim_only = .TRUE. for IGBP biomes > 5.

The comment seems to indicate that this is aimed at preventing non-woody biomes from transitioning. This would prevent the conversion of primary non-forested land to secondary forest (why?) - highly plausible in some future climates. I suspect that this could also intersect with the transitions between different types of grass - crops, pasture, rangeland - and the harvesting of that grass (now = 0.5 for pasture and crops but 0 otherwise)*.

I would prefer that all this is done in the preparation of the LUH2 input files.

*This possibly explains why only %ptos, %ptog, %stog and %gtos are set to zero when %prim_only = .TRUE. in READ_LUH2 and not the associated transitions to/from different grass types - the current approach effectively allows for a transition between grass types if you know how to prepare the input files.
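A minimal Python sketch of that masking, as I understand it from the description above (it is not a transcription of READ_LUH2). The grass-type transition name `crop_to_pasture` is hypothetical and only there to show that such entries pass through unchanged.

```python
# The four aggregated transitions that are zeroed when prim_only is true.
MASKED_WHEN_PRIM_ONLY = ("ptos", "ptog", "stog", "gtos")


def mask_transitions(transitions, prim_only):
    """Return a copy of the transitions with the four aggregates zeroed
    for prim_only cells; any other entries are left untouched."""
    masked = dict(transitions)
    if prim_only:
        for name in MASKED_WHEN_PRIM_ONLY:
            masked[name] = 0.0
    return masked


# Made-up values for one grid cell.
cell = {"ptos": 0.02, "ptog": 0.05, "stog": 0.01, "gtos": 0.03,
        "crop_to_pasture": 0.04}  # hypothetical grass-type transition
masked = mask_transitions(cell, prim_only=True)
print(masked)
```

Doing the equivalent masking when the LUH2 input files are prepared would keep the rule in one place instead of splitting it between the pre-processing and the model code.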