Developing a CABLE4 work plan: Issues to consider

This topic captures issues that need addressing in planning CABLE4 coding. Issues could be summarised here as a list of dot points, with further information in separate posts.

In this plan, CABLE4 is developed in offline mode first.

2 code branches problem

CABLE4 will be the merge of two branches of CABLE: main and CABLE-POP_TRENDY. It will also include additional developments. We need to decide when the merge happens and where the developments should be done. Ideally, we do not want to copy new developments to several branches.
Considerations:

  • who decides on science changes for the merge?
  • Impact on ESM1.6. Is bringing science from the CABLE-POP_TRENDY branch into CABLE main desirable for ESM1.6?
  • File reorganisation. CABLE-POP_TRENDY uses an old file organisation. Should we refactor the code in that branch first? This is needed before the merge, and it simplifies things if we need to keep 2 branches for some time.
  • Finish BIOS merge first!
  • Testing environment:
    • POP, biophysics and SLI(?) cases, to give code coverage for bitwise comparison when restructuring.

Vegetation distribution

  • change the PFT distribution we use so there is only one woody vegetation type per grid cell. Need to decide how the code enforces this. Do we only consider the woody PFT with the highest fraction in a grid cell? Do we replace the whole area covered by woody PFTs with this one PFT?
  • update the vegetation distribution map so the same map is used offline and in ACCESS
    • settle on the source of the vegetation map.
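The two options in the first point differ in what happens to the area of the non-dominant woody PFTs. A minimal Python sketch, assuming a grid cell is described by a dict of PFT name to area fraction and a hypothetical set `WOODY` of woody PFT names (not CABLE's actual type list):

```python
# Hypothetical woody PFT names, for illustration only.
WOODY = {"evergreen_needleleaf", "evergreen_broadleaf", "deciduous_broadleaf", "shrub"}


def dominant_woody(fractions):
    """Return the woody PFT with the largest fraction, or None if there is none."""
    woody = {pft: f for pft, f in fractions.items() if pft in WOODY and f > 0.0}
    if not woody:
        return None
    return max(woody, key=woody.get)


def keep_dominant_only(fractions):
    """Option 1: keep only the dominant woody PFT; other woody area is dropped.

    The cell's fractions then no longer add up to their original total.
    """
    dom = dominant_woody(fractions)
    return {pft: f for pft, f in fractions.items() if pft not in WOODY or pft == dom}


def merge_into_dominant(fractions):
    """Option 2: give the whole woody area of the cell to the dominant woody PFT."""
    dom = dominant_woody(fractions)
    out = {pft: f for pft, f in fractions.items() if pft not in WOODY}
    if dom is not None:
        out[dom] = sum(f for pft, f in fractions.items() if pft in WOODY)
    return out


cell = {"evergreen_broadleaf": 0.3, "shrub": 0.2, "c3_grass": 0.4, "bare_soil": 0.1}
```

With this cell, option 1 leaves 0.2 of the area unassigned, while option 2 gives `evergreen_broadleaf` 0.5 and keeps the total at 1.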

We would move to gridinfo files at the resolution we want to run at, and preparing the vegetation map then becomes a pre-processing step.

  • change POP and POPLUC to work with an exogenous vegetation distribution instead of one that is calculated within CABLE. Implement a switch to choose between BIOME1 and an external distribution.
    • BIOME1 information handling: how do we get the woody fraction when reading an external vegetation map and not running do_climate?
      • Need to check the edge cases in the code to see when the biome variable (in the luc type) is used.
    • Any other information that depends on do_climate? Only iveg and biome are read from the climate restart.
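One simple answer to the woody-fraction question is to sum the fractions of the woody PFTs in the external map, and only use the BIOME1-derived value when the switch selects BIOME1. The sketch below only illustrates that control flow; the switch values `"biome1"`/`"external"`, the function name and the PFT names are hypothetical, and whether the biome variable in the luc type can be replaced this way still needs checking in the code.

```python
# Hypothetical woody PFT names, for illustration only.
WOODY = {"evergreen_needleleaf", "evergreen_broadleaf", "deciduous_broadleaf", "shrub"}


def woody_fraction(veg_source, external_fractions=None, biome1_woody_fraction=None):
    """Woody fraction of a grid cell, depending on a (hypothetical) vegetation-source switch."""
    if veg_source == "external":
        if external_fractions is None:
            raise ValueError("external vegetation map requested but not provided")
        # Sum the area of all woody PFTs read from the external map.
        return sum(f for pft, f in external_fractions.items() if pft in WOODY)
    if veg_source == "biome1":
        if biome1_woody_fraction is None:
            raise ValueError("BIOME1 woody fraction needs do_climate output")
        return biome1_woody_fraction
    raise ValueError(f"unknown vegetation source: {veg_source!r}")
```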

POP

This includes work on POP itself.

  • conversion of the POP state variable from carbon stocks in each patch/cohort/age class to the proportion of the total carbon stock in each patch/cohort/age class (will allow easier matching between CASA and POP). Attractive but not necessary, and doing it carries some risk.
  • creating a POP namelist file should help reduce the number of hard-coded constants and make experiments easier with respect to modifying the number of patches and cohorts
  • reviewing settings and flags in the POP module. Which flags/configurations do we need to keep?
  • Tidy up the CASA/POP interface: only pass the information that is needed.
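The appeal of storing proportions is that when CASA updates the total woody carbon stock, the POP proportions stay valid and the stocks are recovered with one multiplication. A minimal Python sketch, assuming the POP carbon state of one grid cell is flattened into a list with one stock per patch/cohort/age class (the real POP arrays are multi-dimensional, and these function names are hypothetical):

```python
def to_proportions(stocks):
    """Convert per-class carbon stocks into proportions of the total stock."""
    total = sum(stocks)
    if total <= 0.0:
        # No carbon: proportions are undefined. Return zeros and let the caller decide.
        return [0.0 for _ in stocks], 0.0
    return [s / total for s in stocks], total


def to_stocks(proportions, total):
    """Recover per-class stocks from proportions and a (possibly updated) total."""
    return [p * total for p in proportions]


# POP state with a total of 10; CASA then says the woody total is 12.
props, total = to_proportions([2.0, 6.0, 2.0])
new_stocks = to_stocks(props, 12.0)
```

The zero-total branch is one example of the edge cases that make this change risky: a cell that loses all its woody carbon has no meaningful proportions to carry forward.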

LUC at the start of year

Because we want the land-use information in the output file to be consistent with the other output variables, the LUC calculations should be done at the beginning of the year instead of the end of the year. See slides.
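The proposed ordering can be sketched as a yearly loop in which LUC and POP run first, so the tile fractions used (and written) during a year are the ones set at its start. This is plain Python control flow with hypothetical callbacks (`apply_luc`, `run_pop`, `step_biophysics`), not the CABLE driver. It also illustrates one possible answer to the first-step question: POP receives the previous year's NPP, initialised to 0 for the first year.

```python
def run(years, steps_per_year, apply_luc, run_pop, step_biophysics):
    """Drive a run with LUC and POP at the start of each year; return a log of events.

    apply_luc(year) -> tile fractions for that year
    run_pop(year, npp_prev) -> None (POP uses the previous year's NPP)
    step_biophysics(year, step, fractions) -> NPP for that time step
    """
    events = []
    npp_prev = 0.0  # assumption: no previous-year NPP exists on the first step of a run
    for year in years:
        fractions = apply_luc(year)  # land cover for *this* year
        run_pop(year, npp_prev)
        events.append(("luc+pop", year, npp_prev))
        npp_year = 0.0
        for step in range(steps_per_year):
            npp_year += step_biophysics(year, step, fractions)
        events.append(("end_of_year", year, npp_year))
        npp_prev = npp_year
    return events
```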

Need a better understanding of the output in the coupled model. In the online model, the tile fraction information is consistent with the land cover that was being used.
How does this work for JULES?

Will POP break if it is called on the very first step of a run without any information to bring into POP? Which climate variables does POP depend on? It needs the NPP for the previous year and the potential stem NPP: could we set these to 0?

Initialisation of land cover.

Do we want it called at the start or the end of the first time step?

CABLE offline working with 27 PFTs

  • remove hard-wired vegetation-type and soil-type numbers
    • Some are numbers, some are ranges that assume a specific ordering of types.
  • Add handling of two tiles for the same type for primary and secondary vegetation.
    • Need to decide how to implement this: do we have 2 indexes for the same type? Do we have 2 tiles with the same PFT number and a flag to differentiate primary and secondary?
  • what do we want the tiled output to be like? Do we want to output the primary and secondary fractions separately or together?
  • Rethink how the tiles are stored? Store all tiles for all grid cells instead of the strict minimum?
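Removing hard-wired numbers usually means replacing literals and index ranges with lookups by property. The sketch below shows the "same PFT plus a primary/secondary flag" option in Python, with a hypothetical `Tile` record and property table; a `secondary=None` argument returns primary and secondary together, which is one way to offer both kinds of tiled output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tile:
    pft: str          # vegetation type name (hypothetical names below)
    secondary: bool   # False for primary vegetation, True for secondary
    fraction: float   # area fraction of the grid cell


# Hypothetical property table replacing a hard-wired range such as "types 1 to 4 are woody".
PFT_IS_WOODY = {
    "evergreen_needleleaf": True,
    "evergreen_broadleaf": True,
    "c3_grass": False,
    "bare_soil": False,
}


def is_woody(tile):
    return PFT_IS_WOODY[tile.pft]


def fraction_of(tiles, pft, secondary=None):
    """Sum tile fractions for a PFT; secondary=None means primary and secondary together."""
    return sum(
        t.fraction
        for t in tiles
        if t.pft == pft and (secondary is None or t.secondary == secondary)
    )


tiles = [
    Tile("evergreen_broadleaf", False, 0.3),
    Tile("evergreen_broadleaf", True, 0.2),
    Tile("c3_grass", False, 0.5),
]
```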

Mapping of vegetation types between CABLE (27) and POP (3)

See this post for Ian’s slides on the topic.

  • reorganise code to run LUC at the start of the year and not the end of the year.
  • how to handle cells with no vegetation (lake, ice, bare soil, urban)
  • How to map the 27 tiles to 3 tiles? How to identify secondary/primary? How to aggregate and disaggregate non-woody, vegetated types?
  • How to disaggregate rates?
  • Deal with fractional vegetated gridcells.
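To make the questions concrete, here is one naive way to pose the mapping: aggregate CABLE tile fractions into the 3 POP/POPLUC tiles, and split a rate computed on a POP tile back across its CABLE tiles in proportion to their area. This is not the scheme in Ian's slides. It is a minimal Python sketch assuming each CABLE tile carries a hypothetical `pop_class` label (`"primary"`, `"secondary"`, `"grass"`, or `None` for non-vegetated types), and area-proportional disaggregation is chosen only for illustration.

```python
POP_CLASSES = ("primary", "secondary", "grass")


def aggregate(tiles):
    """tiles: list of (pop_class, fraction). Return POP tile fractions and the non-vegetated fraction."""
    pop = {c: 0.0 for c in POP_CLASSES}
    nonveg = 0.0
    for pop_class, frac in tiles:
        if pop_class is None:
            nonveg += frac  # lake, ice, bare soil, urban: not seen by POP
        else:
            pop[pop_class] += frac
    return pop, nonveg


def disaggregate(rate, pop_class, tiles):
    """Split a grid-cell rate for one POP tile across its CABLE tiles, proportionally to area."""
    members = [(i, frac) for i, (c, frac) in enumerate(tiles) if c == pop_class]
    total = sum(frac for _, frac in members)
    shares = [0.0] * len(tiles)
    if total > 0.0:
        for i, frac in members:
            shares[i] = rate * frac / total
    return shares


tiles = [("primary", 0.3), ("secondary", 0.1), ("grass", 0.2), ("grass", 0.2), (None, 0.2)]
```

Here the POP tiles cover only 0.8 of the cell, which is exactly the fractionally vegetated case: POP would either need to work with fractions that do not add up to 1, or with fractions renormalised by the vegetated area.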

Analytic spinup

  • currently not working with phosphorus. It should speed up the spinup significantly if we can get it working.

The analytic solution is wrong when phosphorus is on (and maybe with nitrogen as well). The fast spinup brings the model to a different “steady” state than running the whole model over a long period. The spinup does not bring you closer to it.

Solution: tinker on how many steps and when we do each step.
Talk to Anna U. about what is going on.

Run a few tests to show what the problem is.
How big an issue is it going to be for ESM3? The spinup of the ocean is going to be slower anyway.
Having a CABLE-as-ACCESS offline run working is higher priority and can simplify this whole question.

Low priority on the list.

BLAZE

It would be ready to implement, at least in offline.

  • How to handle them so we don’t completely remove the feature from the offline code, even if not going into the coupled code.
    It will come in as part of the merge of BIOS into POP-TRENDY and then into main.

CROP

  • Is CROP in a different branch?
  • How to handle them so we don’t completely remove the feature from the offline code, even if not going into the coupled code.
    Not ready for main. It will stay in the branch as a source of science for future potential development.

Efficient parallelization for CABLE4

This is already in progress.

  • review the entire implementation of MPI to ensure we have efficient parallelization capacity.
  • considerations of future development (river routing, lateral flow…)

Testing environment

Do we need one testing environment or do we need different testing for different tasks?

Need a testing configuration with CASA. How do we deal with the restart? In which cases can we start from an old spinup state, and in which cases do we need to do a full run?

Interpretation of LUH2.

Consistency issue with grid cells that are partly water/non-vegetated and partly vegetated.

Carbon conservation

Problem with thresholds that do not update the fluxes. Do we do something about it? Develop a carbon conservation check, similar to the water and energy balance checks.
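A carbon balance check analogous to the water and energy balance checks would compare the change in total stored carbon over a period with the net flux over the same period. A threshold that changes a pool without updating the matching flux then shows up as a non-zero imbalance. Minimal Python sketch for one grid cell; the function names and the absolute tolerance are hypothetical choices.

```python
def carbon_balance_error(stock_start, stock_end, fluxes_in, fluxes_out):
    """Imbalance = change in total stock minus net flux, in the units of the stocks.

    fluxes_in / fluxes_out are the carbon amounts entering / leaving the grid cell
    over the check period, already integrated over time.
    """
    return (stock_end - stock_start) - (sum(fluxes_in) - sum(fluxes_out))


def check_carbon_balance(stock_start, stock_end, fluxes_in, fluxes_out, tol=1e-6):
    """Raise if carbon is not conserved to within `tol`; return the imbalance otherwise."""
    err = carbon_balance_error(stock_start, stock_end, fluxes_in, fluxes_out)
    if abs(err) > tol:
        raise RuntimeError(f"carbon not conserved: imbalance {err:.3e}")
    return err
```

For example, a cell going from 100 to 102 with an uptake of 5 and a loss of 3 passes; the same fluxes with an end stock of 102.5 raise an error.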

Timelines:
Unlikely to start working on this before January 2025
Formal spinups not starting before 2026.

I’m replying instead of editing the first post, this way I can put my ideas in whatever order and use the first post to create the plan once we have more input.

Reorganise CABLE-POP_TRENDY directories and files

It needs to follow the same directory and file organisation as the main branch.

27 tiles

  • remove hard-wired vegetation-type numbers
  • figure out how to handle primary and secondary tiles of the same type:
    • where do we need to distinguish and where does it not matter?
    • if the code checks on a specific veg. type that can be primary and secondary (IF condition), we need a way to check against both vegetation indexes.
  • reprocess ancillary data for 27 vegetation types for offline and online
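If primary and secondary vegetation of the same type get two separate indices, every IF test on a specific type has to check both. One way to keep that out of the science code is a helper that maps each type to the set of indices it occupies. Python sketch; the index numbers and type names below are made up for illustration.

```python
# Hypothetical layout: a vegetated type may occupy a primary and a secondary index.
TYPE_INDICES = {
    "evergreen_broadleaf": {"primary": 2, "secondary": 20},
    "c3_grass": {"primary": 6},
}


def indices_of(veg_type):
    """All vegetation indices used by a type (primary and, if present, secondary)."""
    return set(TYPE_INDICES[veg_type].values())


def is_type(iveg, veg_type):
    """Replacement for a test against a single hard-wired type number."""
    return iveg in indices_of(veg_type)


def is_secondary(iveg):
    """True if the index is the secondary-vegetation index of some type."""
    return any(idx.get("secondary") == iveg for idx in TYPE_INDICES.values())
```

Where the distinction does not matter, the code calls `is_type`; where it does, it additionally calls `is_secondary`.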

Work done in main or CABLE-POP_TRENDY. The choice of branch will be dictated by other work or considerations.

Tile mapping from CABLE to POP-POPLUC

  • From CABLE to POP-POPLUC: combine the grassy types into a single grass tile.

    • what happens to the non-vegetated types? Does POP assume primary+secondary+grass cover the whole grid cell area? Or does POP carry an area fraction for each tile and never use the fact these fractions add up to 1?
  • From POP-POPLUC to CABLE:

    • that’s complicated. See Ian’s documentation.

Work done in CABLE-POP_TRENDY? To be able to compare to current simulations with POP?

Working MPI implementation with POP

I’m not sure when we will need to be able to run a large configuration with POP.

Analytic spinup for phosphorus

Apparently this is broken. Fixing it could lead to big gains in spinup speed.

Testcase: running TRENDY configuration with main?

Is that useful to have?

Testcase: idealised testcases

  • Single-point testcase(s) with one type having both primary and secondary tiles, to test the 27-tile implementation, with no POP and no LUC: should both tiles give the same results, or does CASA behave differently on secondary vegetation? This could be an array of single points to test different climatic conditions.
  • Testcase with woody + grass + non-vegetated, with LUC: test that the non-vegetated area size is maintained.
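The second testcase amounts to an invariant: LUC transitions move area between vegetated tiles only, so the non-vegetated fraction and the total fraction must both be unchanged. Minimal Python sketch of such a check, with a toy transition function and hypothetical tile names.

```python
NON_VEGETATED = {"lake", "ice", "bare_soil", "urban"}


def non_vegetated_fraction(fractions):
    return sum(f for tile, f in fractions.items() if tile in NON_VEGETATED)


def apply_transition(fractions, source, target, area):
    """Toy LUC transition: move `area` (grid-cell fraction) from source to target."""
    if source in NON_VEGETATED or target in NON_VEGETATED:
        raise ValueError("transitions must be between vegetated tiles")
    moved = min(area, fractions[source])
    out = dict(fractions)
    out[source] -= moved
    out[target] = out.get(target, 0.0) + moved
    return out


def check_area_invariants(before, after, tol=1e-12):
    """Fail if the non-vegetated area or the total area changed."""
    assert abs(non_vegetated_fraction(before) - non_vegetated_fraction(after)) <= tol
    assert abs(sum(before.values()) - sum(after.values())) <= tol
```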

(No regrets) Elements of a work plan:

  • removal of hard-wired indices throughout the CASA, POP etc. code
  • conversion of the POP state variable from carbon stocks in each patch/cohort/age class to the proportion of the total carbon stock in each patch/cohort/age class (will allow easier matching between CASA and POP)
  • design of a technical method to connect the 17/27 tiles in CASA to the 3 POPLUC tiles
    • we will need 3-tile CASA TYPEs, either as work space or formally carried around
    • we will need a generic way of identifying which CASA17 tiles get mapped to CASA3 and POP3 (and vice versa) in the mp-vectors; this likely requires a new TYPE.
    • how to handle cells with no vegetation (lake, ice, bare soil, urban)
    • streamlining the cable%climate TYPE
  • BLAZE (how to handle in the interim)
  • Add to POP_TRENDY a capability to read in externally provided land cover/transition/Biome-1 dependent inputs to avoid the need to co-run with the BIOME-1 model.

For ACCESS

  • some form of automated namelist creation at the node/processor level (we will need some restart files to be identified at the minimum)
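Automated namelist creation at the processor level could be as simple as writing one namelist per rank that names that rank's restart file. Python sketch; the namelist group `&restart_nml`, the variables `rank` and `restart_in`, and the file naming pattern are all hypothetical placeholders, not CABLE's actual namelist.

```python
from pathlib import Path


def restart_namelist(rank, restart_dir, prefix="restart"):
    """Text of a Fortran namelist for one processor (hypothetical group and variable names)."""
    path = f"{restart_dir}/{prefix}_{rank:04d}.nc"
    return (
        "&restart_nml\n"
        f"  rank = {rank}\n"
        f"  restart_in = '{path}'\n"
        "/\n"
    )


def write_namelists(nranks, restart_dir, out_dir):
    """Write one namelist file per rank into out_dir and return their paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for rank in range(nranks):
        p = out / f"restart_{rank:04d}.nml"
        p.write_text(restart_namelist(rank, restart_dir))
        paths.append(p)
    return paths
```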

small additions:

  • creating a POP namelist file should help in reducing the number of hard-coded constants and make experiments easier wrt modifying number of patches and cohorts

Another no regrets element to consider

  • rationalizing/simplifying the climate% TYPE

At the moment this is all-encompassing and involves things like 20-year running means. I would not be surprised if we could handle these requirements in different (more efficient) ways without much impact on performance.
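As one example of a cheaper alternative: instead of storing 20 years of values to compute a 20-year running mean, an exponentially weighted running mean with a 20-year e-folding time needs only one stored value per variable. This is a swap of technique, not what climate% does now, and it gives a smoother, longer-memory average, so its impact would need testing. Python sketch using the standard update with weight α = 1 − exp(−Δt/τ).

```python
import math


class ExpRunningMean:
    """Exponentially weighted running mean with e-folding time `tau` (same units as `dt`)."""

    def __init__(self, tau, dt, initial):
        self.alpha = 1.0 - math.exp(-dt / tau)  # weight given to each new value
        self.mean = initial

    def update(self, value):
        # Equivalent to mean = (1 - alpha) * mean + alpha * value
        self.mean += self.alpha * (value - self.mean)
        return self.mean


# Annual updates with a 20-year time scale; with daily updates use tau = 20 * 365, dt = 1.
m = ExpRunningMean(tau=20.0, dt=1.0, initial=0.0)
```

After a step change from 0 to 1 sustained for 20 years, this mean reaches 1 − e⁻¹ ≈ 0.63, whereas a 20-year boxcar mean would reach 1; that difference is the kind of impact that would have to be assessed.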

Slides from meeting on 11/7/2024

Intro slide, including a set of assertions about the current (baseline) condition of the offline and coupled models. A key challenge for CABLE4 and ESM3 is the desire to relax POP’s 3-tile configuration when inside ACCESS, without having to undertake a fundamental rewrite.

Box-stick diagram of the current offline POP-enabled code

Box-stick diagram of the proposed CABLE4 - note that boxes with a dashed line would require some technical modification or are new routines. Note the key move of POP and POPLUC from the end of the year to the beginning of the year.

Option for implementation within ESM3 - largely follows the current implementation within ESM1.5 and CM2. This method would require a mismatch between the albedo seen by the atmosphere and the albedo of the land during the first hour of each year.

Alternative for implementation within ESM3 - the key difference being that the carbon cycle (and if possible soil hydrology) is moved to extras. This move would better align CABLE-CASA-POP with the structure of JULES. This method would still require a mismatch between the albedo seen by the atmosphere and the albedo of the land for some of the first hour of each year.

Five broad technical tasks that would be necessary for this implementation method - note that the detail of the science of the interface is largely hidden inside task 4.

Is it worth doing the tile mapping?

It takes time to get right, but at the same time it isn’t rocket science. Is it worth spending the time on it?

It does not need to be done at the start. We can do some of the work with 3 tiles only first and do the mapping some time later.

Following our CABLE4 design meeting, I have cleaned up the issues and looked into creating a roadmap.

For the roadmap, I’ve tried 2 tools: Zenhub and GitHub Projects. I would like to go with Zenhub. It was a lot smoother to work with and, although the interface is busier than GitHub Projects, I found the resulting roadmap clearer. GitHub Projects made me mad because you have to set dates for every issue, while Zenhub has the notion of groups of issues (called epics) and the dates are set at the epic level.

Here is a first draft of a roadmap for CABLE4, and the BIOS3 project since we need to wait for this to finish for some of the tasks. I’ve tried to pick somewhat realistic dates but it’s open to modifications, I know I tend to be optimistic!

For direct access to the roadmap (absolutely not required), you need to sign up to Zenhub (free), it’s recommended to sign up with your GitHub credentials. Then I can add you to the project.