Code organisation for CABLE4 development

For CABLE4, we not only have to develop new code but we also need to merge two diverging versions of CABLE. It is worth discussing how we want to organise that work.

Below are my initial thoughts on this. Feedback is welcome.

Type of work needed

  1. Refactorisation: CABLE 3 started with an extensive refactorisation of the code. CABLE-POP_TRENDY is based on CABLE 2 so has a completely different organisation of the code.
  2. Merge of the two branches: the branches diverged ~10 years ago. There will be both scientific and technical differences in the way the two branches handle similar portions of the code. The merging can’t be automated as choices must be made on which version to keep.
  3. New development: to use POP-POPLUC in ACCESS, we need to make some changes to the code (writing/reading files per processor, PFT mapping etc.). These changes can only be done on a code with a working version of POP-POPLUC.

Aims for the work organisation

For me, the main aim should be to try and develop off the main branch as soon as possible. This would allow us to test the developments in the version of the code we are going to have in ACCESS3 from the start and not as a final phase. It should also facilitate keeping the development of ACCESS-ESM3 and ACCESS-ESM1.6 in sync with the offline development.

Proposal

Refactorisation
I have come to think that doing the refactorisation first would be best. This would allow us to transfer new developments between main and CABLE-POP_TRENDY more easily, in small blocks.

Parallel work on merging the branches and new development

The hope here is that the merge will be done faster than the new developments, so we can transition to doing all developments directly on main sooner and avoid the last-minute surprises that might come from doing the merge at the end.

To do both the merge and the new developments in parallel, we need 2 development branches:

POP-merge-dev: One for the merge, based on main. Write protected so all new development would need a PR. All work for the merge would be done in branches that stem from that branch and would have PRs based on POP-merge-dev.

CABLE4-dev: Based on CABLE-POP_TRENDY, for the new developments around POP-POPLUC (after the refactorisation has happened, if possible). Write protected so all new development would need a PR. All new development for CABLE4 would be done in branches that stem from that branch and would have PRs based on CABLE4-dev.

Once POP-merge-dev is complete, it is merged into main. Then we bring past developments from CABLE4-dev into main. This might need to be done manually. Hopefully, things won’t have diverged much, so it won’t bring surprises.

The practicalities of accomplishing this CABLE4 development are worth careful consideration. The proposed way forward would certainly work - however, there are implied elements that make it highly ambitious (i.e. risky in terms of completion by early 2025). The time constraint is particularly concerning given a) the starting point of the various code bases, b) the status of ACCESS-CM3 and c) the general level of (research scientist) resourcing available.

Some notes:

  1. The aims of the work require the development of the MAIN offline model (and specifically an offline-as-ACCESS configuration), an ESM1.6 and ACCESS3. The choice of going to an inline solution necessitates i) development of multiple interface layers (since the parent models are not equivalent and require different methods) and ii) testing within a coupled environment (which takes substantially more time).

  2. Any merge between CABLE-POP_TRENDY and MAIN will be a two-way merge - implying routine testing of capability in at least two offline and two coupled environments. Test cases will need to be established for (at least) the capability of POP, POPLUC, C13 cycle, BLAZE(?), coordination hypothesis, mesophyll conductance and ground water, across a combination of offline GSWP3, offline TRENDY, ACCESS-ESM1.x and ACCESS3x applications.

  3. The two-way merge would be both technical and scientific - and it’s likely that neither MAIN nor CABLE-POP_TRENDY will be the truth for all decisions (implying that the test cases themselves will evolve over time, and necessitating testing both ways). If nothing else, this means that the scientists will need to be deeply involved in the discussion/process at all times.

  4. From a practical perspective any merge will have to also consider the offline driver routines and MPI - substantive (multi-month) challenges in their own right.

  5. Code Reviews will definitely be needed - this slows things down, if only because multiple people need to be on top of the purpose of each bit of work.

  6. Our history with this kind of work indicates that it always takes a lot longer than initially thought (e.g. for CABLE3 into ESM1.5 there are differences in performance that we still do not understand nearly a year later) - and there is always a difficult challenge around determining when ‘close is good enough’.

  7. (I find that) Parallel work always fails to produce the hoped-for speed-up (decision making becomes more complicated, conflicts and distractions occur).
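To give a feel for the size of the testing task described in note 2, here is a minimal sketch that enumerates every capability against every application. The capability and application names are copied from note 2; the function name `build_test_matrix` is hypothetical, and treating the test set as a full cross product is an assumption (note 2 only says "a combination"), so the count is an upper bound rather than an agreed plan.

```python
from itertools import product

# Capabilities and applications as listed in note 2.
# BLAZE is marked with a "(?)" there, so its inclusion is uncertain.
CAPABILITIES = [
    "POP",
    "POPLUC",
    "C13 cycle",
    "BLAZE",
    "coordination hypothesis",
    "mesophyll conductance",
    "ground water",
]
APPLICATIONS = [
    "offline GSWP3",
    "offline TRENDY",
    "ACCESS-ESM1.x",
    "ACCESS3x",
]


def build_test_matrix(capabilities, applications):
    """Return one (capability, application) pair per candidate test case.

    Assumes every capability is tested in every application, which is
    the worst case; some pairs may not be needed in practice.
    """
    return list(product(capabilities, applications))


matrix = build_test_matrix(CAPABILITIES, APPLICATIONS)
print(len(matrix))  # 7 capabilities x 4 applications = 28
```

Because the merge is two-way, each of these candidate cases may need to be run against both code bases, which is part of why the testing effort grows quickly.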

Having articulated this (sub)set of concerns though - it is not obvious what an alternative approach would be. This is because development of CABLE4 needs to be done from a codebase that can run POP and POPLUC.

However - noting that CABLE4 does not require CABLE-POP_TRENDY to be merged into MAIN nor to be refactored - perhaps a way forward is to focus on building new developments onto CABLE-POP_TRENDY, followed by a one-way merge (new CABLE-POP_TRENDY -> MAIN), essentially leaving CABLE-POP_TRENDY unaltered, i.e.

  1. Essentially CABLE4-dev above - build new developments around POP and POPLUC (i.e. configuration matching, processor-by-processor input) into CABLE-POP_TRENDY. Routine testing against CABLE-POP_TRENDY with POP on/off.

  2. MAIN-dev - based off MAIN, pull only those elements of CABLE4-dev needed to enable POP and POPLUC into (a branch off) MAIN. Routine testing is done against MAIN and CABLE-POP_TRENDY without POP turned on, where land cover is equivalent to default POP land cover [a comparison between MAIN and CABLE-POP_TRENDY in this way is a necessary early test to see how important differences in the biophysics are], and against CABLE4-dev (POP on/off).
    Document instances where MAIN has been preferred over CABLE-POP_TRENDY.
    Note that the CABLE3 refactorisation, and other developments, imply that bitwise equivalence will not occur even if land cover is forced to be equivalent - so care will be needed when setting up test metrics.

  3. Alter land cover to be consistent with ACCESS - test.

  4. Move on to tackle issues related to functioning in the coupled model(s) - variable allocation, offline spin up, pre-run splitting of POP’s initial conditions, interface layers.
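Since step 2 notes that bitwise equivalence will not occur, the test metrics will need to be tolerance-based. The following is a minimal sketch of such a comparison, not an agreed metric: the function name `compare_fields`, the variable names and the tolerance values are all hypothetical placeholders, and real tests would read model output files rather than short lists.

```python
import math


def compare_fields(reference, candidate, rel_tol=1e-6, abs_tol=1e-12):
    """Compare two equal-length sequences of floats within a tolerance.

    Returns (n_mismatch, max_rel_diff): the number of points outside
    the tolerance and the largest relative difference found.
    The default tolerances are placeholders, not recommended values.
    """
    if len(reference) != len(candidate):
        raise ValueError("fields have different lengths")
    n_mismatch = 0
    max_rel_diff = 0.0
    for ref, cand in zip(reference, candidate):
        if not math.isclose(ref, cand, rel_tol=rel_tol, abs_tol=abs_tol):
            n_mismatch += 1
        scale = max(abs(ref), abs(cand))
        if scale > 0.0:
            max_rel_diff = max(max_rel_diff, abs(ref - cand) / scale)
    return n_mismatch, max_rel_diff


# Made-up values standing in for the same output variable from two code bases.
main_values = [1.0, 2.0, 3.0]
trendy_values = [1.0, 2.0000001, 3.3]
n_bad, worst = compare_fields(main_values, trendy_values)
print(n_bad)  # 1: only the third point differs by more than the tolerance
```

Reporting the largest relative difference alongside the mismatch count helps with the ‘close is good enough’ question, since a single threshold rarely suits every variable.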

First thought: when do you tackle the refactorisation of CABLE-POP_TRENDY? MAIN-dev becomes a headache if, for every change we want to bring across, we also have to remap it from one code organisation to the other.