POP implementation in ACCESS-ESM3: inline vs separate executable

This topic summarises the outcome of a meeting on 1st February with @inh599 @MartinDix @clairecarouge @Jhan. This meeting aimed to decide on the implementation route of POP for ACCESS-ESM: either inline within CABLE or in a separate executable.

TL;DR: The inline solution for POP has been chosen. It was deemed easier to develop, more likely to be delivered on time, and more likely to leave open possibilities for other scientific applications.
The question of long-term sustainability and maintainability was also discussed. It was decided there was too much uncertainty around the coupling requirements between the land and LFRic to base any decision on this aspect.

We considered the following criteria to reach our conclusion:

Memory question for the inline solution

One possible obstacle to the inline solution is the memory requirement for POP.
There are very few non-state variables in POP, so we based some quick estimates of the memory requirements on restart file sizes. We have the following two examples:

  • the TRENDY_v11 global runs that I’ve looked at had 15069 gridcells, 33281 tiles (so fewer than 3 tiles per gridcell) and 18265 POP instances. The restart file sizes are 3.6GB for POP, 232MB for POPLUC and 108MB for the climate TYPE.
  • a recent BIOS (Australia) run had 11007 gridcells, 25213 tiles and 14329 POP instances. The restart file sizes are 2.8GB for POP, 170MB for POPLUC, 73MB for climate, and smaller for CABLE and CASA.

If we assume a maximum of 100 gridcells per processor in ACCESS-ESM and 2 instances of POP per gridcell, then we are looking at around 40MB for the POP restart per processor, 1.5MB for a POPLUC restart and 1MB for a climate% restart. The in-memory footprint would be slightly larger than these restart sizes, but not by a massive amount.
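As a sanity check on these figures, here is a minimal back-of-envelope sketch (in Fortran, like the rest of the codebase) that simply scales the TRENDY_v11 restart sizes to the assumed decomposition. The parameters are the numbers quoted above; everything else is illustrative.

program pop_memory_estimate
  implicit none
  ! TRENDY_v11 reference run (numbers quoted above)
  real,    parameter :: pop_restart_gb     = 3.6
  real,    parameter :: popluc_restart_mb  = 232.0
  real,    parameter :: climate_restart_mb = 108.0
  integer, parameter :: n_instances = 18265
  integer, parameter :: n_gridcells = 15069
  ! Assumed ACCESS-ESM decomposition
  integer, parameter :: cells_per_proc     = 100
  integer, parameter :: instances_per_cell = 2

  print '(a,f6.1,a)', 'POP restart per processor:      ', &
        1024.0 * pop_restart_gb / n_instances * cells_per_proc * instances_per_cell, ' MB'
  print '(a,f6.1,a)', 'POPLUC restart per processor:   ', &
        popluc_restart_mb / n_gridcells * cells_per_proc, ' MB'
  print '(a,f6.1,a)', 'climate% restart per processor: ', &
        climate_restart_mb / n_gridcells * cells_per_proc, ' MB'
end program pop_memory_estimate

Running this gives roughly 40MB, 1.5MB and 0.7MB per processor, consistent with the estimates above.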

This indicates that adding POP and POPLUC inline is feasible from the point of view of memory allocation.

Ease of development, ability to deliver on time

Another point of discussion was the ability to deliver on time and the development requirements, considering the current technical and scientific resources. On this point, the inline solution is much more straightforward and hence potentially requires less development work.

However, the inline solution requires different coupling in ESM1.5 and ESM3. This is already the case for all the CABLE variables, and any additional variables passed for POP will need to follow each system’s mechanism. Since these variables would mimic what is already done for CABLE, we can work out the requirements in one system and should be able to adapt them to the other fairly easily.

Additionally, the inline solution requires writing output files split per processor (because they are written out at the CABLE level and not from the UM). This means we might need some scripts to combine the output files at the end of a run (see the sketch below). We might also need to split the POP inputs per processor so CABLE can read them directly, instead of threading them through the UM.
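To make the combine step concrete, below is a minimal, hypothetical sketch using the netCDF Fortran API: it gathers a single per-processor variable onto the global land vector and rewrites it as one file. The file names, the ‘land’ dimension, the ‘land_index’ map and the ‘pop_biomass’ variable are illustrative placeholders, not actual CABLE-POP conventions.

program combine_pop_outputs
  use netcdf
  implicit none
  integer, parameter :: nprocs = 576            ! assumed processor count
  integer, parameter :: nland_global = 15069    ! assumed global land points
  integer :: rank, ncid, varid, dimid, nlocal
  integer :: out_ncid, out_dimid, out_varid
  character(len=64) :: fname
  real, allocatable :: local_vals(:)
  integer, allocatable :: land_index(:)
  real :: global_vals(nland_global)

  global_vals = 0.0
  do rank = 0, nprocs - 1
    ! One output file per processor, e.g. pop_out.0000.nc ... pop_out.0575.nc
    write(fname, '(a,i4.4,a)') 'pop_out.', rank, '.nc'
    call check( nf90_open(trim(fname), NF90_NOWRITE, ncid) )
    call check( nf90_inq_dimid(ncid, 'land', dimid) )
    call check( nf90_inquire_dimension(ncid, dimid, len=nlocal) )
    allocate(local_vals(nlocal), land_index(nlocal))
    ! 'land_index' maps each local land point to its global position.
    call check( nf90_inq_varid(ncid, 'land_index', varid) )
    call check( nf90_get_var(ncid, varid, land_index) )
    call check( nf90_inq_varid(ncid, 'pop_biomass', varid) )
    call check( nf90_get_var(ncid, varid, local_vals) )
    global_vals(land_index) = local_vals   ! scatter into the global vector
    deallocate(local_vals, land_index)
    call check( nf90_close(ncid) )
  end do

  ! Write the stitched field to a single global file.
  call check( nf90_create('pop_out_global.nc', NF90_CLOBBER, out_ncid) )
  call check( nf90_def_dim(out_ncid, 'land', nland_global, out_dimid) )
  call check( nf90_def_var(out_ncid, 'pop_biomass', NF90_REAL, [out_dimid], out_varid) )
  call check( nf90_enddef(out_ncid) )
  call check( nf90_put_var(out_ncid, out_varid, global_vals) )
  call check( nf90_close(out_ncid) )

contains
  subroutine check(ierr)
    integer, intent(in) :: ierr
    if (ierr /= NF90_NOERR) then
      print *, trim(nf90_strerror(ierr))
      stop 1
    end if
  end subroutine check
end program combine_pop_outputs

A real combine step would loop over all POP variables rather than one, but the gather pattern would be the same.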

Does the solution make it more difficult to research other science questions?

We want a model that allows us to research a variety of science questions. The advantage of the inline solution is that we can more easily modify it to interact with POP more frequently. This might eventually be useful for the fire model, BLAZE.

Future-proof solution for ACCESS

We discussed longevity: although there are signs pointing towards coupling the land to the atmosphere via a coupler in the future, nothing is clear yet. So we decided the longevity question was too uncertain to carry much weight in the decision. We consider that the coupling might have to be redone anyway, no matter which solution is chosen.

We could consider moving POP to its own separate library if we think separating the two will be necessary going forward. This is likely to require very little additional technical work. It would split POP and POPLUC development into their own repository and make it possible (although not necessarily easy) to interface POP and POPLUC with other land surface models.

Maintainability

For maintainability, the inline solution seems easiest. The TRENDY developers have a range of tools built around CABLE-POP with POP inline. Porting all of these to a 2-executable system would be a lot of work, and not something we could prioritise as part of the CMIP7 preparation. This means we would have to maintain 2 parallel versions of the same code.
With the inline solution, we might still need 2 versions to start with, but it should be easier and faster to converge onto one version.

Additionally, the 2-executable solution would require the maintenance of at least one stand-alone code (for the mapping of the PFTs). This is a minor inconvenience, as CABLE-POP itself, even offline, requires the creation of a CABLE tool package. In the inline solution, the PFT mapping would be done within CABLE.


@inh599 @MartinDix @Jhan In the post above I tried to summarise our meeting about POP inline versus 2 executables. I’ve made the post a wiki, so don’t hesitate to edit it if I missed something or mischaracterised anything.

Implications for memory in the case of an inline CABLE4

In the first instance, I decided on an implementation at the top_level of the UM timestep (at atm_step). I don’t see any reason that I couldn’t declare the fields one level higher, ensuring the scope of the variables persists for the whole experiment. A portion of these 100K fields could (in principle) even be pushed back to D1/STASH. The main reason I declared

REAL :: testCABLE4(100000, land_field, 5)

at the top_level was to show that it could be done. If this can be done, then declaring the fields automatically in CABLE can be done too, but not necessarily the other way around.
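For illustration, a minimal sketch (hypothetical names, not actual UM or CABLE code) of what declaring the fields one level higher could look like: a module-scope allocatable, which persists for the whole run without touching D1/STASH.

module cable_pop_state_mod
  implicit none
  ! Mirrors the shape of testCABLE4(100000, land_field, 5) declared above.
  real, allocatable :: pop_fields(:,:,:)
contains
  subroutine init_pop_fields(land_field)
    integer, intent(in) :: land_field
    ! Allocate once; the fields then persist for the whole experiment.
    if (.not. allocated(pop_fields)) allocate(pop_fields(100000, land_field, 5))
  end subroutine init_pop_fields
end module cable_pop_state_mod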

From job.out (PBS standard output, effectively a UM runtime log for the job just run):

Typical run:

NCPUs Requested: 576 NCPUs Used: 576
CPU Time Used: 125:26:17
Memory Requested: 1.12TB Memory Used: 334.51GB

After adding the declaration to the source code at atm_step:

NCPUs Requested: 576 NCPUs Used: 576
CPU Time Used: 46:41:02
Memory Requested: 1.12TB Memory Used: 349.93GB

As you can see, the ~15GB extra still doesn’t take us anywhere close to the 1.12TB that has already been allocated to us for this run.

Passing the additional data around etc. may slow things down, but I don’t think it will be that dramatic.


An additional topic (one that potentially pushes us back towards 2 executables): spin-up.

A CABLE-POP simulation will typically spin up carbon pools and plant demography from zero. This will not be possible inline, as the atmospheric model will have problems with very small canopy heights etc.

A possible way forward would be to use our planned ‘CABLE-offline at ACCESS resolution’ capability to spin up the demography under ACCESS-derived pre-industrial control conditions, and use that demography to initialise POP and POPLUC inline. This would require a) some pre-existing ACCESS output and b) a method to link the restart(s) created in the ‘CABLE-offline at ACCESS resolution’ setting to the POP restart(s) to be used when coupled (a hypothetical sketch follows below). I expect that the ACCESS output would only need to be plausible (appropriate variables etc.) and not precise.
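For b), one could imagine a linking step along these lines: read a spun-up field from the single offline restart and scatter it into the per-processor restarts the coupled run would read. This sketch assumes a contiguous land-point decomposition; all file and variable names, and the decomposition itself, are placeholders.

program init_pop_restarts
  use netcdf
  implicit none
  integer, parameter :: nprocs = 576    ! assumed processor count
  integer :: rank, ncid, varid, dimid
  integer :: out_ncid, out_dimid, out_varid
  integer :: nland, chunk, i0, nloc
  real, allocatable :: pool(:)
  character(len=64) :: fname

  ! Read a spun-up field from the offline global POP restart.
  call check( nf90_open('pop_restart_offline.nc', NF90_NOWRITE, ncid) )
  call check( nf90_inq_dimid(ncid, 'land', dimid) )
  call check( nf90_inquire_dimension(ncid, dimid, len=nland) )
  allocate(pool(nland))
  call check( nf90_inq_varid(ncid, 'pop_biomass', varid) )
  call check( nf90_get_var(ncid, varid, pool) )
  call check( nf90_close(ncid) )

  ! Assume each processor owns a contiguous block of land points.
  chunk = (nland + nprocs - 1) / nprocs
  do rank = 0, nprocs - 1
    i0   = rank * chunk + 1
    nloc = min(chunk, nland - i0 + 1)
    if (nloc <= 0) cycle
    write(fname, '(a,i4.4,a)') 'pop_restart.', rank, '.nc'
    call check( nf90_create(trim(fname), NF90_CLOBBER, out_ncid) )
    call check( nf90_def_dim(out_ncid, 'land', nloc, out_dimid) )
    call check( nf90_def_var(out_ncid, 'pop_biomass', NF90_REAL, [out_dimid], out_varid) )
    call check( nf90_enddef(out_ncid) )
    call check( nf90_put_var(out_ncid, out_varid, pool(i0:i0+nloc-1)) )
    call check( nf90_close(out_ncid) )
  end do

contains
  subroutine check(ierr)
    integer, intent(in) :: ierr
    if (ierr /= NF90_NOERR) then
      print *, trim(nf90_strerror(ierr))
      stop 1
    end if
  end subroutine check
end program init_pop_restarts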

Note that this does not remove the need for a long spin-up, as that is needed to satisfy the coupling between the land and atmospheric components (and will be needed for the ocean biogeochemistry).