ACCESS-ESM1.6 development

@MartinDix Could I please post your last two replies here as comments to the issue: Use the CMIP7 piControl value for the solar constant · Issue #7 · ACCESS-Community-Hub/access-esm1.6-dev-experiments · GitHub ?

Pearse mentioned updated ocean model parameters in today’s stand-up. Are they going to modify targets I’m already pointing to in my payu/config.yaml, or do I need to update my config package somehow?

@Jhan I moved your post here - I hope you don’t mind. My understanding is that Pearse was referring to updating WOMBAT parameter values in the MOM field_table. @pearseb is that right? If so, this will not be reflected automatically in your config @Jhan. We’ll need to update your config to get the changes.

I hope that this is the case. Nonetheless, it will still be better to get hold of those parameters.

Minutes from today’s stand-up here. Please correct / amend.

@paulleopardi I missed your comment on the forcings, can you please add it? Similarly @clairecarouge, would you like to clarify the point on ILAMB?

@tiloz organised a meeting to decide on the new configuration for the ESM1.6 spin up.

Attendees: Tilo, Ian, Jhan, Claire, Pearse, Matt, Spencer (shout out if I forgot someone)

We have decided to use the following setup for the spinup:

  • code bases from the Marchepot testcase, from Jhan
  • ocean and sea-ice config from Pearse’s latest run without burial
  • add the CMIP7 solar constant from Martin’s test
  • CABLE namelist from the Marchepot testcase
  • ocean restart: from the end of the non-burial config case at year 916
  • UM restart: from year 30 of the Marchepot testcase
  • reset the simulation time to “0”: need to update the timestamps in all the restart files (see the rough sketch after this list)
  • run on Sapphire Rapids
  • use the latest ifort compiler
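
On the point about resetting the simulation time, the sketch below is a rough illustration of what shifting the timestamps in a NetCDF restart might look like, using xarray. The filename, the `time` coordinate name and the 915-year offset are illustrative assumptions only; the actual ESM1.6 restarts (UM dumps, MOM’s ocean_solo.res, OASIS coupler files) each store dates in their own format and would need their own handling.

```python
import xarray as xr

# Minimal sketch: relabel the time coordinate of a NetCDF restart so the run
# starts from a nominal year. Filenames, variable names and the offset below
# are illustrative only, not the real ESM1.6 restart layout.
ds = xr.open_dataset("ocean_restart.nc", use_cftime=True)

offset_years = 915  # e.g. relabel model year 916 as year 1 (year 0 is not valid)
new_times = [t.replace(year=t.year - offset_years) for t in ds["time"].values]

ds = ds.assign_coords(time=new_times)
ds.to_netcdf("ocean_restart_shifted.nc")
```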

Spencer will put together a config in a PR on access-esm1.6-configs and get reviews from Jhan and Pearse (and anybody else needed) to ensure we get all the pieces together correctly.

This spinup config will be merged into the dev-preindustrial+concentrations branch once it is ready to serve as the base for further developments.

Please let us know if you see anything else we might have forgotten or if you need to be involved in reviewing or setting up the config.

Hi @pearseb, as discussed, a friendly reminder: could you please update this post (as well as the accompanying GitHub repo)? Feel free to reach out to @spencerwong, @dougiesquire or myself if anything is unclear.

Thanks @clairecarouge for documenting the proposed changes. I’m not sure we can set the simulation time to 0 after the restart. There is a note in our script-based version stating that the real start year must be greater than 2 for it to work; I’m not sure why. We always used a real start year of 101 when we did not continue the time from the previous run (although this does lead to confusion about the total length of the run).

Also re-posting the link to @MartinDix’s analysis of the 100-year test run performed recently with the new solar constant:

Year zero is not a valid value for the Gregorian or Julian Calendars.

I’d guess it has something to do with the coupler restarts containing data from previous time steps. If year zero is indeed not valid and you start at midnight on January 1st, some of your coupling data will be from the previous year. If you started at year 1, that earlier data would fall in December of a “year zero”.

As I say, just a hunch.
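
For what it’s worth, standard date-handling libraries enforce the same rule, so it’s easy to convince yourself of. The snippet below is just an illustration with Python’s stdlib, not a statement about how the model stores its calendar:

```python
from datetime import date

# datetime.MINYEAR is 1: the (proleptic) Gregorian calendar has no year zero.
print(date(1, 1, 1))  # 0001-01-01 is accepted

try:
    date(0, 1, 1)
except ValueError as err:
    print(err)  # "year 0 is out of range"
```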

Thanks @Aidan. That makes sense.


For information, if you want to follow the development of the next spinup configuration, please read: Configuration for next spinup simulations · Issue #85 · ACCESS-NRI/access-esm1.6-configs · GitHub

Use this issue on access-esm1.6-configs repository to discuss further details about the configuration.

We have just realised the stand-up calendar event had an end date of last week, and as such the stand-up had disappeared from calendars. I want to confirm there is an ESM1.6 stand-up tomorrow at 9:45 am.

I have updated the end date of the calendar event series to 1st July, and you should have received a new invite restoring our weekly meeting in your calendars.


Meeting minutes for today are here. Please correct / amend.


Hi everyone, the updates for the next spinup configuration are now complete and have been added to the ESM1.6 configurations repository.

Cloning the configuration

We’ve added a tag to the repository, 20250409-spinup-dev-preindustrial+concentrations, which points to the new spinup configuration. Using a tag lets us “freeze” the spinup configuration, allowing development to continue on the dev-preindustrial+concentrations branch without affecting the spinup configuration.

Note that the payu command for cloning a configuration based on a tag is slightly different to the command for cloning from a branch. To clone the spinup configuration from the new tag, use:

payu clone https://github.com/ACCESS-NRI/access-esm1.6-configs -s 20250409-spinup-dev-preindustrial+concentrations -b <new-branch-name> <new-directory-name>

Here, it’s worth using a descriptive branch name for <new-branch-name>, e.g. 20250410-spinup.

Configuration details

Details on the changes are available in this issue, but to summarise:

  • Use the CMIP7 solar constant.
  • Update the WOMBAT-lite parameters to more recent ones used by @pearseb.
  • Use the Sapphire Rapids nodes.
  • Use an ocean restart from late in @pearseb’s previous spinup, and an atmosphere/land restart from year 30 of @Jhan’s MarchEpot simulation.
  • Use the latest WOMBAT-lite and UM7 code.

Many thanks to @dougiesquire, @pearseb, @manodeep, and @MartinDix for their work on these configuration changes.


Thanks @spencerwong and everyone for getting the new configuration finalised. @Jhan has set up a number of new runs based on this. I noticed that the runtime and resource use are not that different from what we had with ACCESS-ESM1.5 on the older Cascade Lake nodes (~1 kSU, ~1:16 walltime). With ESM1.6 on the Sapphire Rapids nodes we also get about ~1 kSU and ~1:12 walltime. I thought we would be running faster on the new nodes, with a walltime closer to 1 hour? Did we lose this speed-up due to the changes in the post-processing of the ocean output, which I think is now included in the main model run? Should we reconsider this?

Hi @tiloz - Thanks for noting the runtime performance. My testing showed that the PI config was ~25% faster on sapphirerapids (1h8m on SPR vs 1h38m on cascadelake). The AMIP config had about the same runtime on both queues (~1h) but ~20% lower SU cost (since we are using 208 cores on SPR vs 240 cores on cascadelake).

The first suite of runs did have a runtime of 1h1m on the SPR queue, but that was before some bugfixes and the post-processing changes were brought in.

Just to cross-check, can you or @Jhan run the same config on both cascadelake and sapphirerapids and see what the runtimes are?

Thanks @manodeep. @Jhan would you be able to do this quick test? Just one year would be fine. Thank you.

Can you point us to the control directories?

If I just edit the PBS queue, is that enough? Isn’t there also the processor layout to change?

An update on the two ESM1.6 configs to run on the sapphirerapids queue by default:

As per my testing, the chosen preindustrial+concentrations config is ~25% faster (1h8m vs 1h31m on cascadelake) with a 20% lower SU cost (~937 SUs vs 1163 SUs). This translates to over 20 simulation years/compute-day after switching to the newer sapphirerapids queue on gadi.

The chosen amip config has about the same wallclock time (1h1m vs 1h) but ~10% lower SU cost (420 vs 478). If a higher walltime is acceptable (1.5hrs), then the SU cost can be reduced significantly to 317 SUs (from 478 SUs on cascadelake) by running on a single sapphirerapids node with 104 cores. Further details are all captured in this GitHub issue.
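
For anyone wanting to sanity-check these numbers or redo them for a different layout, the arithmetic is just cores × walltime × queue charge rate. The sketch below is only illustrative: the 2 SU/core-hour charge rate is an assumption (it is consistent with the figures above, but please check the current NCI queue documentation), and the throughput is simply 24 hours divided by the walltime per simulated year.

```python
# Rough sanity check of the quoted SU and throughput figures. The charge rate
# is an assumption; core counts and walltimes are the ones quoted above.
CHARGE_RATE = 2.0  # SU per core-hour, assumed for both cascadelake and sapphirerapids


def su_cost(cores: int, walltime_hours: float) -> float:
    return cores * walltime_hours * CHARGE_RATE


def years_per_compute_day(walltime_hours_per_year: float) -> float:
    return 24.0 / walltime_hours_per_year


# AMIP config: 208 cores for ~1h1m on sapphirerapids
print(su_cost(208, 61 / 60))           # ~423 SUs, close to the quoted ~420

# AMIP config on a single 104-core sapphirerapids node for ~1.5h
print(su_cost(104, 1.5))               # ~312 SUs, close to the quoted 317

# Preindustrial config at ~1h8m per simulated year
print(years_per_compute_day(68 / 60))  # ~21 simulation years per compute-day
```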

@Spencer Wong has kindly swapped over the configs in these two PRs:

The feedback we would really appreciate is:

  • test out the new configs and check that they work for you (and report back if you find any issues, especially with performance or deterministic outputs)
  • let us know whether we should choose the 104-core amip config as the default one (increased wallclock time but significantly reduced SUs)

Many thanks in advance,
Spencer and Manodeep