I would like to follow up on an interesting presentation by @spencerwong and @manodeep where they mentioned that new compiler optimisation flags have helped to speed up ESM1.6 and ESM1.5 by a substantial amount.
Would it be possible to point me to the optimisation flags that were used to compile MOM5 in this faster configuration? I'm interested in applying these tricks to my GFDL model, which is built off the same codebase.
The speedups we have gotten have primarily come from switching over to the sapphirerapids queue and improving the load-balancing between the ocean and atmosphere. While I have tested a few compiler flags, nothing has given a consistent speedup in MOM5. @spencerwong can chime in too - my memory is not exactly a reliable narrator.
@dkhutch Are you specifically looking for MOM5 speedup, or are you interested in ESM1.5/ESM1.6 performance improvement (say for the pre-industrial config)?
Thanks Manodeep. Ok, good to know you haven't necessarily got MOM5 to run faster. For the GFDL coupled model, yes, it's just about getting the MOM5 compiler optimisation to the best settings. It might still be worth checking what the latest settings are, because I've tended to lag behind by several years (e.g. I don't know whether I should be using the oneAPI compiler, or even how I would set that up).
Regarding ESM1.5, I would certainly be interested to know if there are ways of updating existing runs (which use the ACCESS-NRI supported releases) to go a bit faster. It was mentioned in the notes that ESM1.5 could speed up from ~65 min to ~58 min per year, with SUs reduced from ~950 to ~810 per year. I would love to take advantage of that, even for runs that are already in progress: my runs take a really long time, and as long as I document the changes I see no issue with switching over midway through them.
We are currently working on detailed optimisation for ESM1.6, and plan to backport the improvements to a new 1.5 release once the ESM1.6 work is done. That work includes switching to the latest oneAPI compiler and more recent versions of the software dependencies (including OpenMPI), moving over to the sapphirerapids queue, and changing some of the parameters.
For example, one of the best-case throughputs I have seen for ESM1.5 is ~26 years/wall-day on the sapphirerapids queue (4 nodes, 55 min wall-time and a 775 SU cost for a 1-year run). Would it be useful for me to list the specific config changes that boost performance (with the caveat that what holds for the oneAPI-compiled binary may not hold for the released executable compiled with classic Intel)?
There are four sets of changes for the ESM1.5 PI config:

1. config.yaml - change the queue, the CPU partitioning, and an MPI parameter that seems to boost performance
2. atmosphere/um_env.yaml - the UM cores and layout
3. atmosphere/namelists - change the segment sizes
4. ocean/input.nml - the ocean layout
config.yaml changes
Add the following to swap over to sapphirerapids, and to specify what each node on the sapphirerapids queue contains:
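For example, something along these lines (a minimal sketch, assuming payu's `platform`/`nodesize` keys and the 104-core sapphirerapids nodes; the MPI parameter mentioned above is not reproduced here):

```yaml
# Sketch only: submit to the Sapphire Rapids queue and tell payu the node size
queue: normalsr
platform:
  nodesize: 104   # cores per sapphirerapids node
```

With the node size set, payu should round the PBS request up to whole sapphirerapids nodes for whatever cpu partitioning you choose.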
Hi @manodeep @spencerwong,
I can confirm that with these changes, my ACCESS-ESM1.5 run can now complete 1 year in ~57 min, costing 789 SUs. This is a really nice improvement as the run would previously take more like 64-65 min and cost 930-960 SUs per year.
Really great to be able to take advantage of these changes!!
Hmmm… I suspect this is a system problem on Gadi, but in the last 24 hours I've had jobs time out unexpectedly multiple times. Even with a walltime limit of 2:00, jobs are failing to complete (when they should take more like 1 hour!). This is affecting both normal and normalsr jobs. I can't make sense of any of it. Wondering if Gadi is having a bad couple of days.
Thanks David - yes, something happened on Gadi at the end of the week and performance was down nearly 2x. Thankfully, performance seems to be back to normal now.