Building the UM to run on Gadi's Sapphire nodes

Hello all.

There have been some mentions of building the CMIP7 ESM to run on Gadi’s Sapphire Rapids nodes, e.g. CSIRO - ACCESS-NRI standup minutes - #20 by clairecarouge

Has anyone successfully achieved it?

I ran the pre-built UM executable on the normalsr queue and performance was significantly worse than the normal queue. So I’d like to compile the UM for sapphire nodes using the standard UM workflow.

The NCI documentation (Sapphire Rapids Compute Nodes - NCI Help - Opus - NCI Confluence) suggests using the following compilation flags:

module load intel-compiler-llvm/
icx -O3 -march=broadwell -axSKYLAKE-AVX512,CASCADELAKE,SAPPHIRERAPIDS myCode.c -o myBinary

and testing

-qopt-zmm-usage=high

I’ll have to change the module load intel-compiler/2021.5.0 statement in the {% macro UM_ENV() %} in /site/nci-gadi/suite-adds.rc to use the latest Intel compiler. Is this recommended?

There is this (empty) entry in app/fcm_make_um/rose-app.conf:

fcflags_overrides=

The fcm-make2.cfg file, which is auto-generated by fcm_make_um, contains the following build flags:

build.prop{class, fc.flags} = -i8 -r8 -mcmodel=medium  -std08  -g  -traceback                -assume nosource_include -O2 -fp-model precise -qopenmp 

I could manually modify the above entry in fcm-make2.cfg and then re-trigger recompilation but I assume there are more elegant options?

Ok I’ve got this running. I amended fcflags_overrides to

fcflags_overrides=-march=broadwell -axSKYLAKE-AVX512,CASCADELAKE,SAPPHIRERAPIDS

which updates the fcm-make2.cfg fc.flags entry to

build.prop{class, fc.flags} = -i8 -r8 -mcmodel=medium              -std08                    -g                        -traceback                -assume nosource_include -O2 -fp-model precise -qopenmp      -march=broadwell -axSKYLAKE-AVX512,CASCADELAKE,SAPPHIRERAPIDS

So fcflags_overrides appends to the existing Fortran compiler flags; it doesn’t overwrite them.
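A quick way to confirm this without reading the whole file is to check that each override flag shows up on the fc.flags line. Here is a minimal sketch, assuming the generated file keeps the fc.flags entry on a single line; check_fc_flags is a hypothetical helper, not part of FCM or the suite:

```shell
# Hypothetical helper (not part of FCM): check that every flag given after
# the file name appears on the fc.flags line(s) of that file.
check_fc_flags() {
    cfg=$1
    shift
    # Collect the fc.flags line(s) and turn newlines and tabs into spaces.
    line=$(grep 'fc\.flags' "$cfg" | tr '\n\t' '  ')
    if [ -z "$line" ]; then
        echo "no fc.flags entry in $cfg" >&2
        return 2
    fi
    for flag in "$@"; do
        # Pad with spaces so only whole, space-separated flags match.
        case " $line " in
            *" $flag "*) ;;
            *) echo "missing: $flag"; return 1 ;;
        esac
    done
    echo "all flags present"
}
```

It could be called as check_fc_flags fcm-make2.cfg -march=broadwell -axSKYLAKE-AVX512,CASCADELAKE,SAPPHIRERAPIDS from the directory that holds the generated file.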

Some output from the fcm-make2.log:

[info] shell(0  0.6) mpif90 -oo/reconfigure.o -c -I./include -i8 -r8 -mcmodel=medium -std08 -g -traceback -assume nosource_include -O2 -fp-model precise -qopenmp -march=broadwell -axSKYLAKE-AVX512,CASCADELAKE,SAPPHIRERAPIDS /home/548/pag548/cylc-run/u-dq126/share/fcm_make_um/preprocess-recon/src/um/src/utility/qxreconf/reconfigure.F90

I’ll now run with this executable and let you know how I go. This build used the default Intel compiler specified in ACCESS-rAM3 (intel-compiler/2021.5.0).

@Paul.Gregory good work. For your information, I don’t think anyone has tried building and running the UM on the Sapphire Rapids nodes using the standard UM workflow within the suite. But it has been done using spack for ESM1.6 (old UM version) at least.

OK, I found that the executable compiled with

fcflags_overrides=-march=sapphirerapids

provided a useful speedup on the normalsr queue. This executable was compiled with

module load intel-compiler-llvm/2025.0.4

Better results were obtained by adding -qopt-zmm-usage=high:

fcflags_overrides=-march=sapphirerapids -qopt-zmm-usage=high
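For anyone wanting to script the compiler switch rather than edit suite-adds.rc by hand, something like the following might do. It is only a sketch: swap_compiler_module is a hypothetical helper, and it assumes the file contains a literal "module load <old module>" line such as the one in the UM_ENV macro.

```shell
# Hypothetical helper: replace one "module load" line in a suite file.
# $1 = file to edit, $2 = old module name, $3 = new module name.
swap_compiler_module() {
    file=$1
    old=$2
    new=$3
    # Refuse to touch the file if the expected line is not there.
    if ! grep -qF "module load $old" "$file"; then
        echo "no 'module load $old' line in $file" >&2
        return 1
    fi
    # Keep a backup, then write the edited copy over the original.
    # '|' is the sed delimiter because module names contain '/'.
    # Dots in $old act as regex wildcards, which is harmless here.
    cp "$file" "$file.bak"
    sed "s|module load $old|module load $new|" "$file.bak" > "$file"
}
```

Example call, using the versions mentioned in this thread: swap_compiler_module site/nci-gadi/suite-adds.rc intel-compiler/2021.5.0 intel-compiler-llvm/2025.0.4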

Some updates.

I have been talking with Ben Menadue and others at NCI about the best way to compile source for the Sapphire Rapids nodes, and how to build executables that run on both Broadwell nodes (normal queue) and Sapphire Rapids nodes (normalsr queue). Their comments:

The Intel compiler can generate runtime dispatch for different architectures, but it has a very slight performance impact. So I’d say it’s better than using the lowest-common denominator architecture but not quite as good as a separate build for each

  • -march=broadwell means “use the instruction sets that are in the Broadwell architecture”
  • -xBROADWELL means “optimised specifically for Intel Broadwell architecture”
  • -mauto-arch=a,b,c means “generate additional code using the instruction sets that are in these architectures”
  • -axA,B,C means “generate optimised code specifically for these Intel architectures”

For the best performance you would use -x.
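If you go with a separate build per architecture, the job script needs some way to pick the matching executable at run time. Here is a rough sketch of one approach, assuming the Linux kernel on the Sapphire Rapids nodes reports the amx_tile CPU flag in /proc/cpuinfo (this has not been verified on Gadi); pick_um_build and the build names it prints are hypothetical:

```shell
# Hypothetical helper: choose a build name from a CPU flags string.
# $1 = the "flags" line from /proc/cpuinfo (or any space-separated flag list).
pick_um_build() {
    # Pad with spaces so only whole flag names match.
    case " $1 " in
        *" amx_tile "*) echo "sapphirerapids" ;;  # AMX is assumed to mark Sapphire Rapids
        *)              echo "baseline" ;;        # fall back to the portable build
    esac
}
```

In a job script this might be used as build=$(pick_um_build "$(grep -m1 '^flags' /proc/cpuinfo)"), followed by launching whichever executable was built for that name.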

I’ve found that UM source built with -xSAPPHIRERAPIDS fails with a segmentation fault in the radiance calculation.

Ben’s suggestions for chasing down this error:

I’m guessing you’re seeing it when using -xSAPPHIRERAPIDS instead of -march because it’s optimised the code differently. The first thing I would try is to ensure your stack size is unlimited. You may need to wrap your binary in a script that sets that before invoking your binary as I’m not sure how reliable copying those settings from the first node of the job is (some MPI libraries do it, others don’t).
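The wrapper idea could look roughly like this. It is a sketch only: run_with_unlimited_stack is a hypothetical name, and whether the limit can actually be raised depends on the hard limit set for the job.

```shell
#!/bin/sh
# Hypothetical wrapper: raise the stack limit in this process, then run
# the command given as arguments so it inherits the new limit.
run_with_unlimited_stack() {
    if ! ulimit -s unlimited 2>/dev/null; then
        # The hard limit may forbid this; warn and carry on with the current value.
        echo "warning: could not set unlimited stack (current: $(ulimit -s))" >&2
    fi
    "$@"
}

# When saved as a script, pass the real command through.
if [ "$#" -gt 0 ]; then
    run_with_unlimited_stack "$@"
fi
```

Saved as, say, um_wrapper.sh (hypothetical file name), it would go between the MPI launcher and the UM binary, so that every rank sets the limit itself instead of relying on the launcher to copy it from the first node.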

Otherwise, if you still see that, I’d suggest recompiling with the -check bounds (at least) flag to ensure there’s not an accidental off-by-one error or similar there.

Other comments about the default UM compile flags:

As an aside, the -i8 and -r8 flags change the ABI of the compiler such that the generated code will not be compatible with the standard Linux ABI anymore. This means you cannot use our MPI / NetCDF / etc libraries without having something in the middle that translates between them. For example, our MPI libraries expect 4-byte integers for most arguments, and the MPI_REAL datatype corresponds to a single-precision floating-point number. A much safer approach is to use the standard ABI and explicitly designate the size of variables where needed, i.e. if you need a 64-bit integer, use INTEGER(kind=INT64) (the INT64 parameter is in the ISO_FORTRAN_ENV module).

Also, the “-mcmodel medium” flag changes the addressing model to never use RIP-relative addressing for objects. This is only needed if your program has more than 2GB of static objects (e.g. COMMON blocks, or module-level or SAVE variables), and results in larger and less efficient generated code. Are you sure you need this? Note that automatic or allocatable objects do not count towards this limit.

Just to chime in on the use of -r8 and -i8: these are generally considered “bad practices” that have persisted in many Fortran codebases, and sadly they’ll be difficult to get rid of.
As for -mcmodel=medium, it might be a case of “someone added this flag 5 years ago” and no one has really bothered to check whether it is actually needed.

Sharing these with the community, in case anyone has some comments/experience.
