I ran the pre-built UM executable on the normalsr queue and performance was significantly worse than the normal queue. So I’d like to compile the UM for sapphire nodes using the standard UM workflow.
To do that, I’ll have to change the module load intel-compiler/2021.5.0 statement in the {% macro UM_ENV() %} block in /site/nci-gadi/suite-adds.rc to use the latest Intel compiler. Is this recommended?
There is this (empty) entry in app/fcm_make_um/rose-app.conf:
fcflags_overrides=
The fcm-make2.cfg file, which is auto-generated by fcm_make_um, contains the following build flags
I’ll now run with this executable and let you know how I go. This was built using the default Intel compiler specified in ACCESS-rAM3 (intel-compiler/2021.5.0).
clairecarouge
(Claire Carouge, ACCESS-NRI Land Modelling Team Lead)
@Paul.Gregory good work. For your information, I don’t think anyone has tried building and running the UM on the Sapphire Rapids nodes using the standard UM workflow within the suite. But it has been done using spack for ESM1.6 (old UM version) at least.
I have been talking with Ben Menadue and others at NCI about the best way to compile source for the Sapphire Rapids nodes, and how to build executables that run on both Broadwell nodes (normal queue) and Sapphire Rapids nodes (normalsr queue). Their comments:
The Intel compiler can generate runtime dispatch for different architectures, but it has a very slight performance impact. So I’d say it’s better than using the lowest-common-denominator architecture, but not quite as good as a separate build for each.
-march=broadwell means “use the instruction sets that are in the Broadwell architecture”
-xBROADWELL means “optimised specifically for Intel Broadwell architecture”
-mauto-arch=a,b,c means “generate additional code using the instruction sets that are in these architectures”
-axA,B,C means “generate optimised code specifically for these Intel architectures”
For the best performance you would use -x.
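As a minimal sketch of how the options above might be selected in a build script: the function name arch_flags and the target names are hypothetical, and combining -x (baseline) with -ax (additional code paths) for a single dual-architecture executable is an assumption, not something confirmed for the UM build.

```shell
#!/bin/bash
# Hypothetical helper: map a target architecture name to Intel compiler flags.
# The flag spellings are the ones quoted in this thread; the target names
# and the "both" combination are illustrative assumptions.
arch_flags() {
    case "$1" in
        broadwell)      echo "-xBROADWELL" ;;        # optimise for Broadwell only
        sapphirerapids) echo "-xSAPPHIRERAPIDS" ;;   # optimise for Sapphire Rapids only
        both)           echo "-xBROADWELL -axSAPPHIRERAPIDS" ;;  # baseline plus runtime-dispatched path
        *)              echo "unknown target: $1" >&2; return 1 ;;
    esac
}

# Example use (hypothetical source file name):
#   ifort $(arch_flags sapphirerapids) -c my_module.f90
```

A separate build per architecture uses one of the first two cases; the third trades a small runtime-dispatch cost for a single executable.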
I’ve found that UM source built with -xSAPPHIRERAPIDS fails with a segmentation fault in the radiance calculation.
Ben’s suggestions for chasing down this error:
I’m guessing you’re seeing it when using -xSAPPHIRERAPIDS instead of -march because it’s optimised the code differently. The first thing I would try is to ensure your stack size is unlimited. You may need to wrap your binary in a script that sets that before invoking your binary as I’m not sure how reliable copying those settings from the first node of the job is (some MPI libraries do it, others don’t).
Otherwise, if you still see that, I’d suggest recompiling with the -check bounds (at least) flag to ensure there’s not an accidental off-by-one error or similar there.
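The stack-size suggestion could be sketched as a small wrapper like the one below. This is an illustration only: the function name run_with_unlimited_stack and the script and executable names in the comments are hypothetical, and it raises the soft stack limit to the hard limit (which is "unlimited" only if the system allows it) rather than assuming unlimited is always permitted.

```shell
#!/bin/bash
# Hypothetical wrapper: raise the stack limit, then run the given command.
# Intended to be launched on every rank so the limit does not depend on
# the MPI library copying settings from the first node.
run_with_unlimited_stack() {
    (
        # Raise the soft stack limit as far as the hard limit permits.
        if ! ulimit -S -s "$(ulimit -H -s)" 2>/dev/null; then
            echo "warning: could not raise stack limit (now: $(ulimit -S -s))" >&2
        fi
        # Replace the subshell with the real program so signals and the
        # exit status pass straight through.
        exec "$@"
    )
}

# When saved as a script and given arguments, run them under the new limit,
# e.g. (hypothetical names):  mpirun ./stack_wrapper.sh ./um_executable
if [ "$#" -gt 0 ]; then
    run_with_unlimited_stack "$@"
fi
```

Running the wrapper with a command such as bash -c 'ulimit -s' is a quick way to confirm which limit the program will actually see on a compute node.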
Other comments about the default UM compile flags:
As an aside, the -i8 and -r8 flags change the ABI of the compiler such that the generated code will not be compatible with the standard Linux ABI anymore. This means you cannot use our MPI / NetCDF / etc libraries without having something in the middle that translates between them. For example, our MPI libraries expect 4-byte integers for most arguments, and the MPI_REAL datatype corresponds to a single-precision floating-point number. A much safer approach is to use the standard ABI and explicitly designate the size of variables where needed, i.e. if you need a 64-bit integer use INTEGER(kind=INT64) (the INT64 parameter is in the ISO_FORTRAN_ENV module).
Also, the “-mcmodel medium” flag changes the addressing model to never use RIP-relative addressing for objects. This is only needed if your program has more than 2GB of static objects (e.g. COMMON blocks, or module-level or SAVE variables), and results in larger and less efficient generated code. Are you sure you need this? Note that automatic or allocatable objects do not count towards this limit.
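One rough way to check whether the 2GB limit is even approached is to look at the section sizes reported by the binutils size tool. The sketch below is an assumption-laden illustration: the function name static_size_exceeds_2gib is hypothetical, it expects the default Berkeley-format output of size on standard input, and it simply sums the text, data and bss columns, which is only an approximate indicator.

```shell
#!/bin/bash
# Hypothetical check: read the output of `size <executable>` (Berkeley
# format: text data bss dec hex filename) from stdin and succeed only if
# text + data + bss exceeds 2 GiB.
static_size_exceeds_2gib() {
    awk 'NR == 2 { total = $1 + $2 + $3; found = 1 }
         END {
             # Exit 0 (true) only when a data row was seen and it is over 2 GiB.
             if (found && total > 2147483648) exit 0
             exit 1
         }'
}

# Example use (hypothetical executable name):
#   if size ./um_executable | static_size_exceeds_2gib; then
#       echo "static footprint is over 2 GiB"
#   fi
```

If the check fails for an executable built without the flag, that suggests the flag may not be needed, though a test build and run without it would be the real confirmation.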
Just to chime in on the use of -r8 and -i8: those are in general “bad practices” that have lingered in many Fortran codebases, and they’ll be difficult to get rid of, sadly.
For -mcmodel=medium, it might be a case of “someone added this flag 5 years ago” and no one has really bothered checking whether it is actually needed.
Sharing these with the community, in case anyone has some comments/experience.