OK, I found that the executable compiled with
fcflags_overrides=-march=sapphirerapids
gave a useful speedup on the normalsr queue. This executable was built after running
module load intel-compiler-llvm/2025.0.4
Better results were obtained by also adding -qopt-zmm-usage=high:
fcflags_overrides=-march=sapphirerapids -qopt-zmm-usage=high
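For reference, here is a minimal sketch of what an equivalent direct compile line might look like. It assumes that fcflags_overrides is simply appended to the ifx command line. The source file model.f90 and the variable name FCFLAGS_SR are hypothetical. If module, ifx or the source file is not available, the script only prints the command instead of running it.

```shell
#!/bin/sh
# Sketch only: assumes fcflags_overrides is passed straight through to ifx.
# model.f90 and FCFLAGS_SR are hypothetical names used for illustration.

# Flags from the post: target Sapphire Rapids and let the compiler
# use 512-bit zmm registers more aggressively.
FCFLAGS_SR="-march=sapphirerapids -qopt-zmm-usage=high"
SRC="model.f90"

# Load the same compiler module that was used for the tested build, if available.
if command -v module >/dev/null 2>&1; then
  module load intel-compiler-llvm/2025.0.4
fi

if command -v ifx >/dev/null 2>&1 && [ -f "$SRC" ]; then
  # FCFLAGS_SR is left unquoted on purpose so it splits into separate flags.
  ifx $FCFLAGS_SR -o model "$SRC"
else
  # Dry run: just show the command that would be executed.
  echo "ifx $FCFLAGS_SR -o model $SRC"
fi
```

-march=sapphirerapids enables the instruction set of that CPU generation, while -qopt-zmm-usage=high tells the compiler it may use 512-bit zmm registers more freely when vectorizing.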