MPI profilers on NCI

Hi all.

I’d like to use some profilers on NCI, and in particular to measure the write speed of specific UM output files.

Has anyone here used any profilers for the UM?

The NCI Opus pages suggest Linaro DDT can be used to run a UM task.

NCI have also suggested Darshan, although they note some incompatibility between it and Open MPI, which means you can’t use your own ROMIO hints files.

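For context, the usual way to supply your own ROMIO hints (with an MPICH-based MPI) is a plain-text hints file pointed to by the ROMIO_HINTS environment variable — which is presumably what doesn’t work here. A sketch, with illustrative hint values:

    # romio_hints.txt - one "key value" pair per line (values illustrative)
    romio_cb_write enable
    cb_buffer_size 16777216

    # point ROMIO at the file before launching the job
    export ROMIO_HINTS=/path/to/romio_hints.txt
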
Just asking if there is any experience using these before I dive in.

ping @manodeep

Hi Paul,

NCI has 256-core licenses for the Linaro Forge DDT and MAP tools - would that be sufficient for your needs? (Performance Reports goes up to 2048 cores but only provides aggregate data, not source-level info.)

I have used linaro-forge map for profiling ESM1.6 - fairly straightforward to set up through payu. Happy to assist if you would like.

Cheers!
Manodeep

Hi @manodeep

Thanks for the quick reply. I have run Linaro Forge DDT on an ACCESS-OM3 executable w/ debug symbols (although that didn’t work).

I’d like to run map on an ACCESS-AM3 suite, using a Spack-built executable. Will this run ‘out of the box’ or does it need recompiling/re-linking?

At this stage I only want to track I/O performance.

If you want to inspect the associated source, then you will (likely) need to build with debug symbols. Other than that, map should run out of the box.

If you are running through payu, you will need to add these lines to the config.yaml (a combined sketch follows the list):

  1. Load the linaro-forge map exe (under the module load section):
     - linaro-forge/24.0.2
  2. Prepend the map exe in front of the actual mpirun command:
     mpi:
       runcmd: map --profile --report=txt,html,summary mpirun

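Putting both together, the relevant fragments of config.yaml would look roughly like this (a sketch, assuming payu’s usual top-level modules: and mpi: sections):

    modules:
      load:
        - linaro-forge/24.0.2    # provides the map executable at runtime

    mpi:
      runcmd: map --profile --report=txt,html,summary mpirun
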
Once your run finishes, you can examine the generated profile (file extension .map) with the same map command, but without the --profile flag - i.e., map <path/to/mapfilename.map>. The mapfilename itself contains the executable name, the number of PBS nodes, OMP threads and the date-time string, so it should be unique every time you run the profiler.

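As a concrete example of opening a finished profile from a Gadi login node (the filename below is hypothetical, following the naming pattern described above):

    module load linaro-forge/24.0.2
    map um-atmos.exe_48p_1n_2024-10-21_10-30.map
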
Hope this helps!

Cheers,
Manodeep

Thanks.

The UM Atmospheric task is running in rose-cylc, so I’ll follow the guidelines suggested here:

If that doesn’t work, I’ll just build my own job submission script.

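Roughly, that script would look like the sketch below (placeholder project code and resources; the map invocation follows Manodeep’s example above):

    #!/bin/bash
    #PBS -P <project>
    #PBS -q normal
    #PBS -l ncpus=48
    #PBS -l mem=190GB
    #PBS -l walltime=01:00:00
    #PBS -l wd

    module load linaro-forge/24.0.2
    # map wraps the usual launch command; executable name is illustrative only
    map --profile --report=txt,html,summary mpirun -n 48 ./um-atmos.exe
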
I’ll keep you posted.

On using Linaro DDT on the ACCESS models: I’m currently working on docs explaining how to do this, but the abridged version that worked for me is:

  • Set up your own spack instance as per the docs
  • Clone the desired model repository, and add fflags='-O0 -g -traceback' cflags='-O0 -g -fno-omit-frame-pointer' to the model specs (I know those fflags work, haven’t actually tried the cflags yet so @manodeep may correct me?) - see the sketch after this list.
  • Call spack concretize -f then spack install --keep-stage; the --keep-stage flag is necessary to prevent Spack from cleaning up the source code used to compile the executable.

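As a sketch of those last two steps (the package name um is a placeholder for whichever model spec is in your environment’s spack.yaml):

    # In the environment's spack.yaml, the flags attach to the model spec, e.g.:
    #   specs:
    #     - um fflags='-O0 -g -traceback' cflags='-O0 -g -fno-omit-frame-pointer'
    spack concretize -f
    spack install --keep-stage    # keep staged sources so the debugger can find them
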
At this point, you should be able to use the NCI instructions for using Linaro DDT.

@lachlanswhyborn I would recommend using the flags that you are planning to use for production runs - otherwise, the generated instructions can be dramatically different and the insights from profiling (with an un-optimised build) may not be applicable to your production exe.

The instructions look spot-on - you certainly would want to use --keep-stage to keep the pre-processed source files and point the profiler to the relevant source directory. It is good to add -fno-omit-frame-pointer - I have not noticed any performance hit, but it seems to improve symbol resolution and mapping runtime addresses back to source lines.

This is for debugging rather than profiling; for profiling, yes, I’d certainly use the same flags as in production. I tried debugging with optimisation flags on, and the debugger often seemed to get confused about where the program was up to relative to the source code.

Just an update: I have run the DDT debugger and connected it to a UM task - in this case, an ACCESS-AM3 suite.

At NCI’s suggestion, I have downloaded a Linaro Forge client which I run from my local laptop (Mac in my case).

When you launch the Linaro Forge client, open the “Remote Launch” Configure pull-down menu and create a session for Gadi. I specified the following:

Connection Name : gadi
Host Name : <user-id>@gadi.nci.org.au
Remote Installation Directory : /apps/linaro-forge/24.0.2/

You can then click ‘Test Remote Launch’ to check this works correctly.

Then activate your ‘gadi’ Remote Launch, and you will have a pop-up menu which states something like
”A new Reverse Connect request is available from gadi-hmem-clx-XXXX.gadi.nci.org.au for Linaro DDT.”

Click accept, and then run.

Now you’re debugging the UM inside your rose-cylc suite!

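For reference, the job side generates that Reverse Connect request when the model’s launch command is prefixed with ddt --connect - a sketch (executable name and core count are placeholders):

    module load linaro-forge/24.0.2
    # issues a Reverse Connect request to the waiting Linaro Forge client
    ddt --connect mpirun -n 48 ./um-atmos.exe
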
@manodeep - following on from our discussion at the RSE meeting on Oct 24, the default compile flags used within the ACCESS-AM3 suite are provided in the file

/scratch/$PROJECT/$USER/UM/fcm-make/nci-x86-ifort/um-atmos-safe.cfg

include-path = $HERE/inc $HERE/../inc
$extract{?} = extract
$fcflags_level = -O2 -fp-model precise
include = um-atmos-common.cfg

This sets the fcflags_level variable used in the app/fcm_make_um/rose-suite.conf namelist.

The full compilation flags from fcm-make-on-success.cfg are

build.prop{class, fc.flags} = -i8 -r8 -mcmodel=medium -std08 -g -traceback  -assume nosource_include -O2 -fp-model precise -qopenmp   

When building an executable for debugging, I use the ‘debug’ config file:

/scratch/$PROJECT/$USER/UM/fcm-make/nci-x86-ifort/um-atmos-debug.cfg

include-path = $HERE/inc $HERE/../inc
$extract{?} = extract
$fcflags_level = -O0 -fp-model precise -traceback -fpe0
include = um-atmos-common.cfg

The full compilation flags then become:

build.prop{class, fc.flags} = -i8 -r8 -mcmodel=medium -std08 -g -traceback -assume nosource_include -O0 -fp-model precise -traceback -qopenmp   

What were the other flags you suggested?

@Paul.Gregory Are you using the oneAPI or classic Intel compiler?

For the debugging setup, I would also add -fno-omit-frame-pointer.
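
Applied to the debug config quoted above, that would give (a sketch):

    $fcflags_level = -O0 -fp-model precise -traceback -fpe0 -fno-omit-frame-pointer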