Running ACCESS-OM2 on the Leonardo supercomputer

Dear All,

First of all, I’m very glad to have found this forum. I’m new to the community
and have just started getting into the ACCESS-OM2 configuration.

My current task is to compile ACCESS-OM2 and run a test experiment on the
Leonardo supercomputer in Italy (I currently work at OGS in Trieste).

So my question is: is there any documentation on how to run ACCESS-OM2 outside of Gadi?
I spent half a day looking through the repositories :slight_smile: but couldn’t find anything.

Probably I just don’t know where to look, so I’d be very grateful for any advice.

Many thanks,
Natalia

Hi Natalia,

It’s great to hear from researchers using a supercomputer other than Gadi. I have not created any documentation on compiling/running ACCESS-OM2 outside of Gadi. I will ask my colleagues if they know of any old documentation.

We recently released a version of ACCESS-OM2 built with Spack (https://spack.io/), which should make ACCESS-OM2 easier to port to other supercomputers. The instructions are here: How to build ACCESS-OM2 on Gadi

Some modifications will need to be made to the Gadi-specific files, e.g.:

I’m happy to write instructions for compiling on Leonardo if you are able to test them and provide feedback.

Can you please send me the output of the following command on Leonardo:

cat /etc/os-release

Regards,
Harshula
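For context on this request: /etc/os-release is a list of shell-compatible KEY=value assignments, so the distribution identifier (ID) and version (VERSION_ID) can be read straight out of it. The function below is a minimal sketch; the name os_info is hypothetical and not part of any ACCESS-NRI tooling.

```shell
#!/usr/bin/env bash
# Print "<ID> <VERSION_ID>" from an os-release file (default: /etc/os-release).
# os-release is designed to be sourceable by POSIX shells; it is read in a
# subshell so its variables do not leak into the caller's environment.
os_info() {
  local file="${1:-/etc/os-release}"
  [ -r "$file" ] || { echo "cannot read $file" >&2; return 1; }
  (
    unset ID VERSION_ID   # avoid reporting stale values from the caller
    . "$file"
    printf '%s %s\n' "${ID:-unknown}" "${VERSION_ID:-unknown}"
  )
}
```

On a RHEL 8 system the first field printed is rhel, which matches the linux-rhel8-x86_64 platform string that appears in the Spack build paths later in this thread.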

Dear Harshula,

Thank you for the prompt feedback!
I’ll go through the repos you provided.

Here is the output from Leonardo:

Thanks again,
Natalia

Hi Natalia,

Since Leonardo is using RHEL, please try the following instructions:

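# Get Spack v0.20 plus the ACCESS-NRI Spack packages and configuration
git clone -c feature.manyFiles=true https://github.com/spack/spack.git --branch releases/v0.20 --single-branch --depth=1
git clone https://github.com/ACCESS-NRI/spack-packages.git --branch main
git clone https://github.com/ACCESS-NRI/spack-config.git --branch main

# Link the ACCESS-NRI CI configuration files into Spack's site configuration directory
ln -s -r -v spack-config/v0.20/ci/* spack/etc/spack/

# Enable Spack in the current shell
. spack-config/spack-enable.bash

# Build the Intel compilers with Spack, then register them as a Spack compiler
spack install intel-oneapi-compilers@2021.2.0 target=x86_64

spack load intel-oneapi-compilers@2021.2.0
spack compiler find

# Build ACCESS-OM2 and its dependencies with the pinned versions
spack install access-om2 ^netcdf-c@4.7.4 ^netcdf-fortran@4.5.2 ^parallelio@2.5.2 ^openmpi@4.0.2 %intel@2021.2.0 target=x86_64

# List the installed packages
spack find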
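Besides the full spack find listing, it can help to check specifically for the ACCESS-OM2 components. The function below is a hypothetical convenience wrapper, not part of the ACCESS-NRI spack-config; it assumes that spack find exits with a non-zero status when no installed package matches a spec, which is how we understand Spack v0.20 to behave.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report which of the given specs Spack has installed.
# Returns 0 only if every spec is found.
check_installed() {
  local pkg missing=0
  for pkg in "$@"; do
    # Assumption: `spack find <spec>` fails when nothing matches.
    if spack find "$pkg" >/dev/null 2>&1; then
      echo "found:   $pkg"
    else
      echo "missing: $pkg"
      missing=1
    fi
  done
  return "$missing"
}
```

For example, check_installed access-om2 mom5 cice5 libaccessom2 oasis3-mct parallelio uses the package names as they appear in Spack's build output later in this thread.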

Also, it is better to attach textual output instead of screenshots.

Thanks,
Harshula

Dear Harshula,

Thank you! I will try all of these and let you know here.

Also, I’ll use text from now on; thanks for the hint.

Natalia

It looks like Leonardo already supports Spack:

https://www.hpc.cineca.it/systems/software/spack/

Hi Natalia,

Have you had an opportunity to test the ACCESS-OM2 instructions? If you need more detailed information, I’m happy to join a video chat during EU morning hours.

Regards,
Harshula

Hi Harshula and Aidan,

I apologise for being silent for a while!

I’m currently attending a one-year Master in High Performance Computing, and working with
ACCESS-OM2 is part of my thesis project. The workload from classes and homework has been huge until now, so I had to focus on them.

Thank you very much for being so responsive and ready to help!
I will keep you updated on the progress of compiling on Leonardo.

Thanks,
Natalia

At the moment Spack fails when installing the list of packages; the build breaks on openmpi with the following errors:

==> Installing openmpi-4.0.2-luyhnohjapwuh4ejdruohnz2bk6mwmc7
==> No binary for openmpi-4.0.2-luyhnohjapwuh4ejdruohnz2bk6mwmc7 found: installing from source
==> Using cached archive: /leonardo_scratch/large/userexternal/ntilinin/spack/var/spack/cache/_source-cache/archive/90/900bf751be72eccf06de9d186f7b1c4b5c2fa9fa66458e53b77778dffdfe4057.tar.bz2
==> Applied patch /leonardo_scratch/large/userexternal/ntilinin/spack/var/spack/repos/builtin/packages/openmpi/fix-ucx-1.7.0-api-instability.patch
==> Applied patch /leonardo_scratch/large/userexternal/ntilinin/spack/var/spack/repos/builtin/packages/openmpi/opal_assembly_arch.patch
==> openmpi: Executing phase: 'autoreconf'
==> openmpi: Executing phase: 'configure'
==> openmpi: Executing phase: 'build'
==> Error: ProcessError: Command exited with status 2:
'make' '-j16' 'V=1'

29 errors found in build log:
13831 /bin/sh ../../../../libtool --tag=CC --mode=compile /leonardo_scratch/large/userexternal/ntilinin/spack/lib/spack/env/intel/icc -DHAVE_CONFIG_H -I. -I../../../../opal/include -I../../../../ompi/include -I../../../../oshmem/include -I../../../../opal/mca/hwloc/hwloc201/hwloc/include/private/autogen -I../../../../opal/mca/hwloc/hwloc201/hwloc/include/hwloc/autogen -I../../../../ompi/mpiext/cuda/c -I../../../.. -I../../../../orte/include -I/leonardo_scratch/large/userexternal/ntilinin/release/linux-rhel8-x86_64/intel-2021.2.0/zlib-1.2.13-4h4sif2qagfapwdnz2pn36n5whjn2vdp/include -I/leonardo_scratch/large/userexternal/ntilinin/release/linux-rhel8-x86_64/intel-2021.2.0/hwloc-2.9.1-baak3vruvdy44nmzhl65g5ison3qmd24/include -I/leonardo_scratch/large/userexternal/ntilinin/release/linux-rhel8-x86_64/intel-2021.2.0/libevent-2.1.12-rgrznhf3oxxhgo5thjeidwlrdm2ltwds/include -O3 -DNDEBUG -finline-functions -fno-strict-aliasing -restrict -Qoption,cpp,--extended_float_types -pthread -MT fs_lustre_file_open.lo -MD -MP -MF $depbase.Tpo -c -o fs_lustre_file_open.lo fs_lustre_file_open.c &&
13832 mv -f $depbase.Tpo $depbase.Plo
13833 libtool: compile: /leonardo_scratch/large/userexternal/ntilinin/spack/lib/spack/env/intel/icc -DHAVE_CONFIG_H -I. -I../../../../opal/include -I../../../../ompi/include -I../../../../oshmem/include -I../../../../opal/mca/hwloc/hwloc201/hwloc/include/private/autogen -I../../../../opal/mca/hwloc/hwloc201/hwloc/include/hwloc/autogen -I../../../../ompi/mpiext/cuda/c -I../../../.. -I../../../../orte/include -I/leonardo_scratch/large/userexternal/ntilinin/release/linux-rhel8-x86_64/intel-2021.2.0/zlib-1.2.13-4h4sif2qagfapwdnz2pn36n5whjn2vdp/include -I/leonardo_scratch/large/userexternal/ntilinin/release/linux-rhel8-x86_64/intel-2021.2.0/hwloc-2.9.1-baak3vruvdy44nmzhl65g5ison3qmd24/include -I/leonardo_scratch/large/userexternal/ntilinin/release/linux-rhel8-x86_64/intel-2021.2.0/libevent-2.1.12-rgrznhf3oxxhgo5thjeidwlrdm2ltwds/include -O3 -DNDEBUG -finline-functions -fno-strict-aliasing -restrict -Qoption,cpp,--extended_float_types -pthread -MT fs_lustre.lo -MD -MP -MF .deps/fs_lustre.Tpo -c fs_lustre.c -fPIC -DPIC -o .libs/fs_lustre.o
13834 libtool: compile: /leonardo_scratch/large/userexternal/ntilinin/spack/lib/spack/env/intel/icc -DHAVE_CONFIG_H -I. -I../../../../opal/include -I../../../../ompi/include -I../../../../oshmem/include -I../../../../opal/mca/hwloc/hwloc201/hwloc/include/private/autogen -I../../../../opal/mca/hwloc/hwloc201/hwloc/include/hwloc/autogen -I../../../../ompi/mpiext/cuda/c -I../../../.. -I../../../../orte/include -I/leonardo_scratch/large/userexternal/ntilinin/release/linux-rhel8-x86_64/intel-2021.2.0/zlib-1.2.13-4h4sif2qagfapwdnz2pn36n5whjn2vdp/include -I/leonardo_scratch/large/userexternal/ntilinin/release/linux-rhel8-x86_64/intel-2021.2.0/hwloc-2.9.1-baak3vruvdy44nmzhl65g5ison3qmd24/include -I/leonardo_scratch/large/userexternal/ntilinin/release/linux-rhel8-x86_64/intel-2021.2.0/libevent-2.1.12-rgrznhf3oxxhgo5thjeidwlrdm2ltwds/include -O3 -DNDEBUG -finline-functions -fno-strict-aliasing -restrict -Qoption,cpp,--extended_float_types -pthread -MT fs_lustre_file_open.lo -MD -MP -MF .deps/fs_lustre_file_open.Tpo -c fs_lustre_file_open.c -fPIC -DPIC -o .libs/fs_lustre_file_open.o
13835 libtool: compile: /leonardo_scratch/large/userexternal/ntilinin/spack/lib/spack/env/intel/icc -DHAVE_CONFIG_H -I. -I../../../../opal/include -I../../../../ompi/include -I../../../../oshmem/include -I../../../../opal/mca/hwloc/hwloc201/hwloc/include/private/autogen -I../../../../opal/mca/hwloc/hwloc201/hwloc/include/hwloc/autogen -I../../../../ompi/mpiext/cuda/c -I../../../.. -I../../../../orte/include -I/leonardo_scratch/large/userexternal/ntilinin/release/linux-rhel8-x86_64/intel-2021.2.0/zlib-1.2.13-4h4sif2qagfapwdnz2pn36n5whjn2vdp/include -I/leonardo_scratch/large/userexternal/ntilinin/release/linux-rhel8-x86_64/intel-2021.2.0/hwloc-2.9.1-baak3vruvdy44nmzhl65g5ison3qmd24/include -I/leonardo_scratch/large/userexternal/ntilinin/release/linux-rhel8-x86_64/intel-2021.2.0/libevent-2.1.12-rgrznhf3oxxhgo5thjeidwlrdm2ltwds/include -O3 -DNDEBUG -finline-functions -fno-strict-aliasing -restrict -Qoption,cpp,--extended_float_types -pthread -MT fs_lustre_component.lo -MD -MP -MF .deps/fs_lustre_component.Tpo -c fs_lustre_component.c -fPIC -DPIC -o .libs/fs_lustre_component.o
13836 In file included from fs_lustre.c(42):
13837 /usr/include/sys/mount.h(35): error: expected an identifier
13838 MS_RDONLY = 1, /* Mount read-only. */
13839 ^
13840
13841 In file included from fs_lustre.c(42):
13842 /usr/include/sys/mount.h(37): error: expected an identifier
13843 MS_NOSUID = 2, /* Ignore suid and sgid bits. */
13844 ^
13845
13846 In file included from fs_lustre.c(42):
13847 /usr/include/sys/mount.h(39): error: expected an identifier
13848 MS_NODEV = 4, /* Disallow access to device special files. */
13849 ^
13850
13851 In file included from fs_lustre.c(42):
13852 /usr/include/sys/mount.h(41): error: expected an identifier
13853 MS_NOEXEC = 8, /* Disallow program execution. */
13854 ^
13855
13856 In file included from fs_lustre.c(42):
13857 /usr/include/sys/mount.h(43): error: expected an identifier
13858 MS_SYNCHRONOUS = 16, /* Writes are synced at once. */
13859 ^
13860
13861 In file included from fs_lustre.c(42):
13862 /usr/include/sys/mount.h(45): error: expected an identifier
13863 MS_REMOUNT = 32, /* Alter flags of a mounted FS. */
13864 ^
13865
13866 In file included from fs_lustre.c(42):
13867 /usr/include/sys/mount.h(47): error: expected an identifier
13868 MS_MANDLOCK = 64, /* Allow mandatory locks on an FS. */
13869 ^
13870
13871 In file included from fs_lustre.c(42):
13872 /usr/include/sys/mount.h(49): error: expected an identifier
13873 MS_DIRSYNC = 128, /* Directory modifications are synchronous. */
13874 ^
13875
13876 In file included from fs_lustre.c(42):
13877 /usr/include/sys/mount.h(51): error: expected an identifier
13878 MS_NOATIME = 1024, /* Do not update access times. */
13879 ^
13880
13881 In file included from fs_lustre.c(42):
13882 /usr/include/sys/mount.h(53): error: expected an identifier
13883 MS_NODIRATIME = 2048, /* Do not update directory access times. */
13884 ^
13885
13886 In file included from fs_lustre.c(42):
13887 /usr/include/sys/mount.h(55): error: expected an identifier
13888 MS_BIND = 4096, /* Bind directory at different place. */
13889 ^
13890
13891 In file included from fs_lustre.c(42):
13892 /usr/include/sys/mount.h(57): error: expected an identifier
13893 MS_MOVE = 8192,
13894 ^
13895
13896 In file included from fs_lustre.c(42):
13897 /usr/include/sys/mount.h(59): error: expected an identifier
13898 MS_REC = 16384,
13899 ^
13900
13901 In file included from fs_lustre.c(42):
13902 /usr/include/sys/mount.h(61): error: expected an identifier
13903 MS_SILENT = 32768,
13904 ^
13905
13906 In file included from fs_lustre.c(42):
13907 /usr/include/sys/mount.h(63): error: expected an identifier
13908 MS_POSIXACL = 1 << 16, /* VFS does not apply the umask. */
13909 ^
13910
13911 In file included from fs_lustre.c(42):
13912 /usr/include/sys/mount.h(65): error: expected an identifier
13913 MS_UNBINDABLE = 1 << 17, /* Change to unbindable. */
13914 ^
13915
13916 In file included from fs_lustre.c(42):
13917 /usr/include/sys/mount.h(67): error: expected an identifier
13918 MS_PRIVATE = 1 << 18, /* Change to private. */
13919 ^
13920
13921 In file included from fs_lustre.c(42):
13922 /usr/include/sys/mount.h(69): error: expected an identifier
13923 MS_SLAVE = 1 << 19, /* Change to slave. */
13924 ^
13925
13926 In file included from fs_lustre.c(42):
13927 /usr/include/sys/mount.h(71): error: expected an identifier
13928 MS_SHARED = 1 << 20, /* Change to shared. */
13929 ^
13930
13931 In file included from fs_lustre.c(42):
13932 /usr/include/sys/mount.h(73): error: expected an identifier
13933 MS_RELATIME = 1 << 21, /* Update atime relative to mtime/ctime. */
13934 ^
13935
13936 In file included from fs_lustre.c(42):
13937 /usr/include/sys/mount.h(75): error: expected an identifier
13938 MS_KERNMOUNT = 1 << 22, /* This is a kern_mount call. */
13939 ^
13940
13941 In file included from fs_lustre.c(42):
13942 /usr/include/sys/mount.h(77): error: expected an identifier
13943 MS_I_VERSION = 1 << 23, /* Update inode I_version field. */
13944 ^
13945
13946 In file included from fs_lustre.c(42):
13947 /usr/include/sys/mount.h(79): error: expected an identifier
13948 MS_STRICTATIME = 1 << 24, /* Always perform atime updates. */
13949 ^
13950
13951 In file included from fs_lustre.c(42):
13952 /usr/include/sys/mount.h(81): error: expected an identifier
13953 MS_LAZYTIME = 1 << 25, /* Update the on-disk [acm]times lazily. */
13954 ^
13955
13956 In file included from fs_lustre.c(42):
13957 /usr/include/sys/mount.h(83): error: expected an identifier
13958 MS_ACTIVE = 1 << 30,
13959 ^
13960
13961 In file included from fs_lustre.c(42):
13962 /usr/include/sys/mount.h(85): error: expected an identifier
13963 MS_NOUSER = 1 << 31
13964 ^
13965
13966 fs_lustre.c(95): warning #2330: argument of type "const char *" is incompatible with parameter of type "char *" (dropping qualifiers)
13967 fh->f_fstype = mca_fs_base_get_fstype ( fh->f_filename );
13968 ^
13969
13970 compilation aborted for fs_lustre.c (code 2)
13971 make[2]: *** [Makefile:1866: fs_lustre.lo] Error 1
13972 make[2]: *** Waiting for unfinished jobs....
13973 libtool: compile: /leonardo_scratch/large/userexternal/ntilinin/spack/lib/spack/env/intel/icc -DHAVE_CONFIG_H -I. -I../../../../opal/include -I../../../../ompi/include -I../../../../oshmem/include -I../../../../opal/mca/hwloc/hwloc201/hwloc/include/private/autogen -I../../../../opal/mca/hwloc/hwloc201/hwloc/include/hwloc/autogen -I../../../../ompi/mpiext/cuda/c -I../../../.. -I../../../../orte/include -I/leonardo_scratch/large/userexternal/ntilinin/release/linux-rhel8-x86_64/intel-2021.2.0/zlib-1.2.13-4h4sif2qagfapwdnz2pn36n5whjn2vdp/include -I/leonardo_scratch/large/userexternal/ntilinin/release/linux-rhel8-x86_64/intel-2021.2.0/hwloc-2.9.1-baak3vruvdy44nmzhl65g5ison3qmd24/include -I/leonardo_scratch/large/userexternal/ntilinin/release/linux-rhel8-x86_64/intel-2021.2.0/libevent-2.1.12-rgrznhf3oxxhgo5thjeidwlrdm2ltwds/include -O3 -DNDEBUG -finline-functions -fno-strict-aliasing -restrict -Qoption,cpp,--extended_float_types -pthread -MT fs_lustre_component.lo -MD -MP -MF .deps/fs_lustre_component.Tpo -c fs_lustre_component.c -o fs_lustre_component.o >/dev/null 2>&1
13974 libtool: compile: /leonardo_scratch/large/userexternal/ntilinin/spack/lib/spack/env/intel/icc -DHAVE_CONFIG_H -I. -I../../../../opal/include -I../../../../ompi/include -I../../../../oshmem/include -I../../../../opal/mca/hwloc/hwloc201/hwloc/include/private/autogen -I../../../../opal/mca/hwloc/hwloc201/hwloc/include/hwloc/autogen -I../../../../ompi/mpiext/cuda/c -I../../../.. -I../../../../orte/include -I/leonardo_scratch/large/userexternal/ntilinin/release/linux-rhel8-x86_64/intel-2021.2.0/zlib-1.2.13-4h4sif2qagfapwdnz2pn36n5whjn2vdp/include -I/leonardo_scratch/large/userexternal/ntilinin/release/linux-rhel8-x86_64/intel-2021.2.0/hwloc-2.9.1-baak3vruvdy44nmzhl65g5ison3qmd24/include -I/leonardo_scratch/large/userexternal/ntilinin/release/linux-rhel8-x86_64/intel-2021.2.0/libevent-2.1.12-rgrznhf3oxxhgo5thjeidwlrdm2ltwds/include -O3 -DNDEBUG -finline-functions -fno-strict-aliasing -restrict -Qoption,cpp,--extended_float_types -pthread -MT fs_lustre_file_open.lo -MD -MP -MF .deps/fs_lustre_file_open.Tpo -c fs_lustre_file_open.c -o fs_lustre_file_open.o >/dev/null 2>&1
13975 make[2]: Leaving directory '/scratch_local/ntilinin/spack-stage/spack-stage-openmpi-4.0.2-luyhnohjapwuh4ejdruohnz2bk6mwmc7/spack-src/ompi/mca/fs/lustre'
13976 make[1]: *** [Makefile:3532: all-recursive] Error 1
13977 make[1]: Leaving directory '/scratch_local/ntilinin/spack-stage/spack-stage-openmpi-4.0.2-luyhnohjapwuh4ejdruohnz2bk6mwmc7/spack-src/ompi'
13978 make: *** [Makefile:1879: all-recursive] Error 1

See build log for details:
/scratch_local/ntilinin/spack-stage/spack-stage-openmpi-4.0.2-luyhnohjapwuh4ejdruohnz2bk6mwmc7/spack-build-out.txt

==> Warning: Skipping build of mom5-master-zoyam3ku2fv3zjhlhtz2oomcv5hmk3oc since openmpi-4.0.2-luyhnohjapwuh4ejdruohnz2bk6mwmc7 failed
==> Warning: Skipping build of access-om2-latest-prktj7fy2cfv45xtqufrg64ph73a5vly since mom5-master-zoyam3ku2fv3zjhlhtz2oomcv5hmk3oc failed
==> Warning: Skipping build of cice5-master-vvvil77faon7x5ri4i6yvh7smwulzgi4 since openmpi-4.0.2-luyhnohjapwuh4ejdruohnz2bk6mwmc7 failed
==> Warning: Skipping build of parallelio-2.5.2-moskoupqn3l7pe3fagiifadmqbfw6qi6 since openmpi-4.0.2-luyhnohjapwuh4ejdruohnz2bk6mwmc7 failed
==> Warning: Skipping build of oasis3-mct-master-yp6eluuircccqfst5zdtyhqrgucsijkx since openmpi-4.0.2-luyhnohjapwuh4ejdruohnz2bk6mwmc7 failed
==> Warning: Skipping build of libaccessom2-master-r57bpljtuxmzya3xykwsq2nnuvxgbpbh since oasis3-mct-master-yp6eluuircccqfst5zdtyhqrgucsijkx failed
==> Warning: Skipping build of hdf5-1.14.1-2-3msvuphu6glhjg437dp5mjtqallwhbbe since openmpi-4.0.2-luyhnohjapwuh4ejdruohnz2bk6mwmc7 failed
==> Warning: Skipping build of netcdf-c-4.7.4-tfq3y6txvenhuiktt52sar736m3t7w45 since hdf5-1.14.1-2-3msvuphu6glhjg437dp5mjtqallwhbbe failed
==> Warning: Skipping build of netcdf-fortran-4.5.2-5rlirjaepmq7aa5xklngh3vuhwjicncl since netcdf-c-4.7.4-tfq3y6txvenhuiktt52sar736m3t7w45 failed
==> Error: access-om2-latest-prktj7fy2cfv45xtqufrg64ph73a5vly: Package was not installed
==> Error: Installation request failed. Refer to reported errors for failing package(s).

But do I understand correctly that I should basically have Open MPI, the Intel compilers, netCDF and ParallelIO either loaded or installed locally under my user account?

Thanks,
Natalia
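On the question of loading versus installing: with the instructions above, Spack builds Open MPI, netCDF and ParallelIO itself from source (note the "installing from source" line in the log). If a site-provided Open MPI is preferred, Spack can be told about it with spack external find. The sketch below assumes Leonardo provides Open MPI as an environment module compatible with the Intel compiler used for the rest of the build; the helper name use_site_openmpi and the module name passed to it are hypothetical placeholders (check module avail on Leonardo).

```shell
#!/usr/bin/env bash
# Hypothetical helper: make a module-provided Open MPI visible to Spack so it
# can be reused instead of being compiled from source.
use_site_openmpi() {
  local modname="$1"
  [ -n "$modname" ] || { echo "usage: use_site_openmpi <module>" >&2; return 2; }
  # Put the site's Open MPI on PATH (module name is site-specific).
  module load "$modname" || return 1
  # Scan PATH for an Open MPI installation and record it in Spack's
  # packages configuration as an external package.
  spack external find openmpi
}
```

After that, the spack install command can request ^openmpi@<version> matching the version Spack detected. If Spack still prefers to build its own copy, the package can additionally be marked buildable: false in packages.yaml.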

Hi Natalia, I need more context. Perhaps we can schedule a live debug session during an EU morning?

That would be perfect! I’ll send you my contact details in a private message now.

Thanks
Natalia

Summary: Openmpi 4.0.2 failed to build via Spack. Logs will be emailed. ACCESS-OM2 was built successfully with openmpi 4.1.4 via Spack.
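To repeat the build with the Open MPI version that worked, only the ^openmpi constraint in the earlier spack install command needs to change. The helper below is a hypothetical sketch that prints that spec with the Open MPI version as a parameter; it assumes the other version pins from the earlier instructions were kept, since the summary only records the Open MPI version.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the ACCESS-OM2 spec used in this thread with a
# configurable Open MPI version (default: 4.1.4, the version that built).
access_om2_spec() {
  local mpi_version="${1:-4.1.4}"
  # The spec is passed as a printf argument, not as the format string,
  # so the literal % of the compiler constraint is not interpreted.
  printf '%s\n' "access-om2 ^netcdf-c@4.7.4 ^netcdf-fortran@4.5.2 ^parallelio@2.5.2 ^openmpi@${mpi_version} %intel@2021.2.0 target=x86_64"
}
```

spack spec $(access_om2_spec) shows the concretized dependency tree without building anything, and spack install $(access_om2_spec) then performs the build. The substitution is deliberately unquoted so the shell splits it into separate spec tokens.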

Dockerfile: build-ci/containers/Dockerfile.base-spack (main branch of ACCESS-NRI/build-ci on GitHub)

docker build -f Dockerfile.base-spack -t <name>:<version> --target dev . 

Hi @Aidan, hi @harshula!
I’m in the process of setting up a demo experiment on Leonardo (thanks again @harshula for helping me build ACCESS-OM2 with Spack). It seems that in any case I will need to download the forcing fields locally. Would it be possible to get the 1deg_jra55_ryf fields? I would expect the 1° repeat-year forcing fields to be relatively small (a few hundred MB).

Thank you!
Natalia

I can see a get_input_data.py script referenced in install.sh that is probably supposed to fetch the data, but the .py file itself is not in the repo.

I have also just realised that I will have to convert the PBS scheduling to SLURM, which is what we use on Leonardo :sweat_smile:

Attention
PBS is no longer available on the HPC clusters at Cineca (since Jan 2018).

The only thing I found was this, in a COSIMA discussion:

AK: For IAF had a lot of daily CICE output. Not complete set of fields.

MW: Starting to run performance tests at GFDL and want to use payu. Has it changed much? Manifest stuff hasn’t made a big difference? Will have to get slurm working. Filesystem will be a nightmare. You moved PBS stuff into a component? AH: No, you did that. Not huge differences. Will be great to have slurm support.

@Aidan, is there any progress on SLURM support?

Would be grateful for any updates!

Many thanks,
Natalia

The answer is yes, I think so; it’s just a matter of finding the time to choose a suitable location and upload them.

Do you have a time-frame you need them by?

There are some people who use payu with slurm, specifically @ChrisC28 and @john_reilly.

It is discussed in this issue

and this is an old PR of mine with some changes to get payu working on the Pawsey HPC systems (which use SLURM).

Maybe @ChrisC28 could comment about how easy (or not) this is, and if there is a better version of payu to use.
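Until payu's SLURM support is settled, hand-written PBS job scripts from Gadi still have to be ported manually. The function below is a rough, hypothetical sketch that rewrites a few common #PBS directives into their usual #SBATCH equivalents. It does not make payu itself submit through SLURM; it assumes one MPI task per requested core when mapping ncpus to --ntasks; and it leaves directives with no direct SLURM counterpart (such as Gadi's -l storage=...) untouched. Gadi queue and project names must still be replaced with a Leonardo partition and account by hand.

```shell
#!/usr/bin/env bash
# Hypothetical, non-exhaustive PBS -> SLURM directive translator.
# Reads a job script on stdin and writes the translated script to stdout;
# lines it does not recognise are passed through unchanged.
pbs_to_slurm() {
  sed -E \
    -e 's/^#PBS -N (.+)$/#SBATCH --job-name=\1/' \
    -e 's/^#PBS -q (.+)$/#SBATCH --partition=\1/' \
    -e 's/^#PBS -P (.+)$/#SBATCH --account=\1/' \
    -e 's/^#PBS -l walltime=(.+)$/#SBATCH --time=\1/' \
    -e 's/^#PBS -l ncpus=([0-9]+)$/#SBATCH --ntasks=\1/' \
    -e 's/^#PBS -l mem=([0-9]+)GB$/#SBATCH --mem=\1G/'
}
```

Usage would be along the lines of pbs_to_slurm < gadi_job.sh > leonardo_job.sh (hypothetical file names), followed by a manual review of the result before submitting it with sbatch.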