When attempting a parallel read of a restart file, we get this error:
get_stripe failed: 61 (No data available)
Abort with message NetCDF: Error initializing for parallel access in file /jobfs/98914803.gadi-pbs/mo1833/spack-stage/spack-stage-parallelio-2.5.10-hyj75i7d5yy5zbqc7jm6whlkduofib2k/spack-src/src/clib/pioc_support.c at line 2832
The get_stripe failure occurs because payu provides a symlink to the restart file rather than the file itself. The payu ‘work’ directory has an ‘input’ folder with symlinks (e.g. input/iced.1900-01-01-10800.nc -> /g/data/ik11/inputs/access-om3/0.x.0/1deg/cice/iced.1900-01-01-10800.nc)
Running this in bash shows what is going on:
$ lfs getstripe input/iced.1900-01-01-10800.nc
input/iced.1900-01-01-10800.nc has no stripe info
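For comparison, here is a minimal sketch of querying the link target instead of the link. The helper name `show_stripe` is made up for illustration, and it assumes `readlink -f` is available (GNU coreutils). On a Lustre client the resolved path should then report stripe info, though I have not confirmed that on every filesystem.

```shell
# Hypothetical helper: resolve a (possibly symlinked) path to the real file,
# then ask Lustre for its striping.
show_stripe() {
    # readlink -f follows every link in the chain and prints an absolute path
    target=$(readlink -f "$1") || return 1
    echo "resolved: $target"
    # lfs only exists on Lustre clients, so skip the query elsewhere
    if command -v lfs >/dev/null 2>&1; then
        lfs getstripe "$target"
    fi
}

# Example call, using the path from this thread:
# show_stripe input/iced.1900-01-01-10800.nc
```

If the resolved path reports a stripe count while the symlink does not, that confirms the symlink is the only thing in the way.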
I have always thought of symlinks as pretty robust!
Options here are possibly:
Update payu to point directly to the file instead of symlinking (would this need a copy of the restart files in the work directory?)
Raise the issue with the developers of the ParallelIO library
Something else?
I haven’t yet thought through how updating payu would work. We wouldn’t want to have to make a copy of every restart file, every time a model component run by payu is initialised.
I have focussed on testing this with CICE & NUOPC, but every model component run by payu will have the same issue. To update to NetCDF4 and parallel reads for any other component (I tested with the data atmosphere, but all would be impacted), some change will need to be made.
Aidan
(Aidan Heerdegen, ACCESS-NRI Release Team Lead)
2
I agree this is a decent place to talk about it, as it involves a lot more than just payu.
So why does PIO in CICE5 work with ACCESS-OM2 with symlinks? Is nuopc doing some other step that involves interrogating the striping?
This would constitute a pretty large change in the logic of payu and would be an option of last resort.
I’d plump for figuring out if you can turn off this lfs interrogation step. My recollection from Nic Hannah’s testing was that he didn’t get much IO improvement when he changed the default PIO configuration.
@rui.yang is the one who knows about optimising Lustre striping though.
The big change was probably from switching to Parallel IO; its configuration is probably less important as long as it is reasonable (i.e. 12 vs 24 threads for IO might not change much in the final result due to the limitations in disk access).
Similarly, I expect the default configuration for striping will be fine at our file sizes. It just needs to work.
Aidan
(Aidan Heerdegen, ACCESS-NRI Release Team Lead)
4
To address something you mentioned in your original post, IIRC there was no discernible difference in performance for 1 deg, and even 0.25 deg wasn’t much. It was at a tenth of a degree that it really made a difference, but @aekiss might recall, or have some actual numbers.
There is some useful stuff in the TWG minutes, search for PIO and start sifting for nuggets of wisdom:
This step is buried deep in the open-mpi library:
Which makes it kind of challenging to do anything about. Fortran also doesn’t have a good way to just read the path that a symlink points to.
We could:
Update the rpointer files in Payu to use full paths for the restart and input files (instead of using the symlinks).
Copy the restart files (as above) in Payu
Patch CICE to read using serial input and only write using parallel output. (IO is controlled in nuopc.run config, so it’s messy). This would mean all model components would have to use serial input and have a similar patch applied if we wanted parallel output (I don’t know if that would be slow, the datm files are a lot bigger than the CICE ones but still less than 2GB each).
Possibly raise an issue with open-mpi, and make the case that the behaviour with Lustre is wrong and they should be checking for symlinks when opening files.
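To make the first option concrete, here is a rough sketch of rewriting an rpointer file so each entry that is a symlink is replaced by the full path of its target. It is only an illustration: the function name `resolve_rpointer` is made up, it assumes the rpointer file holds one path per line, and it assumes it is run from the payu work directory so relative entries resolve. It is not how payu does (or would) implement this.

```shell
# Hypothetical helper: replace symlinked entries in an rpointer file
# with the absolute path of the file they point to.
# Assumes one path per line and that relative paths are valid from $PWD.
resolve_rpointer() {
    rpointer=$1
    tmp=$(mktemp) || return 1
    while IFS= read -r entry || [ -n "$entry" ]; do
        if [ -L "$entry" ]; then
            # Entry is a symlink: write the fully resolved target instead
            readlink -f "$entry"
        else
            # Anything else is passed through unchanged
            printf '%s\n' "$entry"
        fi
    done < "$rpointer" > "$tmp"
    mv "$tmp" "$rpointer"
}

# Example call (file name is illustrative):
# resolve_rpointer rpointer.ice
```

This leaves the symlinks in the ‘input’ folder untouched, so nothing is copied; only the path the model is told to open changes.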
Aidan
(Aidan Heerdegen, ACCESS-NRI Release Team Lead)
8
I don’t think there are. I usually load a vanilla conda environment (one without payu):
module use /g/data/hh5/public/modules
module load conda/python3
and then
pip install -e . --user
in the payu source directory.
That installs into ~/.local/bin/payu.
Wow. Good find.
That would break the manifests as they’re currently implemented.
That’s a decent work-around for the time being. Does require a patched payu however.