Transient payu errors

Background

There have recently been some reports of transient errors with payu:

ImportError: /g/data/vk83/prerelease/apps/base_conda/envs/payu-dev-20250501T210937Z-028805b/lib/python3.10/lib-dynload/_socket.cpython-310-x86_64-linux-gnu.so: cannot read file data: Input/output error
Detailed stack trace
$ payu run -f
Traceback (most recent call last):
  File "/g/data/vk83/prerelease/apps/base_conda/envs/payu-dev-20250501T210937Z-028805b/bin/payu", line 5, in <module>
    from payu.cli import parse
  File "/g/data/vk83/prerelease/apps/base_conda/envs/payu-dev-20250501T210937Z-028805b/lib/python3.10/site-packages/payu/cli.py", line 23, in <module>
    from payu.models import index as supported_models
  File "/g/data/vk83/prerelease/apps/base_conda/envs/payu-dev-20250501T210937Z-028805b/lib/python3.10/site-packages/payu/models/__init__.py", line 4, in <module>
    from payu.models.cesm_cmeps import AccessOm3
  File "/g/data/vk83/prerelease/apps/base_conda/envs/payu-dev-20250501T210937Z-028805b/lib/python3.10/site-packages/payu/models/cesm_cmeps.py", line 21, in <module>
    from payu.models.fms import fms_collate
  File "/g/data/vk83/prerelease/apps/base_conda/envs/payu-dev-20250501T210937Z-028805b/lib/python3.10/site-packages/payu/models/fms.py", line 10, in <module>
    import multiprocessing
  File "/g/data/vk83/prerelease/apps/base_conda/envs/payu-dev-20250501T210937Z-028805b/lib/python3.10/multiprocessing/__init__.py", line 16, in <module>
    from . import context
  File "/g/data/vk83/prerelease/apps/base_conda/envs/payu-dev-20250501T210937Z-028805b/lib/python3.10/multiprocessing/context.py", line 6, in <module>
    from . import reduction
  File "/g/data/vk83/prerelease/apps/base_conda/envs/payu-dev-20250501T210937Z-028805b/lib/python3.10/multiprocessing/reduction.py", line 16, in <module>
    import socket
  File "/g/data/vk83/prerelease/apps/base_conda/envs/payu-dev-20250501T210937Z-028805b/lib/python3.10/socket.py", line 51, in <module>
    import _socket
ImportError: /g/data/vk83/prerelease/apps/base_conda/envs/payu-dev-20250501T210937Z-028805b/lib/python3.10/lib-dynload/_socket.cpython-310-x86_64-linux-gnu.so: cannot read file data: Input/output error

The errors are transient, and so not reproducible. For this reason we’re assuming it is not an error with payu but an issue with Gadi, potentially filesystem-related.

Solution

Luckily there is a simple solution: run your experiment again.

With payu this is accomplished by removing the existing work directory (sweep) and then running the experiment again:

payu sweep
payu run

or

payu run -f

as the -f option does the sweep for you.
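If the error shows up repeatedly, the "run it again" step can be wrapped in a small retry loop. The following is only a sketch: `retry` and the `RETRY_DELAY` variable are hypothetical names, not part of payu, and it assumes the failure surfaces when the command itself is invoked (as in the trace above) rather than later inside the queued job.

```shell
# Hypothetical helper (not part of payu): run a command, and if it fails,
# run it again, up to MAX_ATTEMPTS times in total.
# Usage: retry MAX_ATTEMPTS COMMAND [ARGS...]
retry() {
    local max_attempts="$1"
    shift
    local attempt=1
    while ! "$@"; do
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "retry: '$*' failed after $attempt attempts" >&2
            return 1
        fi
        echo "retry: attempt $attempt failed, trying again..." >&2
        attempt=$((attempt + 1))
        # Pause between attempts; RETRY_DELAY (seconds) is an assumed knob.
        sleep "${RETRY_DELAY:-5}"
    done
}
```

With this defined, `retry 3 payu run -f` would try the command up to three times. A failure that persists across several attempts is probably not transient and is worth reporting.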

If that solution doesn’t work for you, please follow the guidelines to request help.

Reporting

If you have this problem, please reply to this topic and let us know. If there is a pattern, it could help NCI track down the source.

Note: there were some other transient errors when loading the payu module:

FATAL:   container creation failed: mount /proc/self/fd/10->/opt/nci/singularity/3.11.3/var/singularity/mnt/session/overlay-images/0 error: while mounting image /proc/self/fd/10: failed to find loop device: could not attach image file to loop device: failed to attach loop device: transient error, please retry: resource temporarily unavailable

and

FATAL:   container creation failed: mount /proc/self/fd/3->/opt/nci/singularity/3.11.3/var/singularity/mnt/session/rootfs error: while mounting image /proc/self/fd/3: failed to mount squashfs filesystem: file exists

If you have this error, please report it below. A fix has been added to the payu container environment in the payu/dev prerelease module, but it won’t be available in the release payu environments until version 1.1.7 is released.

The new release of payu/1.3.0 uses NCI’s new installation of an Apptainer-based container engine on Gadi. This uses a different driver for mounting container images from the singularity-based installation and should be more reliable. However, if any errors similar to the above occur, please reply to this topic so we can let NCI know.

Since 21 April 2026, I’ve been encountering a module conflict error while logging into Gadi, which is preventing me from running any payu-dependent jobs.

Error message

Loading payu/1.3.0
Loading requirement: apptainer

Loading singularity
ERROR: singularity cannot be loaded due to a conflict.
HINT: Might try "module unload apptainer" first.

Loading conda/analysis3-26.03
ERROR: Load of requirement singularity failed

Current module setup

In my .bash_profile, I am loading the following modules:

module use /g/data/access/projects/access/modules
module use /g/data/access/modules
module use /g/data/vk83/modules
module use /g/data/xp65/public/modules
#module load singularity
module load payu
module load intel-mkl/2019.3.199
module load conda/analysis3

Do I need to update or modify the modules in my .bash_profile to resolve this conflict? Thanks.

Hi @abhik, thanks for raising this issue. It looks like only one of payu or conda/analysis3 can be loaded at a time due to the apptainer/singularity module conflict. I’ll look into changing the payu module to avoid loading the apptainer module altogether, to try to avoid this conflict.

Hi @abhik,
while we’re working to fix this, I would like to point out that it is not recommended to load multiple Python-based modules at the same time (e.g., the payu and conda/analysis3 modules) for several reasons.
Some of them include:

  • You will not be able to use both Python environments at the same time. Each environment has its own Python interpreter and is “separate” from the other.
  • Although you will be able to use all executables that the multiple environments expose, this might cause confusion with PATH resolution, and therefore some things might break, or silently not work as expected!
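To see what the PATH resolution point means in practice, it can help to list every copy of an executable that is currently on PATH. The function below is a minimal sketch; `list_on_path` is a hypothetical name chosen for illustration. In bash, the builtin `type -a python` gives similar information.

```shell
# Hypothetical helper: print every executable called NAME found on PATH,
# in resolution order. The first line printed is the one the shell runs.
# Usage: list_on_path NAME
list_on_path() {
    local name="$1"
    local dir
    local IFS=:
    for dir in $PATH; do
        # An empty PATH entry means the current directory.
        dir="${dir:-.}"
        if [ -x "$dir/$name" ] && [ ! -d "$dir/$name" ]; then
            echo "$dir/$name"
        fi
    done
}
```

If `list_on_path python` prints more than one line after loading several modules, only the first is used when you type `python`, which may not be the interpreter a given script expects.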

To avoid having to remember the module use ... and module load ... commands each time, I would use one of the following options:

  • you can keep the module use ... commands in your .bash_profile (as that doesn’t necessarily create issues) and then only run module load ... when a specific module is needed
  • for each Python-related module, you can create a function in your .bash_profile that runs both the module use ... and module load ... commands. You can name the functions as you prefer, so you remember them. For example:
load_payu() {
    module use /g/data/vk83/modules
    module load payu
}
load_analysis() {
    module use /g/data/xp65/public/modules
    module load conda/analysis3
}

Then you can run

load_payu

or

load_analysis

depending on your use case.

Thanks @atteggiani and @jo-basevi.

I need access to conda/analysis3 to run several Python scripts. I previously tried commenting out the module load to avoid conflicts, but that led to missing Python dependencies, causing parts of the workflow to fail.

To avoid any conflict, I load the generic payu module (without specifying a version). This allows me to run the ACCESS and GFDL suites from my account, while still retaining access to the required Python environment through conda/analysis3.

For my purposes, both payu and conda/analysis3 are necessary. I’ll wait for any update on the payu module.

Can you explain why it is necessary to have both loaded at the same time? There might be a way to isolate the module loads from each other in your workflow.

Earlier, I tried commenting out either of them in .bash_profile. The payu jobs, like the ACCESS ESM1.5 suite run, do not work without loading payu, while the required Python libraries are not available without conda/analysis3. Please let me know how I can run payu jobs and Python without any module conflicts.

Essentially, don’t autoload the modules; use functions or aliases to load them when you need them. @atteggiani gave some examples above, but you can also set aliases in your .bashrc:

alias loadpayu='module load payu'
alias loadconda='module load conda/analysis3'

When you need to use a module, type the alias, e.g. loadconda. It doesn’t save many keystrokes, but you can name the alias whatever you want, so if you’d like a very terse command you can have one.
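When a single workflow needs both environments, another way to keep them apart is to load each module in its own subshell, so that nothing stays loaded in the calling shell. This goes beyond what was suggested in this thread and is only a sketch: `run_with_module` and `my_script.py` are hypothetical names, and whether it suits a particular workflow needs to be tested.

```shell
# Hypothetical helper: run a command with one module loaded. The function
# body is wrapped in ( ... ), so it runs in a subshell and the module is
# not left loaded in the calling shell.
# Usage: run_with_module MODULE_PATH MODULE_NAME COMMAND [ARGS...]
run_with_module() (
    module use "$1" || exit 1
    module load "$2" || exit 1
    shift 2
    "$@"
)

# Example calls (my_script.py is a placeholder for your own script):
#   run_with_module /g/data/xp65/public/modules conda/analysis3 python my_script.py
#   run_with_module /g/data/vk83/modules payu payu run
```

Because each call starts from the environment of the calling shell, the two modules are never loaded together and cannot conflict with each other.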

Hi @Aidan and @atteggiani, your advice worked well; I only had to load an additional module for payu: alias loadpayu='module load apptainer ; module load payu'. Thanks for your help.

Glad you found a working solution @abhik.

Just a note: you shouldn’t need to load apptainer to use the payu environment; it takes care of that automatically. You can simply module load payu.

@atteggiani payu doesn’t load without the apptainer module, and I get the following message:

@gadi-login-09 ~]$ module load payu
Loading payu/1.3.0
Loading requirement: apptainer

The:

Loading requirement: apptainer

doesn’t mean that payu would not load without it.

It means that by loading the payu module, you are also automatically loading apptainer.

The payu/1.3.0 module no longer loads the apptainer module. I would still recommend following the advice above and only loading the payu module when you need to run payu jobs.