I’ve noticed that the netcdf_conversion job fails regularly (every few model years) in CM2. I believe I started noticing this after moving to the xp65 environment. I’m running a copy of suite u-db130/meltwater.
Is anyone else experiencing something similar?
This is the error message I get:
Using the cylc session localhost
Loading cylc7/23.09
Loading requirement: mosrs-setup/1.0.1
Loading pythonlib/um2netcdf4/xp65
Loading requirement: singularity conda/analysis3-25.08
Traceback (most recent call last):
  File "/home/581/wgh581/cylc-run/u-ds210_ensemble3/share/fcm_make_drivers/build-drivers/bin/run_um2netcdf.py", line 7, in <module>
    import os, datetime, collections, um2netcdf4, shutil, re, f90nml
  File "/g/data/access/projects/access/apps/pythonlib/um2netcdf4/2.1/um2netcdf4.py", line 8, in <module>
    from iris.fileformats.pp import PPField
  File "/g/data/xp65/public/apps/med_conda/envs/analysis3-25.08/lib/python3.11/site-packages/iris/fileformats/__init__.py", line 17, in <module>
    from . import name, netcdf, nimrod, pp, um
  File "/g/data/xp65/public/apps/med_conda/envs/analysis3-25.08/lib/python3.11/site-packages/iris/fileformats/netcdf/__init__.py", line 22, in <module>
    from .._nc_load_rules.helpers import UnknownCellMethodWarning, parse_cell_methods
  File "/g/data/xp65/public/apps/med_conda/envs/analysis3-25.08/lib/python3.11/site-packages/iris/fileformats/_nc_load_rules/helpers.py", line 28, in <module>
    import pyproj
  File "/g/data/xp65/public/apps/med_conda/envs/analysis3-25.08/lib/python3.11/site-packages/pyproj/__init__.py", line 33, in <module>
    import pyproj.network
  File "/g/data/xp65/public/apps/med_conda/envs/analysis3-25.08/lib/python3.11/site-packages/pyproj/network.py", line 10, in <module>
    from pyproj._network import ( # noqa: F401 pylint: disable=unused-import
  File "pyproj/_network.pyx", line 1, in init pyproj._network
ImportError: /g/data/xp65/public/apps/med_conda/envs/analysis3-25.08/lib/python3.11/site-packages/pyproj/_datadir.cpython-311-x86_64-linux-gnu.so: cannot read file data: Input/output error
[FAIL] run_um2netcdf.py # return-code=1
2025-11-18T04:46:07Z CRITICAL - failed/EXIT
clairecarouge
(Claire Carouge, ACCESS-NRI Land Modelling Team Lead)
@wghuneke, we have noticed the issue with ESM1.6 as well. After some testing, and emails and conversations with various people at NCI and elsewhere, it comes down to how the Lustre filesystem, Python and containers interact with each other. Unfortunately, we haven’t found a fix yet, and we still need to test various ideas to see what works.
This wasn’t a priority because it occurs only rarely in ESM1.6, but it seems to happen more often with your CM2 suite. We’ll take that into consideration and let you know if we find a fix.
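Until the root cause is fixed, one generic mitigation (not something proposed in this thread) is to retry the conversion when, and only when, it fails with the intermittent "Input/output error". In the traceback above, that error is raised while run_um2netcdf.py is still importing its modules, before any file is converted, so a retry should not repeat partial work. This is a minimal sketch, assuming the conversion can be launched as a separate command from a small wrapper script. The run_with_retry helper and the way it would be wired into the suite are hypothetical:

```python
import subprocess
import time
from typing import Sequence


def run_with_retry(
    cmd: Sequence[str],
    attempts: int = 3,
    delay_s: float = 30.0,
    marker: str = "Input/output error",
) -> tuple[subprocess.CompletedProcess, int]:
    """Run cmd, retrying only if it fails with marker in its stderr.

    Returns the last CompletedProcess and the number of attempts made.
    Hypothetical helper: the real task command is not shown in this thread.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(1, attempts + 1):
        result = subprocess.run(cmd, capture_output=True, text=True)
        # Stop on success, or on any failure that is NOT the intermittent
        # I/O error: retrying those would not help.
        if result.returncode == 0 or marker not in result.stderr:
            return result, attempt
        if attempt < attempts:
            time.sleep(delay_s)
    return result, attempts
```

For example, run_with_retry(cmd, attempts=3, delay_s=30) where cmd is whatever command the task currently runs. Every retry counts against the task's walltime, so keep the number of attempts and the delay small.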
I started experiencing this issue last week. The netcdf_conversion task keeps failing on the December files, and I receive a “Job exceeded resource walltime” error email. I have tried the fix @wghuneke suggested above, but it doesn’t always solve the problem. I also can’t retrigger the netcdf_conversion step from the GUI, because it can’t find the January files (these were already converted successfully; see the error message below).
FileNotFoundError: [Errno 2] No such file or directory: '/scratch/jk72/hd4873/archive/dy193/history/atm/dy193a.pd1018jan'
[FAIL] run_um2netcdf.py # return-code=1
'import sitecustomize' failed; use -v for traceback
2026-04-26T23:18:05Z CRITICAL - failed/EXIT
@clairecarouge, was a fix found? Alternatively, is there a way to retrospectively convert just the December atmosphere files to netCDF?
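The two errors quoted so far call for different responses. The Input/output error in the first traceback is intermittent, so simply retriggering the task can succeed. The FileNotFoundError above means the task is looking for a month file that has already been converted, so retriggering it unchanged will fail again. Below is a small sketch that tells the two apart from a job's log text. The labels are hypothetical, and the patterns only cover the two messages quoted in this thread:

```python
from __future__ import annotations

import re

# Patterns for the two failure messages quoted in this thread only.
IO_ERROR_TEXT = "Input/output error"
MISSING_FILE_RE = re.compile(
    r"FileNotFoundError: \[Errno 2\] No such file or directory: '([^']+)'"
)


def classify_failure(log_text: str) -> tuple[str, str | None]:
    """Return (kind, detail) for a failed netcdf_conversion job log.

    kind is one of the hypothetical labels "transient-io" (retriggering
    may succeed), "missing-input" (retriggering unchanged will fail again;
    detail is the missing path) or "unknown".
    """
    if IO_ERROR_TEXT in log_text:
        return "transient-io", None
    match = MISSING_FILE_RE.search(log_text)
    if match:
        return "missing-input", match.group(1)
    return "unknown", None
```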
@rbeucher, could this be related to the sitecustomize.py we added? It shouldn’t cause cascading failures, but it looks like something odd might have happened.
The fix I mentioned above didn’t always prevent the issue for me either. If the error is related to walltime, you can try increasing the walltime for the netcdf_conversion step.
The error you get when retriggering occurs because, as you say, the jan file has already been converted. It would be really great if the workflow could be changed so that, instead of looking only for the jan file (and failing when it is missing), it checks whether any months remain to be converted.
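The check suggested above could look roughly like this: instead of requiring the jan file, list the months of a year that still have an unconverted UM file, and skip the rest. This is a minimal sketch with two assumptions. First, the monthly file names follow the pattern in the error above (dy193a.pd1018jan is prefix dy193a.pd, year 1018, month jan). Second, a converted month either no longer has its UM file or has a neighbouring <name>.nc file. That .nc naming and both function names are hypothetical, not taken from the suite:

```python
from __future__ import annotations

from collections.abc import Iterable
from pathlib import Path

# Month suffixes as they appear in UM output names such as dy193a.pd1018jan.
MONTHS = ("jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec")


def remaining_months(filenames: Iterable[str], prefix: str, year: str) -> list[str]:
    """Return, in calendar order, the months that still need converting.

    A month counts as pending when its UM file '<prefix><year><mon>' is
    present and no '<prefix><year><mon>.nc' exists (assumed naming).
    Months with no UM file are skipped instead of raising an error.
    """
    names = set(filenames)
    pending = []
    for mon in MONTHS:
        um_name = f"{prefix}{year}{mon}"
        if um_name in names and f"{um_name}.nc" not in names:
            pending.append(mon)
    return pending


def remaining_in_dir(history_dir: str | Path, prefix: str, year: str) -> list[str]:
    """Same check against the files currently in history_dir."""
    names = (entry.name for entry in Path(history_dir).iterdir())
    return remaining_months(names, prefix, year)
```

For example, remaining_in_dir("/scratch/jk72/hd4873/archive/dy193/history/atm", "dy193a.pd", "1018") would list the months of year 1018 still waiting for conversion, and an empty list would mean there is nothing left to do.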
A workaround is to convert the remaining files by hand before continuing the simulation. This gets tedious on a long simulation, where you may have to deal with the issue several times.
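For the by-hand workaround, a loop that converts whatever is left, skips files that are already gone, and reports all failures at the end (rather than stopping at the first one) saves some repetition. This is a minimal sketch. The convert argument is a placeholder for the actual conversion call, because its interface is not shown in this thread, and the <name>.nc output naming is an assumption:

```python
from __future__ import annotations

from collections.abc import Callable, Iterable
from pathlib import Path


def convert_by_hand(
    um_files: Iterable[Path],
    convert: Callable[[Path, Path], None],
) -> dict[Path, str]:
    """Convert each UM file that still exists and return {file: error} for failures.

    convert is a placeholder for the real conversion call; the
    (input path, output path) signature is an assumption.
    """
    failures: dict[Path, str] = {}
    for um_file in um_files:
        if not um_file.exists():
            # Already converted (or never written): skip instead of aborting.
            continue
        # Assumed output naming: <UM file name>.nc next to the input.
        nc_file = um_file.with_name(um_file.name + ".nc")
        try:
            convert(um_file, nc_file)
        except Exception as exc:
            # Record the failure and carry on with the remaining months.
            failures[um_file] = f"{type(exc).__name__}: {exc}"
    return failures
```

Any file left in the returned dictionary (for example one that hit the Input/output error again) can simply be passed through the loop a second time.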