All netcdf_conversion jobs failing since last Saturday

Hi Team

All my netcdf_conversion jobs within the ACCESS-AM3 alpha release run have failed since last Saturday. Before that, all jobs succeeded. I am wondering whether there have been changes to the Gadi Python environment, or whether too many concurrent jobs were tying up Python.

It seems the netcdf_conversion task never actually starts its work, and just sits there until it exceeds its walltime.

You can find the logs for a failed job here: /scratch/public/qg8515/jobf.out, and the logs for a successful job here: /scratch/public/qg8515/jobs.out

Any insights would be appreciated.

Regards, Qinggang

Hi @qinggangg,

Can you please also share the error log for the failed run?

Also, the /scratch/public/qg8515/jobs.out path doesn’t seem to exist

EDIT: Found! It’s at /scratch/public/qg8515/.jobs.out

Hi @qinggangg, which branch of the configurations were you using as your starting point for this experiment? I can’t replicate your errors from the current dev-n96e branch.

Would you be able to show what additions you’ve made to your .bash_profile? The PATH in jobf.out looks a bit odd. There are some miniconda paths in there, which could be causing conflicts with the um2netcdf4 environment.
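If it helps, one quick way to list the miniconda entries in your PATH from a login shell (purely illustrative):

# Print PATH one entry per line and pick out anything miniconda-related
echo "$PATH" | tr ':' '\n' | grep -n miniconda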

Hi @atteggiani Thank you. I copied the successful and failed job.err files to the folder /scratch/public/qg8515 as well.

Hi @lachlanswhyborn I added the following two lines to my .bashrc, but that was done a long while ago, not last Saturday.

module load ncview
module load netcdf

I can also share the whole .bashrc, but it hasn’t been modified recently and mainly contains alias and export commands.

There is also a conda section:

# >>> conda initialize >>>
# !! Contents within this block are managed by 'conda init' !!
__conda_setup="$('/home/563/qg8515/miniconda3/bin/conda' 'shell.bash' 'hook' 2> /dev/null)"
if [ $? -eq 0 ]; then
    eval "$__conda_setup"
else
    if [ -f "/home/563/qg8515/miniconda3/etc/profile.d/conda.sh" ]; then
        . "/home/563/qg8515/miniconda3/etc/profile.d/conda.sh"
    else
        export PATH="/home/563/qg8515/miniconda3/bin:$PATH"
    fi
fi
unset __conda_setup
# <<< conda initialize <<<

Would it also be easy to run netcdf_conversion offline?

I created my own branch from dev-n96e in a forked repository. I added many output variables, which makes the netcdf_conversion job very memory-heavy.

Hmm, the netcdf_conversion job normally takes on the order of a few minutes, so it seems unlikely that adding any reasonable number of variables would cause it to take 4 hours. I see you added a couple more log files to that shared scratch directory, which suggests the netcdf conversion was successful. Did you make any changes to achieve this?

The successful netcdf_conversion jobs were run before last Saturday, and they took around 2.5 hours. Since last Saturday, all netcdf_conversion jobs have failed by exceeding their walltime.

The version of conda/analysis3 differs between the runs: the successful run used conda/analysis3-25.08, while the failed run used conda/analysis3-26.01. I know that ants was removed from the conda environments for all versions past 25.08, as it was placing untenable restrictions on other package versions.
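For reference, a quick way to confirm this from the shared logs (assuming the module loads are echoed into the job output, and using the filenames shared earlier in this thread):

# Check which conda/analysis3 module each job log records
grep -H "conda/analysis3" /scratch/public/qg8515/jobf.out /scratch/public/qg8515/.jobs.out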

The conda environment is actually loaded when loading pythonlib/um2netcdf4/xp65. This modulefile was updated on Friday afternoon, which lines up with the behaviour you’re seeing. We’ll do some investigation internally to work out whether it is actually this change in version causing the problem, and whether there’s a way around it.
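If you want to see what the modulefile pulls in yourself, the standard module inspection commands should show it (a sketch, assuming the usual Environment Modules setup on Gadi):

# Show what the um2netcdf4 modulefile sets up, including any nested module loads
module show pythonlib/um2netcdf4/xp65
# Confirm which conda/analysis3 version ends up loaded
module list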

In the meantime, a temporary workaround may be to add module use /g/data/xp65/public/modules and module load conda/analysis3-25.08 to the NetCDF conversion task in suite.rc, before the module load pythonlib/um2netcdf4/xp65 line. um2netcdf4 only loads the default conda/analysis3 if a version of it is not already loaded, so loading a specific version first should restore the previous behaviour; a sketch is below.
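Something like the following, assuming the task's modules are loaded via a pre-script (the exact quoting and indentation will depend on your suite.rc):

[[netcdf_conversion]]
    pre-script = """
        module use /g/data/xp65/public/modules
        module load conda/analysis3-25.08
        module load pythonlib/um2netcdf4/xp65
    """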

Thank you. I will check now.

Hi @lachlanswhyborn Thank you for the suggestion, but the job failed again by exceeding its walltime. I copied the failed and successful logs to /scratch/public/qg8515, in netcdf_conversionf and netcdf_conversions, for comparison.

Looks like it still used conda/analysis3-26.01, so it must be checking for that specific module version rather than for any version. Can you try the following:

  1. Remove the current contents of the pre-script in the [[netcdf_conversion]] task, leaving only module use /g/data/xp65/public/modules and module load conda/analysis3-25.08, so that we load the specific version of conda/analysis3.
  2. Copy the files um2netcdf4.py and stashvar_cmip6.py from /g/data/access/apps/pythonlib/um2netcdf4/2.1 into the app/netcdf_conversion/file directory in the configuration (a sketch follows this list). The other thing the original um2netcdf4 modulefile did was add this directory to PYTHONPATH, but we can bypass that by putting the files on the working path.
  3. Re-run the NetCDF conversion task.
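For step 2, the copy might look something like this, run from the root of your configuration checkout (paths as given above; adjust the destination if your layout differs):

# Copy the conversion scripts so they sit on the working path of the task
cp /g/data/access/apps/pythonlib/um2netcdf4/2.1/um2netcdf4.py \
   /g/data/access/apps/pythonlib/um2netcdf4/2.1/stashvar_cmip6.py \
   app/netcdf_conversion/file/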

If it still uses conda/analysis3-26.01, then I’ll be very confused and will have to call for some backup.

I’ll check now.

This should now be fixed on the default configuration, with a reversion of the um2netcdf4/xp65 modulefile.

Thank you. This issue is fixed for me now.