Analysis3-edge, netCDF4 & PyTorch CUDA

Hi all,

I am currently using the edge environment analysis3_edge-25.11, which works well with torch + GPU, but I’m running into issues with xarray; the failure seems to come from importing netCDF4. For example, ds = xr.open_dataset(nc_path) gives:

File /g/data/xp65/public/apps/med_conda/envs/analysis3_edge-25.11/lib/python3.11/site-packages/netCDF4/__init__.py:3
      1 # init for netCDF4. package
      2 # Docstring comes from extension module _netCDF4.
----> 3 from ._netCDF4 import *
      4 # Need explicit imports for names beginning with underscores
      5 from ._netCDF4 import __doc__

ImportError: /g/data/xp65/public/apps/med_conda/envs/analysis3_edge-25.11/lib/python3.11/site-packages/netCDF4/../../../././libucc.so.1: undefined symbol: ucs_config_doc_nop

The same happens when I simply run import netCDF4, which gives the same import error.
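To compare environments quickly, the imports can be tried one at a time so that a single broken package does not hide the state of the others. This is a minimal sketch; check_imports is a hypothetical helper, not part of any environment mentioned here.

```python
import importlib


def check_imports(module_names):
    """Try to import each module and record whether it succeeded."""
    results = {}
    for name in module_names:
        try:
            importlib.import_module(name)
        except Exception as exc:
            # Covers ImportError (including undefined-symbol failures from
            # shared libraries) as well as other errors raised at import time.
            results[name] = f"failed: {type(exc).__name__}: {exc}"
        else:
            results[name] = "ok"
    return results
```

Calling check_imports(["netCDF4", "xarray", "torch"]) in each candidate environment and printing the result should show which of the three fail there.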

Is it possible to fix the import netCDF4 issue in the edge environments? I have no issues when I switch to analysis3; I am currently using analysis3-26.02.

Thanks,

Sanaa

Hey Sanaa,

We were planning on deprecating the -edge environments soon anyway - is there a particular reason you’re needing to use the edge environment over a regular, non-edge one?

Cheers, Charles

Hi Charles,

I am running this script to test whether PyTorch can see CUDA. In analysis3-26.02 (and earlier versions), PyTorch can’t see CUDA. The latest environment where PyTorch can see CUDA is analysis3_edge-25.11, but I run into an issue: import netCDF4 throws errors, which then causes import xarray to fail. I’m trying to find an analysis3 environment that has a PyTorch build with CUDA support and also has xarray.

Here’s the simple script:

# torch on GPU?
import torch

# Let's check if CUDA (i.e. a GPU) is supported
if torch.cuda.is_available():
    print("CUDA GPU is available.")  # analysis3_edge-25.11
else:
    print("CUDA GPU not available.")  # analysis3-26.02

Thanks!

Hi Sanaa,

I’ve moved this over to a new topic as it’s a slightly different issue to the previous one.

I’ll have a dig and see if I can figure out what’s going on.

cc @rbeucher

Hi Sanaa,

I’ve been able to solve for an environment which works with netCDF4 and in which torch.cuda.is_available() returns True.

Unfortunately I had to resort to using the Pixi solver (not a standard conda way of doing things), so extracting the solution so we can port it into analysis3 is not going to be super straightforward. We are at least in a situation where we have a working environment that allows you to use PyTorch with CUDA support and xarray.

I’ll update once we’ve got it into analysis3. Hopefully it shouldn’t be more than a couple of days’ work.

Thank you Charles. I appreciate the effort.

Hey Sanaa,

I’ve managed to get an environment built, but due to some solver issues I’ve had to remove tensorflow. It’s not that the solver can’t find a solution; the extra CUDA library dependencies that tensorflow requires blow out the solve time, and the solve job times out.

AFAIK, tensorflow is basically deprecated in favour of PyTorch nowadays - is this right & would the ML community generally be happy with us replacing tensorflow with torch?

If people are going to need to keep it hanging around, we definitely can get them both into the same environment, but it’ll take a bit longer as we’ll need to figure out how to prime the solver to speed things up sufficiently.

Many researchers I know, including myself, use PyTorch. How about keeping Tensorflow out for now, and if others think it’s necessary for their work and request it, you could look into getting both?

Hi Sanaa,

Sorry for the delay at our end here - this led us down a bit of a rabbit hole & we had to upgrade a few parts of how we maintain analysis3.

Once we release the new version of conda/analysis3-26.02, both tensorflow and torch should be available in it.

Hopefully within the hour.

Spoke too soon!

After another day of banging our heads against the wall, I think everything should be working now.

  • conda-analysis3-26.02 currently has torch=2.6.0 available in a Python 3.11 environment.
  • conda-analysis3-26.03 currently has torch=2.5.1post8 available in a Python 3.12 environment.

Both still have tensorflow=2.18.0.
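To confirm which versions a given environment actually provides, something like the following could be run after loading the module. It is a minimal sketch using importlib.metadata from the standard library; installed_versions is a hypothetical helper, and it assumes the packages are registered under the distribution names torch and tensorflow.

```python
from importlib import metadata


def installed_versions(distribution_names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in distribution_names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not installed in this environment.
            versions[name] = None
    return versions
```

For example, installed_versions(["torch", "tensorflow"]) returns a dictionary that can be compared against the versions listed above.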

Thanks Charles for the update. Do you think I should give it some time to see the change? I’ve just tried conda-analysis3-26.02 and 26.03, and torch is still not seeing CUDA. Is there anything in particular I need to change in my ARE session:

[Image: ARE session]
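One thing that may be worth ruling out on the session side is whether a GPU is visible to the session at all, independent of PyTorch. The sketch below assumes the NVIDIA nvidia-smi tool is on PATH inside a GPU session, which is an assumption and not something confirmed in this thread; list_visible_gpus is a hypothetical helper name.

```python
import shutil
import subprocess


def list_visible_gpus(executable="nvidia-smi"):
    """Return the GPU lines reported by `nvidia-smi -L`.

    Returns None if the tool cannot be found or run, and an empty list
    if it runs but exits with an error.
    """
    path = shutil.which(executable)
    if path is None:
        return None
    try:
        result = subprocess.run(
            [path, "-L"],  # -L lists the GPUs the driver can see
            capture_output=True,
            text=True,
            timeout=30,
            check=False,
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    if result.returncode != 0:
        return []
    return [line for line in result.stdout.splitlines() if line.strip()]
```

A result of None only means the tool was not found or could not be run; it does not by itself prove that the session has no GPU.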

Thanks,

Sanaa

You could try the pyearthtools environment, which has torch and xarray:

Module directory: /g/data/dk92/apps/Modules/modulefiles
Module: pet/2025.08

Ah… I think I know what’s happened here.

Lots of activity in analysis3 this week so some stuff has gotten a bit mangled. Taking a look now.

I’m currently using pyearthtools environment and realised it already has everything I need. I didn’t realise adding CUDA support for PyTorch in the analysis environment would be this complicated. I had always assumed that if PyTorch was available it would naturally come with CUDA support. But as long as pet is available and maintained, the community can use it for ML work.

I’m mentioning this because at some point we had an edge environment with CUDA + PyTorch, and that later stopped. So if adding CUDA to analysis now might be difficult to maintain in future updates, it may not be worth putting the effort into it. I understand a lot of effort has already been put in now, but I’m just a bit worried that with future versions something might change and we’ll end up back with PyTorch without CUDA support.
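On the assumption above that an available PyTorch naturally comes with CUDA support: PyTorch can be packaged without CUDA (for example as a CPU-only build), and in that case torch.version.cuda is None. The sketch below separates that case from a CUDA build that simply cannot see a GPU. describe_torch_build is a hypothetical helper; it takes the imported torch module as an argument so that it can be exercised without a GPU.

```python
def describe_torch_build(torch_module):
    """Summarise whether a torch module was built with CUDA and can use it."""
    cuda_version = getattr(torch_module.version, "cuda", None)
    if cuda_version is None:
        # The package itself was built without CUDA support.
        return "No CUDA in this build (for example a CPU-only package)"
    if not torch_module.cuda.is_available():
        # The package supports CUDA, but no usable GPU or driver was found.
        return f"CUDA build ({cuda_version}), but no usable GPU or driver is visible"
    count = torch_module.cuda.device_count()
    return f"CUDA build ({cuda_version}) with {count} visible GPU(s)"
```

In a session this would be called as describe_torch_build(torch) after import torch. The first message points at the environment's package, while the second points at the session or node.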

I’ve actually got a working, re-solved analysis3 environment with PyTorch & GPU/CUDA support enabled in it now (so long as nothing very weird happens with the build)! Hopefully it should be available later today.

Funnily enough, the added complexity with trying to solve for a working analysis3 environment containing PyTorch is how we figured out a workflow that let us migrate the environment to 3.12. We then did that in the same hit, and I accidentally broke the cuda support in the process. Something like a region-beta paradox.

Anyhow, now that we have a solved environment, we’ll be able to lock in the build process going forwards. Yes, maintaining an environment with PyTorch & CUDA support enabled is a little bit harder, but figuring out how to do it has forced us to upgrade our strategy & tooling, so maybe it’ll actually work out to be easier in the long run... fingers crossed.

I’ll update once I’ve confirmed we’ve fixed GPU support.

Okay, I’ve double checked against a live environment and PyTorch should now be working with CUDA correctly:

Please let me know if you run into any issues!

Fantastic! I’ve tried it and it’s working perfectly. Glad pushing through the PyTorch/CUDA complexity led to a better workflow going forward. Thanks for the great support!
