I am currently using the edge environment analysis3_edge-25.11, which works well with torch + GPU, but I'm running into issues with xarray, specifically where the failure seems to come from importing netCDF4. For example, `ds = xr.open_dataset(nc_path)` gives:
Running `import netCDF4` directly raises the same import error.
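If it helps with triage, a quick probe can show which netCDF-capable backends import cleanly. The module names below are the usual xarray engine packages; whether each one is actually present in a given analysis3 build is an assumption on my part:

```python
import importlib

# Probe which netCDF-capable backends import cleanly; xarray can use
# the "h5netcdf" or "scipy" engines if the "netCDF4" import is broken.
def backend_status(modules=("netCDF4", "h5netcdf", "scipy")):
    status = {}
    for name in modules:
        try:
            importlib.import_module(name)
            status[name] = "ok"
        except Exception as err:  # ImportError or a binary-load failure
            status[name] = f"failed: {type(err).__name__}"
    return status

print(backend_status())
```

If `h5netcdf` imports cleanly, `xr.open_dataset(nc_path, engine="h5netcdf")` may work as an interim workaround while the `netCDF4` import is broken.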
Is it possible to fix the `import netCDF4` issue in the edge environments? I have no issues when I switch to analysis3; I am currently using analysis3-26.02.
We were planning on deprecating the -edge environments soon anyway - is there a particular reason you’re needing to use the edge environment over a regular, non-edge one?
I am running this script to test whether PyTorch can see CUDA. In analysis3-26.02 (and earlier versions), PyTorch can’t see CUDA. The latest environment where PyTorch can see CUDA is analysis3_edge-25.11, but I run into an issue: import netCDF4 throws errors, which then causes import xarray to fail. I’m trying to find an analysis3 environment that has a PyTorch build with CUDA support and also has xarray.
Here’s the simple script:
```python
# torch on GPU?
import torch

# Let's check if CUDA (i.e. a GPU) is supported
if torch.cuda.is_available():
    print("CUDA GPU is available.")   # analysis3_edge-25.11
else:
    print("CUDA GPU not available.")  # analysis3-26.02
```
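For a bit more detail than a yes/no answer, the script can be extended to report the CUDA build version and the visible devices. Everything below is standard `torch.cuda` API, but the exact output will of course depend on the environment:

```python
import torch

# Report how this PyTorch was built and what it can see at runtime.
print(f"torch version:   {torch.__version__}")
print(f"built with CUDA: {torch.version.cuda}")  # None for a CPU-only build
print(f"CUDA available:  {torch.cuda.is_available()}")
if torch.cuda.is_available():
    for i in range(torch.cuda.device_count()):
        print(f"device {i}: {torch.cuda.get_device_name(i)}")
```

A CPU-only conda build will print `built with CUDA: None`, which distinguishes "wrong torch build" from "right build but no GPU visible in this session".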
I've been able to solve for an environment which works with netCDF and for which `torch.cuda.is_available()` returns `True`.
Unfortunately I had to resort to using a Pixi solver (not a standard conda way of doing things), so extracting the solution so we can port it into analysis3 is not going to be super straightforward. But we are at least in a situation where we have a working environment that lets you use PyTorch with CUDA support alongside xarray.
I'll update once we've got it into analysis3 - hopefully it shouldn't be more than a couple of days' work.
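For what it's worth, a one-shot smoke test along these lines (just a sketch, not the actual build verification) can confirm both halves at once - the netCDF stack and the CUDA-enabled torch build:

```python
import importlib

# Sketch of a combined smoke test: both the netCDF stack and a
# CUDA-enabled torch need to pass for the environment to count as working.
def environment_ok():
    results = {}
    for mod in ("netCDF4", "xarray", "torch"):
        try:
            importlib.import_module(mod)
            results[mod] = True
        except Exception:
            results[mod] = False
    if results.get("torch"):
        import torch
        results["cuda"] = torch.cuda.is_available()
    return results

print(environment_ok())
```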
I've managed to get an environment built, but due to solver issues I've had to remove tensorflow. It's not that the solver can't find a solution; rather, the extra CUDA library dependencies that tensorflow requires blow out the solve time and the job times out.
AFAIK, tensorflow has largely been superseded by PyTorch nowadays - is that right, and would the ML community generally be happy with us replacing tensorflow with torch?
If people need to keep it around, we can definitely get both into the same environment, but it'll take a bit longer as we'll need to figure out how to prime the solver to speed things up sufficiently.
Many researchers I know, including myself, use PyTorch. How about keeping Tensorflow out for now, and if others think it’s necessary for their work and request it, you could look into getting both?
Thanks Charles for the update. Do you think I should give it some time for the change to appear? I've just tried conda-analysis3-26.02 and 26.03, and torch is still not seeing CUDA. Is there anything in particular I need to change in my ARE session?
I'm currently using the pyearthtools environment and realised it already has everything I need. I didn't realise adding CUDA support for PyTorch to the analysis environment would be this complicated; I had always assumed that if PyTorch was available it would naturally come with CUDA support. But as long as pet is available and maintained, the community can use it for ML work.
I'm mentioning this because at some point we had an edge environment with CUDA + PyTorch, which later stopped working. So if adding CUDA to analysis now might be difficult to maintain in future updates, it may not be worth the effort. I understand a lot of effort has already gone in, but I'm just a bit worried that something might change in a future version and we'll end up back with PyTorch without CUDA support.
I've actually got a working, re-solved analysis3 environment with PyTorch & GPU/CUDA support enabled in it now (so long as nothing very weird happens with the build)! Hopefully it should be available later today.
Funnily enough, the added complexity of trying to solve for a working analysis3 environment containing PyTorch is how we figured out a workflow that let us migrate the environment to 3.12. We then did that in the same hit, and I accidentally broke the CUDA support in the process. Something like a region-beta paradox.
Anyhow, now that we have a solved environment, we'll be able to lock in the build process going forward. Yes, maintaining an environment with PyTorch & CUDA support enabled is a little harder, but figuring out how to do it has forced us to upgrade our strategy & tooling, so it may actually work out easier in the long run... fingers crossed.
I’ll update once I’ve confirmed we’ve fixed GPU support.
Fantastic! I’ve tried it and it’s working perfectly. Glad pushing through the PyTorch/CUDA complexity led to a better workflow going forward. Thanks for the great support