Xp65 conda/analysis3 environment + regional_mom6 versioning

Hey all, I’m trying to run the regional_mom6 package after a break, using this notebook based on the regional_mom6 demo. It’s failing at the bathymetry creation step with this error message:

ValueError: chunks must have the same number of elements as dimensions. Expected 4 elements, got 2.

I seem to get the same error with analysis3-25.03, -25.04 and -unstable
Regional mom6 version is 0.6.1

I’ve run this notebook before without issue so I thought it might be to do with the conda environment, but I’m not 100% sure what I was using before.
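
Since the environment used previously isn't known, one way to make this easier to trace next time is to record package versions at the top of the notebook. A minimal sketch using only the standard library; the package list is an assumption about what is relevant here, and the distribution names may need adjusting to match what is actually installed:

```python
from importlib.metadata import PackageNotFoundError, version

# Hypothetical list of packages worth recording; adjust to taste.
PACKAGES = ["regional_mom6", "xesmf", "esmpy", "xarray", "dask"]


def report_versions(packages=PACKAGES):
    """Return {package name: installed version or 'not installed'}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = version(name)
        except PackageNotFoundError:
            versions[name] = "not installed"
    return versions


if __name__ == "__main__":
    for name, ver in report_versions().items():
        print(f"{name:<16}{ver}")
```

Saving this output alongside the notebook results would give a record to compare against the next time an environment update changes behaviour.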

Any ideas?


Following the stack trace above for regional-mom6 v0.6.1, it seems like the error is coming from the xesmf regridder.

Do we know what version of xesmf we have in conda/analysis3-25.04 compared to the hh5 env 24.07? cc @rbeucher @anton

@mmr0 did you try running with the hh5 env conda/analysis3-24.07? Does that work?


@CharlesTurner updated 25.04 yesterday. @CharlesTurner can you check the versions for us please?

[ct1163@gadi-login-04 ~]$ module use /g/data/xp65/public/modules && module load conda/analysis3-25.04
Loading conda/analysis3-25.04
  Loading requirement: singularity
[ct1163@gadi-login-04 ~]$ python -c "import xesmf; print(xesmf.__version__)"
0.8.7
[ct1163@gadi-login-04 ~]$ module unload conda/analysis3-25.04
Unloading conda/analysis3-25.04
  Unloading useless requirement: singularity
[ct1163@gadi-login-04 ~]$ module use /g/data/hh5/public/modules && module load conda/analysis3
Loading conda/analysis3-25.03
  Loading requirement: singularity
[ct1163@gadi-login-04 ~]$ python -c "import xesmf; print(xesmf.__version__)"
0.8.8
[ct1163@gadi-login-04 ~]$ module unload conda/analysis3
Unloading conda/analysis3-25.03
  Unloading useless requirement: singularity
[ct1163@gadi-login-04 ~]$ module load conda/analysis3-24.07
Loading conda/analysis3-24.07
  Loading requirement: singularity
[ct1163@gadi-login-04 ~]$ python -c "import xesmf; print(xesmf.__version__)"
0.8.8
[ct1163@gadi-login-04 ~]$ 

Looks like xesmf has been downgraded from 0.8.8 (in analysis3-24.07 and -25.03) to 0.8.7 in analysis3-25.04. The changelog for 0.8.7 => 0.8.8 does involve some changes to chunking.

I’m now looking into what might have caused the downgrade.


Thanks Navid! Yes, I just checked and running it with the hh5 env conda/analysis3-24.07 solves that problem.

Okay, I’ve done some digging, and around version 0.8.8 of xesmf a number of version pins against incompatible esmpy releases were added.

I haven’t quite pinpointed what this conflicts with, but it seems to be the source of the error. I’ll update when I have more findings.

Hi Madi,

I’ve just been working through some related xesmf issues & bisecting the environment changes.

The conda/analysis3-25.02 & conda/analysis3-25.03 environments both contain xesmf 0.8.8; it is only downgraded to 0.8.7 in analysis3-25.04:

[ct1163@gadi-login-04 ~]$ module use /g/data/xp65/public/modules && module load conda/analysis3-25.02
Loading conda/analysis3-25.02
  Loading requirement: singularity
[ct1163@gadi-login-04 ~]$ pip list | grep 'xesmf'
xesmf                         0.8.8
[ct1163@gadi-login-04 ~]$ module unload conda/analysis3-25.02 
Unloading conda/analysis3-25.02
  Unloading useless requirement: singularity
[ct1163@gadi-login-04 ~]$
[ct1163@gadi-login-04 ~]$ module load conda/analysis3-25.03
Loading conda/analysis3-25.03
  Loading requirement: singularity
[ct1163@gadi-login-04 ~]$ pip list | grep 'xesmf'
xesmf                         0.8.8
[ct1163@gadi-login-04 ~]$ module unload conda/analysis3-25.03
Unloading conda/analysis3-25.03
  Unloading useless requirement: singularity
[ct1163@gadi-login-04 ~]$ 
[ct1163@gadi-login-04 ~]$ module load conda/analysis3-25.04
Loading conda/analysis3-25.04
  Loading requirement: singularity
[ct1163@gadi-login-04 ~]$ pip list | grep 'xesmf'
xesmf                         0.8.7

Assuming that the xesmf downgrade was the root cause of the issue, I would have expected the notebook to run in the 25.02 & 25.03 environments, and fail in the 25.04 environment, so I suspect the issue might be a little bit more complicated.
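
To look for other packages that changed between two environments, the `pip list` output from each can be compared. A minimal sketch, assuming the output of each environment has been captured as text; the function names are made up for illustration, and apart from the xesmf versions shown above the sample data is invented:

```python
def parse_pip_list(text):
    """Parse `pip list` output into {package: version}, skipping header rows."""
    packages = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip blank lines, the "Package Version" header and the dashed rule.
        if len(parts) < 2 or parts[0] == "Package" or set(parts[0]) == {"-"}:
            continue
        packages[parts[0].lower()] = parts[1]
    return packages


def diff_environments(old, new):
    """Return {package: (old version, new version)} for every difference."""
    changes = {}
    for name in sorted(set(old) | set(new)):
        before, after = old.get(name), new.get(name)
        if before != after:
            changes[name] = (before, after)
    return changes


# Illustrative input only: the xesmf versions are those reported above,
# the other package and its version are made up for the example.
env_2503 = parse_pip_list("xesmf    0.8.8\nexamplepkg    1.0")
env_2504 = parse_pip_list("xesmf    0.8.7\nexamplepkg    1.0")
print(diff_environments(env_2503, env_2504))
```

A package missing from one environment shows up with `None` on that side, so additions and removals are reported as well as version changes.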

I’m currently downloading the Copernicus data needed in the notebook, which is taking a while… more info to come.

Hi Madi,

I’ve just run this in the conda/analysis3-25.02 environment successfully & reproduced the error you were getting with conda/analysis3-25.04.

The conda/analysis3-25.03 environment produces this segmentation fault:

[gadi-cpu-bdw-0102:1663307:0:1663307] Caught signal 11 (Segmentation fault: address not mapped to object at address 0x20)
BFD: Dwarf Error: Can't find .debug_ranges section.
...

I can’t remember the details but I think @anton was running into these BFD Dwarf Errors somewhere too.

For now, are you able to use 25.02 or lower? Working out how to fix these environment conflicts can be a slow process.

Hi Charles,

Thanks so much for looking into this. I didn’t notice that -25.03 produced a different error. Oops!

Yes I have just successfully run the notebook using -25.02. No errors! Very happy using that for the moment. If you want me to test other versions at a later stage, I’m happy to.

Thanks!

@helen @navidcy I’d like to help others avoid this issue. In a sense, the current instructions are OK since they point to the Cosima Cookbook which still recommends hh5 analysis3, which works. It might be good to note somewhere that xp65 analysis3-25.02 also works, though? And to avoid later versions for the moment?
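
Until this is fixed, a notebook could also warn when it is started in an environment not known to work. A rough sketch based only on the results reported in this thread; it assumes the module system exports the loaded modules in the `LOADEDMODULES` environment variable (standard for Environment Modules, but not verified on Gadi here), and the function names are made up:

```python
import os
import re
import warnings

# Status of each environment as reported in this thread (not an official list).
REPORTED_STATUS = {
    "24.07": "works (hh5)",
    "25.02": "works",
    "25.03": "segmentation fault",
    "25.04": "ValueError at the bathymetry step",
    "unstable": "ValueError at the bathymetry step",
}


def loaded_analysis3_versions(environ=os.environ):
    """Return versions of any loaded conda/analysis3-<version> modules."""
    loaded = environ.get("LOADEDMODULES", "")
    return re.findall(r"conda/analysis3-([\w.]+)", loaded)


def check_environment(environ=os.environ):
    """Warn if a loaded analysis3 version is not reported as working."""
    statuses = {}
    for ver in loaded_analysis3_versions(environ):
        status = REPORTED_STATUS.get(ver, "untested in this thread")
        statuses[ver] = status
        if not status.startswith("works"):
            warnings.warn(f"conda/analysis3-{ver}: {status}")
    return statuses
```

Calling `check_environment()` in the first notebook cell would then flag the 25.03 and 25.04 environments before the bathymetry step is reached. The status table would need updating once the environments are fixed.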


I have put a link to this thread on the instructions post.
When this is fixed, we should update the Cosima Cookbook to point to the Xp65 analysis3!