Adding WOA23 to the main catalogue

Hi all,

I noticed that the intake catalogue contains datasets from World Ocean Atlas 2013, however there’s an updated WOA23 on /g/data/av17/access-nri/OM3/woa23. Can WOA23 please be added to the main catalogue? Thanks.

@CharlesTurner

Heya Polina,

WOA23 should be in the main catalog: see attached screencap.

If you can’t see it, I think that’s going to be because you’re using an old version of analysis3 - can you let me know what version you are using? Assuming that is the case, it’ll be an easy fix and it should be available to older versions of analysis3 in a few hours.

Right, if I use the new conda environment, I can see WOA23 in the catalogue. Thanks. FYI, I was on analysis3/25.11 before. Newer versions produce very noisy warnings when loading packages and starting a dask client; it’s been brought up on Hive before, and I think @rbeucher is aware of it.

FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you.
  import pynvml

Thanks for the info.

For posterity, what’s happening is that we’ve now created a reformatted version of the catalog, which uses parquet (a different file format, with a bunch of benefits over csv) rather than csv files. Older versions of the software don’t know how to read this, so I’ve set things up to create separate catalogs - csv and parquet, with the choice of environment dictating which catalog you get.

analysis3/25.11 (and all previous versions of the environment) must be hooking into the csv version of the catalog, rather than the newer parquet version.

I’m rebuilding the csv catalog to bring it in line with the parquet one now, and I’ll update the build process to update both catalogs so they don’t get out of sync again.

I’ll update once it’s done & I’ve checked WOA23 is available in the older environments again.

Thanks for the info, Charles.
Any ideas what’s wrong here? I can’t load a variable with .to_dask().

catalog = intake.cat.access_nri 
woa = catalog.search(name='WOA23')
t_an = woa.search(variable='t_an').to_dask()

Error:

NotImplementedError                       Traceback (most recent call last)
Cell In[6], line 1
----> 1 t_an = woa.search(variable='t_an').to_dask()
      2 t_an

File /g/data/xp65/public/apps/med_conda/envs/analysis3-26.01/lib/python3.11/site-packages/intake/source/base.py:191, in DataSourceBase.to_dask(self)
    189 def to_dask(self):
    190     """Return a dask container for this data source"""
--> 191     raise NotImplementedError

NotImplementedError: 

My script’s here /g/data/x77/ps7863/DeepArgo/DeepArgo_scripts/open_woa_ds.ipynb

I’m not 100% sure, but I think the offender is this line

catalog = intake.cat.access_nri 
woa = catalog.search(name='WOA23') # HERE
t_an = woa.search(variable='t_an').to_dask()

try:

- woa = catalog.search(name='WOA23')
+ woa = catalog['WOA23']

or

- woa = catalog.search(name='WOA23')
+ woa = catalog.search(name='WOA23').to_source()
  • Searching the top level catalog (intake.cat.access_nri.search(xyz...)) returns another intake-dataframe-catalog object, which is the container for experiments (AKA ESM-Datastores).
  • Indexing into the top level catalog intake.cat.access_nri[xyz...], gives the intake-esm source you’re after.
  • Doing intake.cat.access_nri.search(xyz...).to_source() does the same thing as intake.cat.access_nri[xyz...].
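For illustration only, the container-vs-source distinction above can be mimicked with a tiny stand-in (these classes are hypothetical toys, not the real intake API):

```python
class Source:
    """Stand-in for an intake-esm datastore: the thing that can load data."""

    def __init__(self, name):
        self.name = name

    def to_dask(self):
        return f"dask dataset for {self.name}"


class Catalog:
    """Stand-in for an intake-dataframe-catalog: a container of sources."""

    def __init__(self, sources):
        self._sources = sources

    def search(self, name):
        # search() narrows the container but still returns a Catalog,
        # which has no .to_dask() -- hence the NotImplementedError above.
        return Catalog({k: v for k, v in self._sources.items() if k == name})

    def __getitem__(self, name):
        # Indexing returns the underlying source directly.
        return self._sources[name]

    def to_source(self):
        # Only valid when exactly one entry remains after searching.
        (only,) = self._sources.values()
        return only


catalog = Catalog({"WOA23": Source("WOA23")})

# Either of these yields the loadable source:
woa_a = catalog["WOA23"]
woa_b = catalog.search(name="WOA23").to_source()
print(woa_a.to_dask())  # -> "dask dataset for WOA23"
print(woa_b.to_dask())  # -> "dask dataset for WOA23"
```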

Let me know if that fixes it!

Okay, I’ve updated the csv catalog - you should be able to access WOA23 from old analysis3 versions now.

Thanks Charles!

Hey Charles, there’s an issue with loading variables from WOA23 datasets in intake. It’s a lot to copy here, so I created a notebook demonstrating the errors; please see /g/data/x77/ps7863/DeepArgo/scripts/open_woa_intake.ipynb

What I encountered is that there are two file_ids in each variable, but loading all of them using .to_dataset_dict(), as well as refining the search by file_id, gives errors.

JFYI, I got what I needed from WOA23 by reading the NetCDF files directly, so the fix isn’t urgent for me.

Just had a little prod, I can fix these errors by making the following changes:

- woa.search(variable='t_an', frequency='fx').to_dataset_dict()
+ woa.search(variable='t_an', frequency='fx').to_dataset_dict(xarray_open_kwargs={"decode_times":False})

and likewise in to_dask() calls.

I’m not quite sure what the precise cause of this will be - presumably intake-esm is setting some defaults surrounding date handling that are different to vanilla xarray. I’ll dig into that and come back and let you know (and if the defaults seem silly, we can change them).
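As a sketch of how a time-decoding failure like this can arise: the "months since" units below are my assumption about a WOA-style encoding (not confirmed from the actual files), but xarray’s standard decoder cannot convert such units to datetimes, and the error goes away when time decoding is skipped.

```python
import numpy as np
import xarray as xr

# Hypothetical in-memory reproduction: a time axis encoded with units that
# xarray's standard calendar decoding cannot handle.
ds = xr.Dataset(
    {"t_an": ("time", np.arange(3.0))},
    coords={
        "time": ("time", np.arange(3.0), {"units": "months since 1955-01-01"})
    },
)

# decode_cf with default settings is roughly what decode_times=True does
# under the hood; on these units it raises instead of decoding.
decode_failed = False
try:
    xr.decode_cf(ds)
except Exception:
    decode_failed = True
print(decode_failed)

# Skipping time decoding leaves the raw numeric axis intact:
raw = xr.decode_cf(ds, decode_times=False)
print(raw.time.values)  # [0. 1. 2.]
```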

EDIT: I’ve just double checked and in what I assume is the xarray script where you opened these files (/g/data/x77/ps7863/DeepArgo/scripts/open_woa_ds.ipynb), you’ve used decode_times=False:

woa = xr.open_mfdataset(
    '/g/data/av17/access-nri/OM3/woa23/woa23_B5C2*.nc', 
    decode_times=False,
)

If I remove this flag:

woa = xr.open_mfdataset(
    '/g/data/av17/access-nri/OM3/woa23/woa23_B5C2*.nc', 
-    decode_times=False,
)

I get the same error that intake throws.