Unable to find variable in intake

Hello,

I am trying to load temperature from the RYF and IAF runs with:

print(cat['01deg_jra55v13_ryf9091'].search(variable='temp', frequency='1mon').keys())
print(cat['01deg_jra55v140_iaf_cycle4'].search(variable='temp', frequency='1mon').keys())

but I don’t get temperature for the IAF run!
For the RYF run the keys are:
['ocean.1mon.grid_xt_ocean:3600.grid_xu_ocean:3600.grid_yt_ocean:2700.grid_yu_ocean:2700.neutral:80.neutralrho_edges:81.nv:2.potrho:80.potrho_edges:81.st_edges_ocean:76.st_ocean:75.sw_edges_ocean:76.sw_ocean:75.xt_ocean:3600.xu_ocean:3600.yt_ocean:2700.yu_ocean:2700']

But for the IAF run the keys are:
['ocean.1mon.nv:2.st_edges_ocean:76.st_ocean:75.xt_ocean:3600.yt_ocean:2700']

This was working 2 days ago. Have I missed a memo somewhere?

Hey Taimoor,

We haven’t changed anything in the past couple of days in the intake catalog (at least to my knowledge), so the most likely culprit is the underlying files changing or moving.

I’ll take a proper look and get back to you.

Thanks Charles, let me know what you find, if anything!

Sorry it took me a couple of days to get back - I think this is just an issue with the .keys() method being fairly unintuitive.

For the first dataset:

>>> print(
    cat['01deg_jra55v13_ryf9091'].search(
        variable='temp', frequency='1mon'
    ).to_dask()
)
<xarray.Dataset> Size: 8TB
Dimensions:   (time: 2760, st_ocean: 75, yt_ocean: 2700, xt_ocean: 3600)
Coordinates:
  * xt_ocean  (xt_ocean) float64 29kB -279.9 -279.8 -279.7 ... 79.75 79.85 79.95
  * yt_ocean  (yt_ocean) float64 22kB -81.11 -81.07 -81.02 ... 89.89 89.94 89.98
  * st_ocean  (st_ocean) float64 600B 0.5413 1.681 2.94 ... 5.511e+03 5.709e+03
  * time      (time) object 22kB 1950-01-16 12:00:00 ... 2179-12-16 12:00:00
Data variables:
    temp      (time, st_ocean, yt_ocean, xt_ocean) float32 8TB dask.array<chunksize=(1, 7, 300, 400), meta=np.ndarray>
Attributes:
    filename:                        ocean.nc
    title:                           ACCESS-OM2-01
    grid_type:                       mosaic
    grid_tile:                       1
    intake_esm_vars:                 ['temp']
    intake_esm_attrs:filename:       ocean.nc
    intake_esm_attrs:file_id:        ocean.1mon.grid_xt_ocean:3600.grid_xu_oc...
    intake_esm_attrs:frequency:      1mon
    intake_esm_attrs:realm:          ocean
    intake_esm_attrs:_data_format_:  netcdf
    intake_esm_dataset_key:          ocean.1mon.grid_xt_ocean:3600.grid_xu_oc...

For the second:

>>> print(
    cat['01deg_jra55v140_iaf_cycle4'].search(
        variable='temp', frequency='1mon'
    ).to_dask()
)
<xarray.Dataset> Size: 2TB
Dimensions:   (time: 732, st_ocean: 75, yt_ocean: 2700, xt_ocean: 3600)
Coordinates:
  * xt_ocean  (xt_ocean) float64 29kB -279.9 -279.8 -279.7 ... 79.75 79.85 79.95
  * yt_ocean  (yt_ocean) float64 22kB -81.11 -81.07 -81.02 ... 89.89 89.94 89.98
  * st_ocean  (st_ocean) float64 600B 0.5413 1.681 2.94 ... 5.511e+03 5.709e+03
  * time      (time) datetime64[ns] 6kB 1958-01-16T12:00:00 ... 2018-12-16T12...
Data variables:
    temp      (time, st_ocean, yt_ocean, xt_ocean) float32 2TB dask.array<chunksize=(1, 19, 135, 180), meta=np.ndarray>
Attributes: (12/14)
    title:                                    ACCESS-OM2-01
    grid_type:                                mosaic
    grid_tile:                                1
    intake_esm_vars:                          ['temp']
    intake_esm_attrs:file_id:                 ocean.1mon.nv:2.st_edges_ocean:...
    intake_esm_attrs:frequency:               1mon
    ...                                       ...
    intake_esm_attrs:variable_standard_name:  ,,,,,,sea_water_conservative_te...
    intake_esm_attrs:variable_cell_methods:   ,,,,,,time: mean,,,,
    intake_esm_attrs:variable_units:          days,days since 1900-01-01 00:0...
    intake_esm_attrs:realm:                   ocean
    intake_esm_attrs:_data_format_:           netcdf
    intake_esm_dataset_key:                   ocean.1mon.nv:2.st_edges_ocean:...

I think the issue is that you’ve just forgotten to call .to_dask()?

The .keys() method of an ESM-Datastore returns the groupings used to turn it into xarray datasets, not the variables within those datasets. Currently, the keys are a bit of a mess because they’re procedurally generated from the coordinates of the files, which is something we’re working on making less confusing.
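
To make that concrete, here is a rough sketch of how a key of this shape could be assembled. This is not the catalog's actual code: make_key is a hypothetical helper, and the assumption that a key is the realm, the frequency, and the alphabetically sorted dimension:size pairs joined by dots is only inferred from the keys printed earlier in this thread.

```python
def make_key(realm, frequency, dims):
    """Hypothetical helper (not part of intake-esm): build a grouping key
    from the realm, the frequency, and sorted 'dimension:size' pairs."""
    dim_parts = [f"{name}:{size}" for name, size in sorted(dims.items())]
    return ".".join([realm, frequency] + dim_parts)


# Dimension sizes taken from the IAF key shown earlier in the thread.
iaf_dims = {
    "xt_ocean": 3600,
    "yt_ocean": 2700,
    "st_ocean": 75,
    "st_edges_ocean": 76,
    "nv": 2,
}

iaf_key = make_key("ocean", "1mon", iaf_dims)
print(iaf_key)

# The variable name never appears in the key, so a key without 'temp' in it
# says nothing about whether the dataset contains temp.
print("temp" in iaf_key)
```

The point is just that two files holding the same variable can end up with different keys if their coordinates differ, so the variables have to be checked on the dataset returned by .to_dask() rather than on the keys.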

Let me know if that helps! If not, we might need to do a bit more digging - I’m mostly back on top of everything now, so I should be a bit more organised getting back to you!

Thanks @CharlesTurner - it magically started working again… not sure what happened except Gadi has been hiccuping a bit recently.