Xarray to_zarr causing errors in new xp65 env that weren't present in hh5

Dear Hive,

I am using the xarray to_zarr utility to write a dataset to a Zarr store.

Here is the fragment of code:

data_full = xr.open_mfdataset(file_list, preprocess = preprocess_ds, parallel = True).persist()

data_stacked = data_full[all_vars].to_array(dim = "s_vars")
data_ds = xr.Dataset({"data": data_stacked}).chunk(time = -1, s_vars = -1, lat = chunk_size, lon = chunk_size).persist()

data_ds.to_zarr(chunkdir + f'{year_start}-{year_end}-{"-".join(all_vars)}', consolidated = True, mode = 'w')

The error I am getting is:

Traceback (most recent call last):
  File "/g/data/mn51/users/jp0715/acs/bias_correction/mrnbc/code/pymrnbc/mrnbc-python-wrapper/acs_chunk_mrnbc.py", line 153, in <module>
    data_ds.to_zarr(chunkdir + f'{year_start}-{year_end}-{"-".join(all_vars)}', consolidated = True, mode = 'w')
  File "/g/data/xp65/public/apps/med_conda/envs/analysis3-25.04/lib/python3.11/site-packages/xarray/core/dataset.py", line 2270, in to_zarr
    return to_zarr(  # type: ignore[call-overload,misc]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

and this:

  File "/g/data/xp65/public/apps/med_conda/envs/analysis3-25.04/lib/python3.11/site-packages/dask/array/core.py", line 4614, in load_store_chunk
    if x is not None and x.size != 0:
  ^^^^^^^^^^^^^^^^^
AttributeError: 'tuple' object has no attribute 'size'

We were previously using hh5 analysis-24.12, where this code ran without error. I have tried a few different xp65 envs, including 24.12, 25.04 and 25.05. The error messages differed slightly from the second fragment above, but all of them occurred in the to_zarr call.

Thanks for posting @jpeter. Are you hoping to get help from ACCESS-NRI with this? Adding the help tag to your post will mean it gets triaged by our support team.

See Support FAQ (Frequently Asked Questions)

If you are after support, it would be very helpful if you could provide a minimal reproducible example.

See How to create a minimal complete reproducible example

Hi Justin,
I managed to duplicate this on ARE with this test notebook using a small dataset.

I used conda/analysis3 and ran in kernel 3-24-07

The issue wasn’t happening on my local machine with the latest xarray and dask installed, but after I downgraded the packages to match the conda env above, as follows, I can now duplicate it there too:

xarray==2024.11.0
dask==2024.12.0
distributed==2024.12.0

So I can’t help with the cause, but it seems there is an issue that was fixed at some point after these versions. I’ll see if I can find out when these updates will make it into analysis3.
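If anyone wants to check whether their own environment matches the combination listed above, a rough sketch using only the standard library is below. The helper names (installed_versions, matches_affected) are made up for illustration, and matching these versions only means you are on the combination reported in this thread, not that the error will definitely occur.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions reported in this thread as reproducing the to_zarr error.
AFFECTED = {
    "xarray": "2024.11.0",
    "dask": "2024.12.0",
    "distributed": "2024.12.0",
}


def installed_versions(packages):
    """Return {package: version string, or None if not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


def matches_affected(found, affected=AFFECTED):
    """True only if every listed package is at exactly the reported version."""
    return all(found.get(name) == pinned for name, pinned in affected.items())


current = installed_versions(AFFECTED)
print(current)
print("Matches the combination from this thread:", matches_affected(current))
```

Running this inside the kernel you use on ARE (rather than a login shell) should report the versions that kernel actually imports.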

Hi Owen,

Thanks for your reply. We managed to get the code to work by removing the persist statements. I don’t know why the persist calls cause a failure here when they didn’t in the hh5 analysis environment, but the code is working now.
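For reference, a minimal sketch of what that change looks like is below. It is not the exact script: the toy dataset and the variable names pr and tas are hypothetical stand-ins for the output of open_mfdataset, and the only real difference from the original fragment is that neither step calls .persist().

```python
import numpy as np
import xarray as xr


def build_chunked_dataset(data_full, all_vars, chunk_size):
    """Stack the selected variables and rechunk lazily, without .persist()."""
    data_stacked = data_full[all_vars].to_array(dim="s_vars")
    return xr.Dataset({"data": data_stacked}).chunk(
        {"time": -1, "s_vars": -1, "lat": chunk_size, "lon": chunk_size}
    )


def make_toy_dataset():
    """Hypothetical small dataset standing in for the open_mfdataset result."""
    dims = ("time", "lat", "lon")
    shape = (4, 4, 4)
    coords = {
        "time": np.arange(4),
        "lat": np.arange(4.0),
        "lon": np.arange(4.0),
    }
    return xr.Dataset(
        {"pr": (dims, np.zeros(shape)), "tas": (dims, np.ones(shape))},
        coords=coords,
    )


data_ds = build_chunked_dataset(make_toy_dataset(), ["pr", "tas"], chunk_size=2)
print(data_ds["data"].chunks)
```

The result can then be written as before with data_ds.to_zarr(store_path, consolidated = True, mode = 'w'), where store_path is whatever output location you use. Without persist the computation only runs when to_zarr is called, so the write step may take longer than it did previously.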

Cheers,
Justin

Hi Justin, good to know you’ve got a workaround. Since the 2024.12.0 version of dask still seems to have an issue, I’ll post an update here when analysis3 includes a newer version. I’m not sure when that will be yet.
