### What happened?
I have a use case where I compute a function in parallel with `dask`, and inside that function each parallel call opens and reads (no writing) the same `.nc` dataset to do computation based on it. Running this produces a variety of different errors (see the log), and sometimes none at all. Most likely xarray is not designed to be used this way, or this is an actual bug in either the IO implementation or `dask` itself.
Because the failure is non-deterministic, run the MVCE in a loop, e.g. `while python crash.py; do :; done`.
### What did you expect to happen?
I would expect it to fail gracefully, perhaps with a warning about unsafe multi-threaded reads.
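For reference, `dask.delayed` runs on the threaded scheduler by default, so every `xr.open_dataset` call here happens in one process and shares the underlying libnetcdf/libhdf5 state. Below is a minimal sketch (the `read_once` helper is made up for illustration, and it assumes the `dummy.nc` file created in the MVCE below) contrasting the threaded default with the process-based scheduler, which presumably sidesteps the shared state:

```Python
import xarray as xr
from dask import compute, delayed


def read_once(_):
    # Re-open and read the same file in every task, as in the MVCE below.
    with xr.open_dataset("dummy.nc") as ds:
        return float(ds["var"].values.sum())


tasks = [delayed(read_once)(i) for i in range(10)]

# Threaded scheduler (the default for dask.delayed): all tasks share one
# in-process copy of libnetcdf/libhdf5, which is where the crashes appear.
compute(*tasks, scheduler="threads")

# Process-based scheduler: each worker process gets its own copy of the C
# libraries, so the concurrent-access problem should not arise.
compute(*tasks, scheduler="processes")
```

The process-based run trades the shared state for per-task pickling overhead, so it is a workaround rather than a fix.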
### Minimal Complete Verifiable Example
```Python
import xarray as xr
import numpy as np
from dask import delayed, compute

# create dummy.nc
dummy_data = xr.Dataset(
    {
        "var": (("x", "y"), np.random.rand(10, 10)),
    }
)
dummy_data = dummy_data.chunk({"x": 5, "y": 5})  # not necessary for reproducing
dummy_data.to_netcdf("dummy.nc")
del dummy_data


def add_columns(value):
    # Every parallel call re-opens and reads the same file.
    ds = xr.open_dataset("dummy.nc")
    value *= np.random.choice(ds["var"].values.reshape(-1))
    value *= np.random.choice(ds["var"].values.reshape(-1))
    return value


res = []
for _ in range(10):
    res.append(delayed(add_columns)(np.random.rand(1)))

the_ans = compute(*res)
```
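
For comparison, here is a sketch of the same function with all file access serialized through a module-level lock (`_nc_lock` and `add_columns_locked` are names made up for the sketch; nothing here is an xarray API). It should prevent two threads from calling into libnetcdf/libhdf5 at the same time, and so presumably avoids these crashes if thread-unsafe C-level access is the root cause:

```Python
import threading

import numpy as np
import xarray as xr

# Sketch: a process-wide lock so only one thread at a time touches the
# netCDF/HDF5 C libraries. Assumes dummy.nc from the MVCE above.
_nc_lock = threading.Lock()


def add_columns_locked(value):
    with _nc_lock:
        with xr.open_dataset("dummy.nc") as ds:
            flat = ds["var"].values.reshape(-1)
    value *= np.random.choice(flat)
    value *= np.random.choice(flat)
    return value
```

The rest of the script stays the same apart from calling `add_columns_locked` instead of `add_columns`.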
### MVCE confirmation
- [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- [X] Complete example — the example is self-contained, including all data and the text of any traceback.
- [X] Verifiable example — the example copy & pastes into an IPython prompt or [Binder notebook](https://mybinder.org/v2/gh/pydata/xarray/main?urlpath=lab/tree/doc/examples/blank_template.ipynb), returning the result.
- [X] New issue — a search of GitHub Issues suggests this is not a duplicate.
- [X] Recent environment — the issue occurs with the latest version of xarray and its dependencies.
### Relevant log output
```Python
Error log 1

python3: /io/netcdf-c-4.9.2/libsrc4/nc4internal.c:426: nc4_find_nc_grp_h5: Assertion `my_h5 && my_h5->root_grp' failed.
Aborted (core dumped)

Error log 2

HDF5-DIAG: Error detected in HDF5 (1.12.2) thread 1:
  #000: H5A.c line 1327 in H5Aiterate2(): invalid location identifier
    major: Invalid arguments to routine
    minor: Inappropriate type
  #001: H5VLint.c line 1749 in H5VL_vol_object(): invalid identifier type to function
    major: Invalid arguments to routine
    minor: Inappropriate type
HDF5-DIAG: Error detected in HDF5 (1.12.2) thread 2:
  #000: H5G.c line 548 in H5Gget_create_plist(): not a group ID
    major: Invalid arguments to routine
    minor: Inappropriate type
Segmentation fault (core dumped)

Error log 3

Traceback (most recent call last):
  File "/home/ernie/.miniconda/envs/corres/lib/python3.10/site-packages/xarray/core/dataset.py", line 1446, in _construct_dataarray
    variable = self._variables[name]
KeyError: 'var'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ernie/.miniconda/envs/corres/lib/python3.10/site-packages/xarray/core/dataset.py", line 1545, in __getitem__
    return self._construct_dataarray(key)
  File "/home/ernie/.miniconda/envs/corres/lib/python3.10/site-packages/xarray/core/dataset.py", line 1448, in _construct_dataarray
    _, name, variable = _get_virtual_variable(self._variables, name, self.sizes)
  File "/home/ernie/.miniconda/envs/corres/lib/python3.10/site-packages/xarray/core/dataset.py", line 207, in _get_virtual_variable
    raise KeyError(key)
KeyError: 'var'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/ernie/code/corres/crash_netcdf5.py", line 29, in <module>
    the_ans = compute(*res)
  File "/home/ernie/.miniconda/envs/corres/lib/python3.10/site-packages/dask/base.py", line 661, in compute
    results = schedule(dsk, keys, **kwargs)
  File "/home/ernie/code/corres/crash_netcdf5.py", line 21, in add_columns
    value *= np.random.choice(ds["var"].values.reshape(-1))
  File "/home/ernie/.miniconda/envs/corres/lib/python3.10/site-packages/xarray/core/dataset.py", line 1547, in __getitem__
    raise KeyError(
KeyError: "No variable named 'var'. Variables on the dataset include []"

Error log 4

Traceback (most recent call last):
  File "/home/ernie/.miniconda/envs/corres/lib/python3.10/site-packages/xarray/core/utils.py", line 635, in close_on_error
    yield
  File "/home/ernie/.miniconda/envs/corres/lib/python3.10/site-packages/xarray/backends/netCDF4_.py", line 660, in open_dataset
    ds = store_entrypoint.open_dataset(
  File "/home/ernie/.miniconda/envs/corres/lib/python3.10/site-packages/xarray/backends/store.py", line 43, in open_dataset
    vars, attrs = filename_or_obj.load()
  File "/home/ernie/.miniconda/envs/corres/lib/python3.10/site-packages/xarray/backends/common.py", line 251, in load
    (_decode_variable_name(k), v) for k, v in self.get_variables().items()
  File "/home/ernie/.miniconda/envs/corres/lib/python3.10/site-packages/xarray/backends/netCDF4_.py", line 462, in get_variables
    return FrozenDict(
  File "/home/ernie/.miniconda/envs/corres/lib/python3.10/site-packages/xarray/core/utils.py", line 443, in FrozenDict
    return Frozen(dict(*args, **kwargs))
  File "/home/ernie/.miniconda/envs/corres/lib/python3.10/site-packages/xarray/backends/netCDF4_.py", line 463, in <genexpr>
    (k, self.open_store_variable(k, v)) for k, v in self.ds.variables.items()
  File "/home/ernie/.miniconda/envs/corres/lib/python3.10/site-packages/xarray/backends/netCDF4_.py", line 424, in open_store_variable
    attributes = {k: var.getncattr(k) for k in var.ncattrs()}
  File "src/netCDF4/_netCDF4.pyx", line 4585, in netCDF4._netCDF4.Variable.ncattrs
  File "src/netCDF4/_netCDF4.pyx", line 1521, in netCDF4._netCDF4._get_att_names
  File "src/netCDF4/_netCDF4.pyx", line 2034, in netCDF4._netCDF4._ensure_nc_success
AttributeError: NetCDF: Not a valid ID

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ernie/code/corres/crash_netcdf5.py", line 29, in <module>
    the_ans = compute(*res)
  File "/home/ernie/.miniconda/envs/corres/lib/python3.10/site-packages/dask/base.py", line 661, in compute
    results = schedule(dsk, keys, **kwargs)
  File "/home/ernie/code/corres/crash_netcdf5.py", line 17, in add_columns
    ds = xr.open_dataset(
  File "/home/ernie/.miniconda/envs/corres/lib/python3.10/site-packages/xarray/backends/api.py", line 573, in open_dataset
    backend_ds = backend.open_dataset(
  File "/home/ernie/.miniconda/envs/corres/lib/python3.10/site-packages/xarray/backends/netCDF4_.py", line 659, in open_dataset
    with close_on_error(store):
  File "/home/ernie/.miniconda/envs/corres/lib/python3.10/contextlib.py", line 153, in __exit__
    self.gen.throw(typ, value, traceback)
  File "/home/ernie/.miniconda/envs/corres/lib/python3.10/site-packages/xarray/core/utils.py", line 637, in close_on_error
    f.close()
  File "/home/ernie/.miniconda/envs/corres/lib/python3.10/site-packages/xarray/backends/netCDF4_.py", line 578, in close
    self._manager.close(**kwargs)
  File "/home/ernie/.miniconda/envs/corres/lib/python3.10/site-packages/xarray/backends/file_manager.py", line 234, in close
    file.close()
  File "src/netCDF4/_netCDF4.pyx", line 2627, in netCDF4._netCDF4.Dataset.close
  File "src/netCDF4/_netCDF4.pyx", line 2590, in netCDF4._netCDF4.Dataset._close
  File "src/netCDF4/_netCDF4.pyx", line 2034, in netCDF4._netCDF4._ensure_nc_success
RuntimeError: NetCDF: Not a valid ID
```
### Anything else we need to know?
_No response_
### Environment
<details>
INSTALLED VERSIONS
------------------
commit: None
python: 3.10.14 | packaged by conda-forge | (main, Mar 20 2024, 12:45:18) [GCC 12.3.0]
python-bits: 64
OS: Linux
OS-release: 6.5.0-35-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.2
libnetcdf: 4.9.3-development
xarray: 2024.3.0
pandas: 2.2.2
numpy: 1.26.4
scipy: 1.13.0
netCDF4: 1.6.5
pydap: None
h5netcdf: 1.3.0
h5py: 3.11.0
Nio: None
zarr: 2.17.2
cftime: 1.6.3
nc_time_axis: None
iris: None
bottleneck: None
dask: 2024.4.2
distributed: 2024.4.2
matplotlib: 3.8.4
cartopy: None
seaborn: None
numbagg: None
fsspec: 2024.3.1
cupy: None
pint: None
sparse: None
flox: 0.9.7
numpy_groupies: 0.11.1
setuptools: 69.5.1
pip: 24.0
conda: None
pytest: 7.4.0
mypy: None
IPython: 8.22.2
sphinx: None
</details>