Also see my post from just last week here, which points to the following resources:
- Parallel computing with Dask - in particular, see the Optimization tips section at the bottom, which has suggestions for getting dask to play nicely with operations such as `groupby` (for subtracting climatologies).
- Best Practices — Dask documentation - general best practices for dask.
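As a minimal sketch of the climatology-subtraction pattern mentioned above (the toy data and names here are illustrative, not from the linked post):

```python
import numpy as np
import pandas as pd
import xarray as xr

# Toy dataset: two years of daily values along a small spatial axis
time = pd.date_range("2000-01-01", periods=730, freq="D")
da = xr.DataArray(
    np.random.rand(730, 4),
    coords={"time": time},
    dims=["time", "x"],
)

# Monthly climatology, then anomalies via groupby arithmetic
clim = da.groupby("time.month").mean("time")
anom = da.groupby("time.month") - clim
```

With dask-backed data the same two lines work, but chunking along `time` can make the `groupby` expensive, which is what the Optimization tips address.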
A few other bits and pieces:
A key point here is that if `chunks` is not specified, no chunking is done (with `open_mfdataset`, each file becomes a single chunk, so the chunk size defaults to the file size).
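For file-backed data you would pass `chunks={...}` to `open_dataset`/`open_mfdataset`; the in-memory sketch below, with made-up sizes, shows the same explicit chunking via `.chunk()` (requires dask):

```python
import numpy as np
import xarray as xr

da = xr.DataArray(np.zeros((1000, 100)), dims=["time", "x"])

# Without an explicit chunk spec, the array is a single block;
# here we request 250-step chunks along time
chunked = da.chunk({"time": 250})
print(chunked.chunks)  # ((250, 250, 250, 250), (100,))
```

The analogous call for files would be something like `xr.open_mfdataset(paths, chunks={"time": 250})`.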
You can see the native chunking of variables in a netcdf file using `ncdump -hs <filename>`.
I’ve found it useful and efficient to save expensive intermediate results to `.zarr` stores, using a for loop over sections of the dataset (via `.isel`) and appending to the store as described here.