Also see my post from just last week here, which points to the following resources:
- Parallel computing with Dask - in particular see the Optimization tips at the bottom, which include suggestions on how to get dask to play nicely with operations such as `groupby` (for subtracting climatologies; see the sketch after this list).
- Best Practices — Dask documentation - general best practices for dask.
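For the climatology case, here is a minimal sketch of the `groupby` pattern, assuming a dataset with a `time` dimension (the file name and chunk size are hypothetical):

```python
import xarray as xr

# Hypothetical input file; substitute your own dataset.
ds = xr.open_dataset("sst.nc", chunks={"time": 120})

# Monthly climatology, then anomalies by subtracting it back out.
clim = ds.groupby("time.month").mean("time")
anom = ds.groupby("time.month") - clim
```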
A few other bits and pieces:
A key point here is that if `chunks` is not specified, no chunking will be done (if `open_mfdataset` is used, the chunk size will be the file size).
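For example, a sketch of passing `chunks` explicitly to take control of the chunking (the file pattern and chunk sizes are illustrative, not recommendations):

```python
import xarray as xr

# Without chunks=..., open_mfdataset makes each input file one dask chunk;
# passing chunks explicitly lets you pick sizes suited to your computation.
ds = xr.open_mfdataset(
    "model_output_*.nc",  # hypothetical file pattern
    combine="by_coords",
    chunks={"time": 100, "lat": 180, "lon": 360},  # illustrative sizes
)
print(ds.chunks)  # inspect the resulting dask chunking per dimension
```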
You can see the native chunking of variables in a netCDF file using `ncdump -hs <filename>`.
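If you prefer to stay in Python, the same on-disk chunk layout is usually exposed through each variable's `encoding` after opening the file with xarray (a sketch; the file name is hypothetical):

```python
import xarray as xr

ds = xr.open_dataset("sst.nc")  # hypothetical file
# For netCDF4/HDF5 files, the on-disk chunk layout is recorded in each
# variable's encoding (the counterpart of _ChunkSizes in ncdump -hs output).
for name, var in ds.data_vars.items():
    print(name, var.encoding.get("chunksizes"))
```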
I’ve found it useful and efficient to save intermediate results that are expensive to compute to `.zarr` stores, using a for loop over sections of the dataset (`.isel`) and appending to the store as described here.
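A minimal sketch of that append pattern, assuming a `time` dimension to loop over (the file names, block size, and the stand-in computation are placeholders for your own setup):

```python
import xarray as xr

ds = xr.open_mfdataset("model_output_*.nc")  # hypothetical input files

step = 100  # time slices per iteration; tune so each block fits in memory
for start in range(0, ds.sizes["time"], step):
    block = ds.isel(time=slice(start, start + step))
    # Stand-in for an expensive computation; substitute your own.
    result = block.mean(["lat", "lon"]).compute()
    if start == 0:
        result.to_zarr("intermediate.zarr", mode="w")           # create the store
    else:
        result.to_zarr("intermediate.zarr", append_dim="time")  # append along time
```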