Rechunking large data at constant memory in Dask

(Re-posting from the Pangeo discourse)

With release 2023.3.1, Dask offers a new method for rechunking arrays called p2p that rechunks large arrays with constant memory usage. This enables workloads that were previously not runnable and would have required rechunking datasets ahead of time with tools like rechunker.
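As a minimal sketch of how to opt in, the new method can be selected through the Dask config. This assumes dask >= 2023.3.1 with distributed installed; the array sizes here are illustrative, not from the post:

```python
import dask
import dask.array as da
from dask.distributed import Client

# p2p rechunking runs on the distributed scheduler
client = Client(processes=False)

with dask.config.set({"array.rechunk.method": "p2p"}):
    # Go from tall-and-thin chunks to short-and-wide ones --
    # the classic full-shuffle rechunk that strains memory
    # under the default task-based method at scale.
    x = da.ones((1000, 1000), chunks=(1000, 50))
    y = x.rechunk((50, 1000))
    result = y.sum().compute()

client.close()
```

The config option applies at graph-construction time, so wrapping the `rechunk` call in the context manager is enough; `method="p2p"` can also be passed to `rechunk` directly on recent versions.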

See Rechunking large data at constant memory in Dask [experimental] - HPC - Pangeo


Very interesting! It seems quite sensitive to the number of tasks, and hence to the chunk size.

Like almost everything in xarray/dask-land, it’s all about tha chunks, ’bout tha chunks, hey? (apologies to Meghan Trainor)