The latest version of dask (2022.11.0) uses a new mode of scheduling by default that can significantly reduce the memory usage in a lot of typical geoscience workflows. The new default tries to address the problem of “root task overproduction” - if you’ve ever inexplicably run out of memory while computing a climatology, for example, you were probably experiencing root task overproduction. Details here: Reducing memory usage in Dask workloads by 80%.
It would be great to explore/benchmark any improvements from this to COSIMA workflows as part of the COSIMA hackathon.
Nice! I’d be really interested to see how this pans out. It sounds like there’s the potential for some slight slowdown as a tradeoff for the reduction in memory usage, but also that it can speed things up by not spilling to disk.
Good idea. I like the idea of bench-marking the versions to see how that plays out.
Aidan
(Aidan Heerdegen, ACCESS-NRI Release Team Lead)
4
I have started an issue to get this installed in the hh5 conda environment
1 Like
Aidan
(Aidan Heerdegen, ACCESS-NRI Release Team Lead)
5
Latest dask version is now available in conda/analysis3-unstable:
$ conda list dask
# packages in environment at /g/data3/hh5/public/apps/miniconda3/envs/analysis3-22.07:
#
# Name Version Build Channel
dask 2022.11.1 pyhd8ed1ab_0 conda-forge