Improving performance of a Python-based function

One solution is to use dask, which can reduce memory usage by cutting the job into a number of smaller tasks that are run in parallel.

There is an existing topic about improving the performance of xarray and dask that might be helpful.