@Michael_Barnes, it looks like you’re happy with the cdo solution, which is great. But just to add to my previous answer: using dask and appropriate chunking will help here (sorry, I didn’t originally realize the size of your input data).
For example, your initial code took me approximately 12 mins to run on a “Large” ARE instance (7 CPUs, 32 GB mem). The following takes 1.5 mins:
```python
import xarray as xr
from distributed import Client

# Start a local dask cluster so the computation runs in parallel
client = Client()

tfile = "/g/data/rt52/era5/pressure-levels/reanalysis/t/2022/t_era5_oper_pl_20220301-20220331.nc"

# Chunk along time and level so dask can operate on the file in parallel pieces
hourly = xr.open_dataset(tfile, chunks={"time": 24, "level": 2})

# Average each block of 24 hourly steps into a daily mean
daily = hourly.coarsen(time=24).mean(keep_attrs=True).compute()
```
Is this competitive with using cdo?
Note, using `resample` instead of `coarsen` above takes approximately 3 mins (see the sketch below).
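For reference, here's a minimal sketch of the `resample` variant, assuming the same file and chunking as above:

```python
import xarray as xr
from distributed import Client

client = Client()

tfile = "/g/data/rt52/era5/pressure-levels/reanalysis/t/2022/t_era5_oper_pl_20220301-20220331.nc"
hourly = xr.open_dataset(tfile, chunks={"time": 24, "level": 2})

# resample groups by calendar day, so it copes with irregular or incomplete
# time axes, whereas coarsen simply averages fixed blocks of 24 values
daily = hourly.resample(time="1D").mean(keep_attrs=True).compute()
```

The speed difference is because `coarsen` is a plain fixed-window block reduction that maps neatly onto the chunk layout, while `resample` goes through xarray's groupby machinery. For a complete month of hourly data the two give the same daily means, so `coarsen` is the cheaper choice here.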