Python garbage collection seems to be failing me?

I am trying to write a notebook about handling large data and found an interesting quirk …

I have one dataset with very small chunks. I combine the chunks using xarray's chunk function and then calculate a mean. When I do this multiple times consecutively (or with different chunk combinations), it fills up the RAM on my ARE instance, and that memory is never released automatically. Eventually the kernel crashes, even though the data filling the RAM is not explicitly held in any variables. Adding lots of gc.collect() calls fixes the problem, but that is definitely not what I expected to have to do.
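
Roughly, the pattern looks like this (a minimal sketch only; the file name, variable name and chunk sizes are placeholders, not what the notebook actually uses, and each computation lives in its own notebook cell):

```python
# Minimal sketch of the pattern described above; file name, variable name and
# chunk sizes are placeholders, not the ones used in the notebook.
import gc
import xarray as xr

ds = xr.open_dataset("example_small_chunks.nc", chunks={})  # lazy, dask-backed

# Re-chunk into one (sensible) layout and reduce ...
ds["tas"].chunk({"time": 1000, "lat": 90, "lon": 180}).mean("time").compute()

# ... then again with a different layout in the next cell. Repeating this a few
# times fills the RAM of the ARE instance unless the collector is invoked explicitly:
gc.collect()

ds["tas"].chunk({"time": 500, "lat": 180, "lon": 360}).mean("time").compute()
gc.collect()
```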

The notebook is here: training-day-2024-find-analyse-data/intake/Large_data.ipynb at main · ACCESS-NRI/training-day-2024-find-analyse-data · GitHub

[Screenshots in the original post show the dataset, the new chunks, and the plot with the large garbage-collection count.]

When I do the last two steps with different combinations of (sensible) chunk sizes, it crashes the kernel. See the notebook linked above, which will crash without the gc.collect() calls …

Yep, I think this is a pretty common problem.

Could the Out[] object in IPython/Jupyter be to blame? Although the results aren't specifically kept by name, they still exist in that object (e.g. Out[30] in your screenshot), so the garbage collector can't reclaim them.
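
One way to check would be to see how much the output cache is actually holding, something like this (a rough sketch, run in its own cell; it assumes the cached results expose nbytes, as numpy/xarray objects do):

```python
import sys

# Tally up what IPython's output cache is keeping alive.
for n, obj in Out.items():
    size = getattr(obj, "nbytes", sys.getsizeof(obj))  # fall back for non-array objects
    print(f"Out[{n}]: {type(obj).__name__}, ~{size / 1e6:.1f} MB")
```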

Your gc.collect() calls may be forcing lots of intermediate data to be cleaned up, which gives you enough room to perform the subsequent calculations. I wonder if you could run %reset -f out in between calculations (instead of gc.collect()), or print the results of the chunking step, or plot them directly, so that nothing large ends up in the output cache?
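
For example (just a sketch; `rechunked` stands in for whatever your chunking step returns):

```python
# Option 1: clear IPython's output cache between the heavy calculations,
# instead of (or as well as) gc.collect().
%reset -f out

# Option 2: keep big results out of Out[] in the first place, by printing or
# plotting them rather than leaving them as the bare last expression of a cell.
print(rechunked)                  # prints the repr; nothing is cached
rechunked.mean("time").plot()     # only the matplotlib artist lands in Out[]
```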