Parcels particle tracking and inode quota on Gadi

edoddridge · 23 April 2026 04:57

Hi everyone,

We (jk72) have been encountering inode quota issues on Gadi lately.

Our users with the biggest footprints have all been using the parcels particle tracking library, which now outputs .zarr files by default. This means that even relatively modest particle tracking experiments can rack up more than a million files very quickly.

I know it’s possible to combine them into either NetCDF or tarballs after the fact, but is there a better approach?

Efforts to deal with this are also being hampered by the fact that the most recent scan for nci-files-report is nearly three weeks old for /scratch/jk72/. I asked the Helpdesk about this and was told that a rework of the scanning protocol is in progress, but that there is no ETA for an updated scan. The NCI Project(s) Management Tips and Tricks post is really helpful, but without an update to the output for nci-files-report I feel pretty hamstrung.

Thanks,
Ed

dougiesquire · 23 April 2026 05:26

My zarr knowledge is a little out of date, but you can zip zarr collections and still access them directly with Python - see storage - zarr-python
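A minimal sketch of the zipping step, using only the Python standard library. The function name and the example paths are hypothetical, and it assumes the output is a directory-based zarr store. Entries are written uncompressed and with paths relative to the store root, which I understand is the layout zarr's zip storage expects; please check the storage docs for your zarr version before relying on it.

```python
import os
import zipfile


def zip_zarr_store(store_dir, zip_path):
    """Pack a directory-based zarr store into one uncompressed zip file.

    Returns the number of files written into the archive.
    """
    n_files = 0
    # ZIP_STORED adds no extra compression; zarr chunks are usually compressed already.
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_STORED) as zf:
        for root, _dirs, files in os.walk(store_dir):
            for name in files:
                full = os.path.join(root, name)
                # Keys are stored relative to the store root, e.g. "lon/0.0".
                rel = os.path.relpath(full, store_dir)
                zf.write(full, arcname=rel.replace(os.sep, "/"))
                n_files += 1
    return n_files
```

Usage would be something like `zip_zarr_store("output.zarr", "output.zarr.zip")`, with the zip written outside the store directory. The result is a single file, so it costs one inode instead of one per chunk.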

anton · 23 April 2026 05:34

You can run your own scans for usage / inode count using ncdu kindly provided by the release team:

module use /g/data/vk83/modules
module load system-tools/ncdu/1.19-1
ncdu /scratch/tm70/$USER/

Use the help (?) interface to sort by inodes

It takes a while for large folders

Aidan · 23 April 2026 07:14

Definitely have only one person do this (so coordinate efforts); otherwise you thrash the filesystem. @clairecarouge has been in touch with NCI about the scanning issue, so it is definitely good to loop her into discussions about problems you are having.

The ncdu man page is here:

The file count options are explained in the KEYS section.

You can output the scan to a file and then examine it later. That may be a good option if you want to share the output with others, or run the scan itself as a PBS job (as it can take quite a while).
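If ncdu isn't to hand, a rough equivalent can be sketched in plain Python: count the entries under each immediate subdirectory and save the result as JSON to share. The function names are hypothetical and the counts are approximate (hard links are counted more than once, and loose files directly under the top directory are ignored). The same caveat applies as for ncdu: have one person run it, ideally inside a PBS job.

```python
import json
import os


def count_entries(top):
    """Return {subdirectory path: number of files and directories under it}."""
    with os.scandir(top) as it:
        subdirs = [e.path for e in it if e.is_dir(follow_symlinks=False)]
    counts = {}
    for sub in subdirs:
        n = 1  # the subdirectory itself uses an inode
        # os.walk silently skips directories it cannot read.
        for _root, dirs, files in os.walk(sub):
            n += len(dirs) + len(files)
        counts[sub] = n
    return counts


def save_counts(counts, out_path):
    """Write the counts to JSON, largest first, so they can be shared."""
    ordered = dict(sorted(counts.items(), key=lambda kv: kv[1], reverse=True))
    with open(out_path, "w") as f:
        json.dump(ordered, f, indent=2)
```

For example, `save_counts(count_entries("/scratch/jk72"), "inode_counts.json")`, where the output filename is just a placeholder.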

adele-morrison · 23 April 2026 09:40

Just to note that we have an open issue on the COSIMA recipes about this also with more discussion from last time this issue arose:

github.com/COSIMA/cosima-recipes

Investigate issues associated with .zarr format of new Parcels releases

opened 01:48AM - 14 Jun 24 UTC

hrsdawson

🕹️ hackathon 4.0

Newer versions of Parcels output trajectory data in .zarr format, rather than .netcdf. In some (all?) cases, this may lead to the creation of many, many files clogging NCI projects on Gadi.

To do:
- Check the output format of new Parcels releases
- If no flexibility in output (don't think there is);
- Add warning to the [Particle tracking recipe](https://github.com/COSIMA/cosima-recipes/blob/main/DocumentedExamples/Particle_tracking_with_Parcels.ipynb) about the generation of many files
- Add example to same recipe of how to consolidate Parcels output into fewer files (e.g. using netcdf)
- Add some guidelines on workflow, e.g. 1) running simulations on scratch, 2) consolidating output, 3) moving consolidated file(s) to gdata, 4) deleting raw .zarr output.

clairecarouge · 24 April 2026 00:59

For triage reasons, I’ve added the community-help tag here. We know a lot about managing storage on Gadi but do not officially support requests on this question outside the ACCESS-NRI scheme.

You already have gotten a lot of information here. I don’t have more to add at this point.

edoddridge · 26 April 2026 23:08

Thank you everyone. We’ve sorted this out now - summary below.

@dougiesquire - thanks. That’s helpful for longer term storage, but as far as I know, you need to zip the output after the run has finished, which means temporarily holding a huge number of files.

@anton - this was really helpful, but there were a number of folders that I couldn’t scan that turned out to be central to the issues we were having. More on this below.

@Aidan, @adele-morrison, and @clairecarouge - thanks for the help.

Summary of the diagnosed issue and solution
Turns out that the majority of our problem was caused by the confluence of two factors: parcels outputting huge numbers of files; and the way that the ARE GUI “deletes” files.

When a user deletes a file using the ARE GUI, that file isn’t actually deleted. Instead, it is moved into a hidden .TRASH* folder in the project directory (/scratch/jk72/ in this case). Those folders have quite tight permissions on them, so even as project administrator I was unable to look inside or scan them. This meant that even with ncdu, I couldn’t easily identify the issue.

Because of the huge number of files from parcels, we ended up with millions of zombie files inside the hidden .TRASH* folders, consuming a large proportion of our inode quota.
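For anyone wanting to check their own project, here is a rough sketch of how the blind spots could be spotted from Python. Both function names are hypothetical. The first lists top-level entries whose names start with `.TRASH`; the second walks a tree and records every directory that could not be listed, which is how a permission-restricted folder would show up. It only tells you where a scan is incomplete, not how many files are hidden there.

```python
import os


def find_trash_dirs(top):
    """Return names of immediate entries of `top` that start with '.TRASH'."""
    with os.scandir(top) as it:
        return sorted(e.name for e in it if e.name.startswith(".TRASH"))


def find_unreadable_dirs(top):
    """Walk `top` and return the directories that could not be listed."""
    unreadable = []

    def record(err):
        # os.walk passes the OSError; its filename attribute is the directory path.
        unreadable.append(err.filename)

    for _root, _dirs, _files in os.walk(top, onerror=record):
        pass
    return sorted(unreadable)
```

Anything returned by `find_unreadable_dirs` is a directory whose contents are missing from a scan run under the same account, so it is worth asking the owner (or the Helpdesk) about.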

willrhobbs · 27 April 2026 01:34

Thanks Ed. Presumably the .TRASH issue doesn’t affect files deleted using a command line terminal (rm), and that’s the preferable protocol?

adele-morrison · 27 April 2026 04:51

Should we incorporate this info and solution into the COSIMA recipes parcels notebook to avoid this happening again?

edoddridge · 27 April 2026 23:09

Yep; the affected users were able to use rm on the command line to solve the problem. The ARE terminal sessions worked just as well as a direct ssh connection.
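For people cleaning up from inside a notebook rather than a terminal, a sketch of the same thing in Python: `shutil.rmtree` unlinks the files directly, much like `rm -r`, rather than moving them anywhere. The function name and the `.zarr` suffix check are my own additions as a guard against deleting the wrong directory.

```python
import os
import shutil


def remove_zarr_store(store_dir):
    """Permanently delete a directory-based zarr store.

    Returns the number of files that were removed.
    """
    store_dir = os.path.abspath(store_dir)
    # Guard (an assumption of this sketch): only delete directories named *.zarr.
    if not store_dir.endswith(".zarr") or not os.path.isdir(store_dir):
        raise ValueError(f"refusing to delete {store_dir!r}: not a .zarr directory")
    n_files = sum(len(files) for _root, _dirs, files in os.walk(store_dir))
    shutil.rmtree(store_dir)  # removes the files directly; nothing is kept in a trash folder
    return n_files
```

There is no undo, so it is only sensible once the consolidated copy has been checked.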

I think so. I’ve put a PR on my to-do list. I’m also about to add a post about the hidden .TRASH* folders to the NCI Project(s) Management Tips and Tricks thread.

CharlesTurner · 28 April 2026 00:44

I might be a bit out over my skis here, but is there a way to make NCI aware of this sort of issue as they go forward with the procurement process for the new machine?

So much of the development work in the xarray ecosystem is now centred around cloud optimised data formats (i.e. zarr) that I’d be very surprised if this type of issue doesn’t become substantively worse going forwards.

Maybe an issue for a separate thread though…

Thomas-Moore · 28 April 2026 06:23

Speaking of getting WAY out over one’s skis …

I’ve never even touched Parcels before but wonder if the “finished run” is held in memory at any point as an xarray DS you can access?

There does seem to be some Parcels documentation on outputting to a zarr ZipStore ( Output in zarr (advanced) — Parcels Documentation ) but I’m clearly not sure how or if that works as I’ve not even played with the package for 5 minutes.

Also, Captain Obvious here will note that rechunking the output could reduce the number of chunks / files / inodes if what defaults out of Parcels is really tiny per chunk?
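To put rough numbers on the rechunking idea, here is a back-of-envelope sketch. It assumes a directory-based zarr store that writes one file per chunk per variable, with every variable a 2-D (trajectory, obs) array. The experiment size and both chunk shapes are made up for illustration and are not Parcels defaults; metadata files are ignored.

```python
import math


def estimate_chunk_files(n_traj, n_obs, chunks, n_vars):
    """Estimate the chunk files for n_vars 2-D (trajectory, obs) arrays."""
    traj_chunk, obs_chunk = chunks
    per_var = math.ceil(n_traj / traj_chunk) * math.ceil(n_obs / obs_chunk)
    return per_var * n_vars


# Hypothetical experiment: 100,000 particles, 3,650 outputs, 5 variables.
small = estimate_chunk_files(100_000, 3_650, chunks=(1_000, 1), n_vars=5)
large = estimate_chunk_files(100_000, 3_650, chunks=(10_000, 365), n_vars=5)
print(small, large)  # 1825000 500
```

Under those assumptions, a chunk that holds a single output time gives about 1.8 million files, while chunking over 365 outputs and 10,000 particles gives 500, so the chunk shape matters far more than the total data volume.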

Aidan · 28 April 2026 06:53

Yeah, that has been a problem before:

Thanks for adding it to the tips and tricks topic.

clairecarouge · 30 April 2026 05:08

@CharlesTurner yes there is, I’ll pass it on. Whether it will have an impact on the procurement or not, that’s another question.

edoddridge · 4 May 2026 23:55

There does seem to be some Parcels documentation on outputting to a zarr ZipStore ( Output in zarr (advanced) — Parcels Documentation )

Nice find in the documentation. Might be worth exploring adding that to the COSIMA recipes example. Thanks.
