As you can see, intake is pulling from both the mean and pow02 files, so the resulting u is actually 1/4 of what it should be. No error is thrown for this.
The workaround I found is to use xr.open_mfdataset instead:
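Something along these lines, sketched here with synthetic throwaway files so it is self-contained; on Gadi you would point the glob at the actual output paths instead (e.g. a pattern ending in `-mean-ym*`), so only one reduction ever matches:

```python
import os
import tempfile

import numpy as np
import xarray as xr

tmpdir = tempfile.mkdtemp()

# Write one "mean" and one "pow02" file for the same variable and times,
# mimicking the two reductions that intake was silently mixing.
for kind, value in [("mean", 1.0), ("pow02", 0.5)]:
    ds = xr.Dataset(
        {"u": ("time", np.full(3, value))},
        coords={"time": np.arange(3)},
    )
    ds.to_netcdf(os.path.join(tmpdir, f"ocean-3d-u-1-monthly-{kind}-ym_1990.nc"))

# Glob only the mean files, so the two reductions are never combined.
u = xr.open_mfdataset(os.path.join(tmpdir, "*-mean-ym*.nc"))["u"]
print(float(u.mean()))  # 1.0
```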
This then works. I think @JuliaN was having this issue too? It might be worth checking if this error exists in other runs (e.g. RYF) and also for other variables (e.g. v, w).
It would be nice if Intake provided some warning that the reduction method is inconsistent between files, or (even better) halted with an error message saying that disambiguation is needed.
It'd be useful to have a large red warning about this in the Intake tutorial while the fix is in progress. I imagine students could easily get by without noticing the problem (we only noticed because we know how the velocities in our specific analysis should look).
One of the inevitable trade-offs of automating data reading and removing intermediate steps is that it reduces both how much the user needs to know about the data and the number of points where a visual check might be made.
Given this type of problem recurs, do we need to think more deeply as a community about the types of additional safeguards that could be included in automated data-reading like intake?
My tendency is to go verbose. For instance, we could print the unique file paths that intake is pulling from when calling a variable, e.g.
Now loading:
/g/data/cj50/access-om2/raw-output/access-om2-01/01deg_jra55v140_iaf_cycle4/output732/ocean/ocean-3d-u-1-monthly-mean-ym*
/g/data/cj50/access-om2/raw-output/access-om2-01/01deg_jra55v140_iaf_cycle4/output732/ocean/ocean-3d-u-1-monthly-pow02-ym*
But admittedly that might be too narrow a solution for a bigger problem?
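The "verbose" idea could look something like the sketch below, which collapses the per-file paths behind a search result into their `-ym*` patterns before loading. The `report_paths` helper and the demo dataframe are hypothetical; the real intake-esm datastore exposes its file list via a dataframe with a `path` column, which is what this stands in for:

```python
import pandas as pd

def report_paths(df: pd.DataFrame) -> list:
    """Print the unique '-ym*' file patterns behind a search result."""
    stems = sorted({p.split("-ym")[0] + "-ym*" for p in df["path"]})
    print("Now loading:")
    for stem in stems:
        print(" ", stem)
    return stems

# Illustrative dataframe standing in for an intake-esm datastore's .df
demo = pd.DataFrame({"path": [
    "/d/ocean-3d-u-1-monthly-mean-ym_1990_01.nc",
    "/d/ocean-3d-u-1-monthly-mean-ym_1990_02.nc",
    "/d/ocean-3d-u-1-monthly-pow02-ym_1990_01.nc",
]})
stems = report_paths(demo)
```

Printing two distinct patterns for a single variable, as here, is exactly the red flag a user would want to see.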
Sorry, yeah, this is a bit of a nasty problem that we’re working on. The solution was actually done a while ago but got blocked behind some complicated build-related upgrades and a bunch of other urgent stuff. Since this has reared its ugly head again, I’m going to revert all of those changes so we can push the fix through, and then we’ll reapply the changes as we go.
As a bit of a stopgap (this was meant to be held back a bit longer until we were happier with how it works, so please don’t ignore the big yellow warning banner), I’ve also been working on a better way to access and share data through the intake catalog, which you can find at interactive-catalog-spa (currently - it’ll move somewhere more official soon). Most of the complicated build issues the fix got stuck behind were related to changing infrastructure to make this viewer possible - I’m hoping it will be a step change in intake usability.
In this online viewer tool, the filters you can apply reflect what remains in the filtered dataset, so it should be more straightforward to check for this issue. This morning I’ll also add a warning for cases where .to_dask() would return more than one variable_cell_methods option - it should be a ~30 minute job.
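That check could work roughly like the sketch below. The `variable_cell_methods` column name matches the catalog's, but the helper, the demo values, and the warning text are all illustrative, not the actual implementation:

```python
import warnings

import pandas as pd

def warn_on_mixed_cell_methods(df: pd.DataFrame) -> set:
    """Warn when a selection spans more than one cell-methods reduction."""
    methods = set(df["variable_cell_methods"])
    if len(methods) > 1:
        warnings.warn(
            "Selection mixes cell methods (" + ", ".join(sorted(methods))
            + ") - filter the catalog further before calling .to_dask()."
        )
    return methods

# Illustrative selection mixing two reductions of the same variable
demo = pd.DataFrame({"variable_cell_methods": ["time: mean", "time: pow02"]})
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    methods = warn_on_mixed_cell_methods(demo)
```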
Touch wood, fixing the variable cell methods stuff in the catalog should be done by early next week - the fix is mostly done, there’s just a lot of related groundwork to cover.
EDIT: The online catalog explorer I linked should now tell you whether you are going to run into variable_cell_methods issues.
Please let me know if you get any unexpected behaviours. I’m just going through and updating the docs today.
TL;DR: Catalogs now have a temporal_sample field that should disambiguate these time aggregations. This is quite a far-reaching fix, so there is some potential for weird edge cases that I haven’t found. However, it should now be safe by default!
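To illustrate how the new field disambiguates, here is a toy stand-in for the catalog dataframe; the `temporal_sample` column is from the fix above, but the other columns, values, and the search call in the comment are illustrative assumptions:

```python
import pandas as pd

# Stand-in for the catalog dataframe: same variable, two reductions.
cat_df = pd.DataFrame({
    "variable": ["u", "u"],
    "temporal_sample": ["mean", "pow02"],
    "path": ["u-mean.nc", "u-pow02.nc"],
})

# Selecting a single temporal_sample keeps the aggregations from mixing
# (in intake terms, presumably something like
# catalog.search(variable="u", temporal_sample="mean")).
mean_rows = cat_df[
    (cat_df["variable"] == "u") & (cat_df["temporal_sample"] == "mean")
]
print(mean_rows["path"].tolist())  # ['u-mean.nc']
```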
The online catalog viewer I linked above won’t mirror these changes quite yet, but should soon - realistically probably January. It’s still a work in progress and I haven’t set it up to auto-mirror yet, so just be aware that it’s temporarily out of date. Once we’re happy with it, we’ll do a proper release to let everyone know.