Vertically integrated OHC with ACCESS-OM2-01 -- how to handle a lot of 3D data

Hi! First of all apologies if something like this was posted already. I did a bit of a search on Slack and here and couldn’t find anything directly related that could be a solution.

I am calculating global OHC using output from the 0.1 degree model. I need to do this for 4 experiments, for 40 years, and it uses temperature and dzt (monthly averages, 3D grid). You can see my code here: OHC.ipynb · GitHub.

So far, it takes me ~25 min to get OHC for one year using 28 nodes. I don't usually use the cookbook with large datasets, but in this case the data is spread across different groups and tracing it all down would be a bit inconvenient. Is there a way to optimise this with the cookbook, or chunking, etc.?
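
For context, the calculation is roughly of this form (a minimal sketch only; the constants and dimension names below are illustrative, not copied from my notebook):

# Typical MOM5 constants; check against the values used in the model configuration
RHO0 = 1035.0   # reference density, kg m^-3
CP = 3992.1     # specific heat capacity, J kg^-1 K^-1

def vertically_integrated_ohc(temp, dzt, depth_dim="st_ocean"):
    """Vertically integrated OHC (J m^-2) from monthly temperature and the
    time-varying cell thickness dzt (m) on the same grid.
    Check whether temp is in degC or K in the output you are using."""
    return (RHO0 * CP * temp * dzt).sum(depth_dim)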

Thank you very much in advance!

There’s a scalar variable called total_ocean_heat in the ocean_scalar.nc files. Pretty sure that’s what you want?

[rmh561@gadi-login-08 seasonal_sea_level]$ ncdump -h /g/data/ik11/outputs/access-om2-01/01deg_jra55v13_ryf9091/output932/ocean/ocean_scalar.nc
netcdf ocean_scalar {
...
        double total_ocean_heat(time, scalar_axis) ;
                total_ocean_heat:long_name = "Total heat in the liquid ocean referenced to 0degC" ;
                total_ocean_heat:units = "Joule/1e25" ;
                total_ocean_heat:valid_range = 0., 1.e+20 ;
                total_ocean_heat:missing_value = -1.e+20 ;
                total_ocean_heat:_FillValue = -1.e+20 ;
                total_ocean_heat:cell_methods = "time: mean" ;
                total_ocean_heat:time_avg_info = "average_T1,average_T2,average_DT" ;
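
If that scalar is enough, it's cheap to read; something like the following should work (the glob pattern just follows the path above, and the 1e25 rescaling assumes that's what the units string means):

import xarray as xr

# Open the scalar files across all output directories of the experiment
scalar = xr.open_mfdataset(
    "/g/data/ik11/outputs/access-om2-01/01deg_jra55v13_ryf9091/output*/ocean/ocean_scalar.nc",
    combine="by_coords",
)

# "Joule/1e25" presumably means the stored values are total heat divided by 1e25
total_heat_J = scalar["total_ocean_heat"].squeeze() * 1e25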

I want spatial maps of OHC :confused:

Oh ok sorry. So you want vertically-integrated OHC.

Given you're doing this over a large number of files, one approach would be to use NCO tools and a bunch of different PBS jobs? Edit: I see your comment about the files being in funny places. Sorry, I'm not sure off the top of my head.

Yeah, I guess for now I just have to resign myself to this taking a couple of days :frowning:

My end goal is to plot OHC averaged over different decades… If I calculated mean temperature and dzt and then calculated OHC for that decade from those means, would it be the same as calculating OHC from the monthly data and then averaging?

In other words, can I do the time average before the vertical integration? Or will I be introducing an error by time averaging cell height before the vertical integral?

You will introduce an error that way because you won’t capture temporal correlations between dzt and temp. Whether these errors are significant depends on what you’re trying to do with the results. For most applications (i.e. not trying to close budgets) I’d think it is probably fine and it sounds like a good approach (note that you’re already missing sub-monthly correlations between these anyway). But I’d suggest testing it on a subset of the data first.

No, it's not the same. But the temporal variations in dzt are small, so I expect the error would be small. You could do a test over a couple of years, time averaging before and after, to check the size of the error.
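
For example, something like this on a year or two of data (temp and dzt assumed already loaded as monthly DataArrays; constants and dimension names are assumptions):

rho0, cp = 1035.0, 3992.1   # reference density and specific heat; check the model's values

# Vertically integrate every month, then time average
ohc_monthly = (rho0 * cp * temp * dzt).sum("st_ocean")
integrate_then_avg = ohc_monthly.mean("time")

# Time average first, then vertically integrate
avg_then_integrate = (rho0 * cp * temp.mean("time") * dzt.mean("time")).sum("st_ocean")

# Size of the error from dropping the temp-dzt correlations
rel_err = abs(integrate_then_avg - avg_then_integrate) / abs(integrate_then_avg)
print(float(rel_err.max()))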

Well, @JuliaN, from your notebook it seems like you want to compute the vertically integrated OHC for every year. But your notebook just does a for loop over the years… That's not at all efficient, because each year has to wait for the previous year to finish before it starts. You can parallelise this using xarray somehow, but I'd need to play around to figure out how.

This notebook might be useful… but you might need to “filter out” quite a bit.

https://github.com/navidcy/IntrinsicOceanicLFVariabilityUOHC/blob/master/prepare-raw-data.ipynb

  • First of all, in that notebook I'm dealing with both SSH and 3D temp data; ignore the SSH.
  • Second, I'm also regridding to a 1-degree lat-lon grid; ignore the regridding.
  • But lastly, I do compute the UOHC as an integral over some depth and save monthly output – that might be of interest.

I think this:

import xarray as xr  # temp_010_RYF is a temperature DataArray loaded earlier in the notebook

output_folder = "output/temp-010deg-RYF/"

# split the DataArray into one Dataset per model year
years, das = zip(*temp_010_RYF.groupby("time.year"))
ds = [d.to_dataset(name="temp") for d in das]
paths = [output_folder + "temp-{}.nc".format(y) for y in years]

# save_mfdataset writes all the per-year files together rather than one at a time
xr.save_mfdataset(ds, paths)

will write the netCDF files in parallel (it won't wait for each year to finish before starting work on the next).

Note, by the way, that what's needed here is most likely just xarray+dask manipulation: choosing appropriate chunks and saving intermediate output files. It doesn't really have to do with the cookbook (as the message at the top of the thread might imply).
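
For example, something along these lines when opening the data (chunk sizes here are only illustrative and would need tuning; temp_files is a placeholder for the list of monthly temperature files):

import xarray as xr

# Aim for chunks of order ~100 MB each; one time step per chunk is a
# reasonable starting point for monthly 3D fields at 0.1 degree
temp = xr.open_mfdataset(
    temp_files,
    chunks={"time": 1, "st_ocean": 7},
    combine="by_coords",
    parallel=True,
)["temp"]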

(Suggestion: perhaps rename the thread to something a bit more descriptive of the situation?)

Why write individual files for each year anyway? Just resample to yearly and do the sum directly on the resampled data.
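
Something like this (dimension names assumed, with temp and dzt the monthly fields):

# Yearly means first, then the (much cheaper) vertical integral on those
temp_yr = temp.resample(time="1YS").mean()
dzt_yr = dzt.resample(time="1YS").mean()
ohc_yr = (1035.0 * 3992.1 * temp_yr * dzt_yr).sum("st_ocean")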

Thanks everyone! I don't want to resample to yearly, I want the original monthly averages. I'm just doing it year by year because it can't handle the entire decade at once, fair enough.

Thanks Navid! I’ll look into that notebook

Well, I am not arguing that what I did was in any way the best solution! It was just a solution that worked in a reasonable amount of time! :slight_smile:

It is the best so far!

Well, I did bang my head on the (dask) wall many times until I converged to what I did.
I'm glad you found it helpful. @angus-g was instrumental in getting me out of my troubles back then…

This page has some useful stuff that you might be able to use for your embarrassingly parallel case: Embarrassingly parallel Workloads — Dask Examples documentation
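
For example, a bare-bones version of that pattern here would be (compute_ohc_for_year is a hypothetical function that loads one year of temp/dzt, does the vertical integral and writes the result out):

import dask

def compute_ohc_for_year(year):
    # Placeholder: open one year of temp and dzt, compute the vertically
    # integrated OHC, and save it to its own netCDF file
    ...

# One lazy task per year; dask then runs them in parallel on the cluster
tasks = [dask.delayed(compute_ohc_for_year)(year) for year in range(2150, 2190)]  # placeholder 40-year range
dask.compute(*tasks)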
