Hi! First of all, apologies if something like this has been posted already. I did a bit of a search on Slack and here and couldn’t find anything directly related that could be a solution.
I am calculating global OHC using output from the 0.1-degree model. I need to do this for 4 experiments, for 40 years, and it uses temperature and dzt (monthly averages, 3D grid). You can see my code here: OHC.ipynb · GitHub.
So far it takes me ~25 min to get OHC for one year using 28 nodes. I don’t usually use the cookbook with large datasets, but in this case the data are spread across different groups and tracing it all would be a bit inconvenient. Is there a way to optimise this with the cookbook, or chunking, etc.?
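For context, the calculation is essentially a density × heat capacity × temperature integral over the ocean volume. Below is a minimal sketch of that kind of function, not the actual notebook code: the dimension and variable names follow typical MOM5 output, and the rho0/cp defaults are placeholder values; it also assumes the temperature is already in °C (worth checking whether the model writes it in K).

```python
def global_ohc(temp, dzt, area_t,
               rho0=1035.0, cp=3992.1,
               depth_dim="st_ocean", horiz_dims=("yt_ocean", "xt_ocean")):
    """Globally integrated ocean heat content [J] at each time step.

    temp   : temperature DataArray [degC], dims (time, depth, y, x)
    dzt    : cell thickness DataArray [m], same dims as temp
    area_t : tracer-cell area DataArray [m^2], dims (y, x)
    rho0, cp : assumed reference density and specific heat capacity
    """
    # Vertical integral -> heat content per unit area [J m^-2]
    ohc_per_area = rho0 * cp * (temp * dzt).sum(dim=depth_dim)
    # Horizontal integral -> total heat content [J], one value per month
    return (ohc_per_area * area_t).sum(dim=list(horiz_dims))
```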
Oh ok sorry. So you want vertically-integrated OHC.
Given you’re doing this over a large number of files, one approach would be to use NCO tools and a bunch of different PBS jobs? Edit: I see your comment about the files being in funny places. Sorry, I’m not sure off the top of my head.
My end goal is to plot OHC averaged over different decades… If I calculated the decadal-mean temp and dzt and then calculated OHC for that decade from those means, would it be the same as calculating OHC from the monthly data and then averaging?
In other words, can I do the time average before the vertical integration? Or will I be introducing an error by time averaging cell height before the vertical integral?
You will introduce an error that way because you won’t capture temporal correlations between dzt and temp. Whether these errors are significant depends on what you’re trying to do with the results. For most applications (i.e. not trying to close budgets) I’d think it is probably fine and it sounds like a good approach (note that you’re already missing sub-monthly correlations between these anyway). But I’d suggest testing it on a subset of the data first.
No, it’s not the same. But the temporal variations in dzt are small, so I expect the error would be small. You could do a test over a couple of years, time averaging before and after the integration, to check the size of the error.
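A rough sketch of such a test on a small subset is below; it assumes monthly `temp` and `dzt` DataArrays and a MOM5-style depth dimension name, all of which are placeholders to adapt.

```python
def averaging_order_error(temp, dzt, depth_dim="st_ocean"):
    """Relative error from time-averaging temp and dzt *before* the vertical
    integral, compared with integrating each month and then time-averaging.
    Pass in a small subset, e.g. a couple of years of monthly temp and dzt.
    """
    # Integrate vertically every month, then average in time
    integrate_then_average = (temp * dzt).sum(dim=depth_dim).mean("time")
    # Time-average temp and dzt first, then integrate vertically
    average_then_integrate = (temp.mean("time") * dzt.mean("time")).sum(dim=depth_dim)
    # Relative difference; the constant rho0*cp factor cancels in the ratio
    return (average_then_integrate - integrate_then_average) / integrate_then_average
```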
Well, @JuliaN, from your notebook it seems like you want to compute the vertically-integrated OHC for every year. But your notebook just uses a for loop over the years… That’s not at all efficient, because each year has to wait for the previous year to finish before it starts. You can parallelise this using xarray somehow, but I need to play around to figure out how to do it.
First of all, in that notebook I’m dealing with both ssh and 3D temp data; ignore the ssh.
Second, I’m also regridding to a 1-degree lat-lon grid; ignore the regridding.
But lastly, I do compute the UOHC as an integral over some depth and save monthly output – that might be of interest (a rough sketch is below).
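The gist of that part is something like the following, which is an outline rather than the actual notebook code; the 700 m cutoff, the `st_ocean` dimension name, and the rho0/cp defaults are placeholders.

```python
def upper_ohc(temp, dzt, depth=700.0, depth_dim="st_ocean",
              rho0=1035.0, cp=3992.1):
    """Upper-ocean heat content per unit area [J m^-2], integrated from the
    surface down to `depth` metres, keeping the monthly time dimension.
    The depth cutoff and the rho0/cp defaults are placeholder values.
    """
    upper = {depth_dim: slice(0, depth)}  # assumes a positive-down depth coordinate
    return rho0 * cp * (temp.sel(upper) * dzt.sel(upper)).sum(dim=depth_dim)
```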
I think this:
output_folder = "output/temp-010deg-RYF/"
years, das = zip(*temp_010_RYF.groupby("time.year"))
ds = [d.to_dataset(name="temp") for d in das]
paths = [output_folder + "temp-{}.nc".format(y) for y in years]
xr.save_mfdataset(ds, paths)
will write the netCDF files in parallel (it won’t wait for each year to finish before starting work on the next).
Note, by the way, that what’s needed here is most likely just xarray+dask manipulation: choosing appropriate chunks and saving intermediate output files. It doesn’t really have to do with the cookbook (as the message at the top of the thread might imply).
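To make that concrete, here is a rough sketch of the kind of setup I mean; the file pattern, chunk sizes, and worker count are placeholders to tune for your data and PBS job, not recommendations.

```python
import xarray as xr
from dask.distributed import Client

# Spin up a local dask cluster sized to the resources of the PBS job
client = Client(n_workers=28, threads_per_worker=1)

# Open the monthly files lazily with explicit chunks: one time step and a
# handful of depth levels per chunk keeps the memory per task small.
temp = xr.open_mfdataset(
    "path/to/output*/ocean/ocean_month.nc",   # placeholder file pattern
    chunks={"time": 1, "st_ocean": 10},       # placeholder chunk sizes
    combine="by_coords",
    parallel=True,
)["temp"]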
Thanks everyone! I don’t want to resample to yearly; I want to keep the original monthly averages. I’m just doing it year by year because it can’t handle the entire decade at once, fair enough.
Well, I did bang my head against the (dask) wall many times until I converged on what I did.
I’m glad you found that helpful. @angus-g was instrumental in getting me out of my troubles back then…