I had some code working in the hh5 env which is now crashing frequently in xp65.
I am printing to file. The code that was working on hh5:
file_name_vpd = pathway + '_' + GCM + '_' + RCM + '_gwl' + chosen_gwl + '_vpd.nc'
output_file_location = output_dir + GCM + '/' + pathway + '/' + ensemble + '/' + RCM + '/v1-r1/day/' + file_name_vpd
gwl_vpd.to_netcdf(output_file_location, engine='netcdf4')
print(output_file_location)
It works a bit better if I break the files in half and specify threads_per_worker=1, but of course this creates downstream issues and is slower. The code which is “working” in xp65 (basically the same as above, but with two cases based on the first or second half of the ds):
#import all the stuff
from netCDF4 import Dataset
import xarray as xr
import numpy as np
import pandas as pd
from datetime import timedelta
import matplotlib.pyplot as plt
import glob
import sys
sys.path.append("/g/data/mn51/users/nb6195/project/gwls/")
import gwl
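For reference, a minimal sketch of the split-in-half write (illustrative only: gwl_vpd and output_file_location are the names from the hh5 snippet above, and the halves are assumed to be split along the time dimension):
# sketch only: write the dataset in two halves, on single-threaded workers
from dask.distributed import Client
client = Client(threads_per_worker=1)
halfway = gwl_vpd.sizes['time'] // 2
first_half = gwl_vpd.isel(time=slice(None, halfway))
second_half = gwl_vpd.isel(time=slice(halfway, None))
first_half.to_netcdf(output_file_location.replace('.nc', '_part1.nc'), engine='netcdf4')
second_half.to_netcdf(output_file_location.replace('.nc', '_part2.nc'), engine='netcdf4')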
I’ve been unable to test anything on Gadi as ARE is down and it seems my job submissions aren’t working either, but I’ve had a read through and I don’t think gwl_vpd_print is defined anywhere in the code snippets you provided.
Would you be able to provide (probably once Gadi comes back up, assuming you’re affected by the same issues as me) a script that writes the same xarray dataset to a file, so that I can try to reproduce the issue?
With that said, I suspect this might be another manifestation of this pesky netCDF thread-safety issue. I wonder whether specifying engine='h5netcdf' might go some way towards fixing it.
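For example, something along these lines (assuming the gwl_vpd dataset and output_file_location from your hh5 snippet, and that the h5netcdf package is available in the xp65 environment):
gwl_vpd.to_netcdf(output_file_location, engine='h5netcdf')  # h5netcdf backend instead of the default netcdf4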
Hi Charles,
thanks for your message. Yes, I’m hitting the same ARE/Gadi issue. My full notebook is on GitHub and I can share it with you; when ARE is back online, I will push the most recent code and send you a link.
Quick clarification point:
Sometimes to_netcdf fails because it triggers a load of the data, not because writing that data to file is the problem. Do you get the same error if you try gwl_vpt_print.load(), or does it only occur with gwl_vpt_print.to_netcdf(output_file_location)?
(Having said that, the fact that you’re getting an HDF5 error specifically reduces the chance of the load being the problem.)
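A sketch of that diagnostic, splitting the two steps apart (variable names as above; only the separation into load-then-write matters):
loaded = gwl_vpt_print.load()           # step 1: force the dask computation, no HDF5 writing involved
loaded.to_netcdf(output_file_location)  # step 2: write the already-loaded data, isolating the write itself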
Hi Naomi, please could you create a minimum working example with just the lines of code needed to reproduce the failure? I’m trying to run the VPD Functions.ipynb notebook in the provided GitHub repo, which looks to be the right one, but in e.g. this cell:
I’m not finding any files in the infiles1a or infiles1b globs. If I comment out that block and uncomment the block above it, I get some files that I can open, but I assume these will be the wrong files to reproduce the error.
I’m still waiting on mn51 membership in order to run the functions from the gwl package.
EDIT: Sorry, ignore that, didn’t have the correct projects added.
Assuming I’m trying to reproduce the failure correctly, I’m able to run the following code fine now, using conda/analysis3-25.05 on a large ARE instance:
import dask
from dask.distributed import Client, wait
from dask import delayed
client = Client(n_workers=7, threads_per_worker=1)
#import all the stuff
from netCDF4 import Dataset
import xarray as xr
import numpy as np
import pandas as pd
from datetime import timedelta
import matplotlib.pyplot as plt
import glob
import sys
sys.path.append("/g/data/mn51/users/nb6195/project/gwls/")
import gwl
#function to compute VPD from tasmax and rh datasets
#inputs: datasets of rh and tasmax
#outputs: datasets of vpd and monthly_mean_vpd
def vpd_calc(ds_rh, ds_tasmax):
    # vpd = (1 - ds_rh/100) * 0.61094 * np.exp((17.652 * ds_tasmax)/(243.04 + ds_tasmax)) #originally used function
    vpd = (1 - ds_rh/100) * 6.1094 * np.exp((17.625 * ds_tasmax)/(243.04 + ds_tasmax)) #the formula that Blair uses from http://www.bom.gov.au/research/publications/cawcrreports/CTR_024.pdf
    monthly_mean_vpd = vpd.groupby('time.month').mean('time', keep_attrs=True)
    vpd.attrs = {
        'long_name': 'Daily maximum vapour pressure deficit computed from tasmax and hursmin',
        'standard_name': 'vpd',
        'units': 'hPa',
        # 'regrid_method': 'bilinear'
    }
    ds_vpd = xr.Dataset({'vpd': vpd})
    ds_rh.close()
    ds_tasmax.close()
    monthly_mean_vpd.attrs = {
        'long_name': 'Monthly mean vapour pressure deficit computed from tasmax and hursmin',
        'standard_name': 'monthly_mean_vpd',
        'units': 'hPa',
    }
    ds_monthly_mean_vpd = xr.Dataset({'monthly_mean_vpd': monthly_mean_vpd})
    return ds_vpd, ds_monthly_mean_vpd
#Set parameters
CMIP='CMIP6'
AGENCY = 'BOM'
RCM = 'BARPA-R'
GCM = 'MPI-ESM1-2-HR'
ensemble = 'r1i1p1f1' #BOM done, no CSIRO
pathway = 'ssp370'
output_dir = '/scratch/tm70/ct1163/naomi/VPD/vpd/'
output_dir_mm = '/scratch/tm70/ct1163/naomi/VPD/vpd/monthly_mean/'
#read in RCM files
var1 = 'tasmax'
ddir = f"/g/data/kj66/CORDEX/output/{CMIP}/DD/AUST-05i/{AGENCY}/{GCM}"
infiles1a=glob.glob(ddir+f'/historical/{ensemble}/{RCM}/v1-r1/day/{var1}/v20241216/{var1}_AUST-05i_{GCM}_historical_{ensemble}_{AGENCY}_{RCM}_v1-r1_day_*.nc')
infiles1b=glob.glob(ddir+f'/{pathway}/{ensemble}/{RCM}/v1-r1/day/{var1}/v20241216/{var1}_AUST-05i_{GCM}_{pathway}_{ensemble}_{AGENCY}_{RCM}_v1-r1_day_*.nc')
tasmax_master_ds = xr.open_mfdataset(infiles1a + infiles1b)
var2 = 'hursmin'
ddir2 = f"/g/data/kj66/CORDEX/output/{CMIP}/DD/AUST-05i/{AGENCY}/{GCM}"
infiles2a=glob.glob(ddir+f'/historical/{ensemble}/{RCM}/v1-r1/day/{var2}/v20241216/{var2}_AUST-05i_{GCM}_historical_{ensemble}_{AGENCY}_{RCM}_v1-r1_day_*.nc')
infiles2b=glob.glob(ddir+f'/{pathway}/{ensemble}/{RCM}/v1-r1/day/{var2}/v20241216/{var2}_AUST-05i_{GCM}_{pathway}_{ensemble}_{AGENCY}_{RCM}_v1-r1_day_*.nc')
hursmin_master_ds = xr.open_mfdataset(infiles2a + infiles2b)
#Extract time period corresponding to the chosen GWL for tasmax and rh
chosen_gwl = '1.5'
gwl_tasmax = gwl.get_GWL_timeslice(tasmax_master_ds,CMIP,GCM,ensemble,pathway,GWL=chosen_gwl)[var1]
gwl_rh = gwl.get_GWL_timeslice(hursmin_master_ds,CMIP,GCM,ensemble,pathway,GWL=chosen_gwl)[var2]
gwl_vpd, monthly_mean_vpd = vpd_calc(gwl_rh, gwl_tasmax)
file_name_mean = 'gwl' + chosen_gwl + '_monthly_mean_vpd_' + GCM + '_' + RCM + '_' + pathway + '_' + ensemble + '.nc'
output_file_location = output_dir_mm + file_name_mean
monthly_mean_vpd.to_netcdf(output_file_location, engine='netcdf4')
print(output_file_location)
I’m loading the module conda/analysis3. I have been running the “print half” version today with very few crashes, quite surprisingly: usually they are more frequent, but today there was only one crash from about 15 launches creating the data in a piecewise fashion.