Cannot print to file - possible Dask issue in xp65

Hi help staff!

I had some code working in the hh5 env which is now crashing frequently in xp65.

I am writing a dataset out to a NetCDF file. The code that was working on hh5:
file_name_vpd = pathway + '_' + GCM + '_' + RCM + '_gwl' + chosen_gwl + '_vpd.nc' 
output_file_location = output_dir + GCM + '/' + pathway + '/' + ensemble + '/' + RCM + '/v1-r1/day/' + file_name_vpd
gwl_vpd.to_netcdf(output_file_location, engine='netcdf4')
print(output_file_location)

It works somewhat better if I break the files in half and specify threads_per_worker=1 (my client setup is sketched after the loop below), but this of course creates downstream issues and is slower. The code which is “working” in xp65 (basically the same as above, but with two cases based on the first or second half of the dataset):

halves = ['1st', '2nd']

for half in halves:
    print(half)
    if half == '1st':
        gwl_vpd_print = gwl_vpd['vpd'][0:2737]
    else: 
        gwl_vpd_print = gwl_vpd['vpd'][2737:]
        
    file_name_vpd = half + '_' + pathway + '_' + GCM + '_' + RCM + '_gwl' + chosen_gwl + '_vpd.nc'
    output_file_location = output_dir + GCM + '/' + pathway + '/' + ensemble + '/' + RCM + '/v1-r1/day/' + file_name_vpd
    gwl_vpd_print.to_netcdf(output_file_location, engine='netcdf4')
        
    print('half done')
    print(output_file_location)
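
For reference, I set up the Dask client roughly like this (a minimal sketch; my actual cluster configuration may differ):

from dask.distributed import Client

# One thread per worker avoids concurrent HDF5 access within a worker process,
# which seems to make the writes more reliable, at the cost of speed.
client = Client(threads_per_worker=1)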

This will often still crash and require deleting the partly written output file and relaunching.

The errors generally relate to file access. The traceback runs to pages, the final statement being:

OSError: [Errno -101] NetCDF: HDF error: b'/g/data/ia39/ncra/bushfire/vpd/EC-Earth3/ssp370/r1i1p1f1/BARPA-R/v1-r1/day/ssp370_EC-Earth3_BARPA-R_gwl1.5_vpd.nc'

Packages I have imported:

#import all the stuff
from netCDF4 import Dataset
import xarray as xr
import numpy as np
import pandas as pd
from datetime import timedelta
import matplotlib.pyplot as plt
import glob
import sys
sys.path.append("/g/data/mn51/users/nb6195/project/gwls/")
import gwl

Would appreciate any help!

Thanks.

Hi Naomi,

I’ve been unable to test anything on Gadi as ARE is down and it seems my job submissions aren’t working either, but I’ve had a read through and I don’t think that gwl_vpd is defined anywhere in the code snippets you provided.

Would you be able to provide (probably when Gadi comes back up, assuming you’ll be affected by the same issues as me) a script that will let me try writing the same xarray dataset to a file, so I can reproduce the issue?

With that said, I suspect this might be another manifestation of this pesky netCDF thread-safety issue. I wonder whether specifying engine='h5netcdf' might go some way towards fixing that.
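
For example, something along these lines (an untested sketch; I’m assuming gwl_vpd_print here is the lazily evaluated DataArray from your loop):

# Write via the h5netcdf backend instead of netcdf4, which may sidestep
# the HDF5 thread-safety problem when Dask workers use multiple threads.
gwl_vpd_print.to_netcdf(output_file_location, engine='h5netcdf')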

Hi Charles,
Thanks for your message. Yes, same ARE/Gadi issue. I have my full notebook on GitHub and can share it with you. When ARE is back online, I will push the most recent code and send you a link.

Thanks for your help!

Quick clarification point:
Sometimes to_netcdf fails because it triggers a load of the data, not because writing that data to file is the problem. Do you get the same error if you try gwl_vpd_print.load(), or does it only occur with gwl_vpd_print.to_netcdf(output_file_location)?
(Having said that, the fact that you’re getting an HDF5 error specifically reduces the chance of the load being the problem.)
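
Something like the following would separate the two steps (a sketch only, reusing the variable names from the snippet above):

# Step 1: force the lazy Dask computation and pull the result into memory.
data_in_memory = gwl_vpd_print.load()
# Step 2: write the already-computed data to disk; if only this step fails,
# the problem is the NetCDF/HDF5 write rather than the computation.
data_in_memory.to_netcdf(output_file_location, engine='netcdf4')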

Thanks for the suggestion. I haven’t tried gwl_vpd_print.load() yet, but could give it a go once ARE is back up.