I am trying to identify which CMIP6 models at NCI have 3hr data, without manually trawling through them all.
Something like this:
import intake
cmip6 = intake.open_esm_datastore("/g/data/dk92/catalog/v2/esm/cmip6-oi10/catalog.json")
values_dict = cmip6.unique()
models_list = values_dict.source_id
# but take a subset for testing
small_list = ['EC-Earth3', 'CESM2', 'GFDL-ESM2M', 'GFDL-ESM4', 'MIROC6', 'NorESM2-MM', 'CMCC-ESM2']
# variables of interest
three_hr_data = ['huss', 'tas', 'uas', 'vas', 'ps', 'pr']
# testing with one model
cat_subset = cmip6.search(
    source_id=['EC-Earth3'],
    experiment_id=["ssp370"],
    table_id="3hr",
    variable_id="huss",
    grid_label=["gn", "gr"],
)
cat_subset
cmip6-oi10 catalog with 5 dataset(s) from 430 asset(s):
etc
dset_dict = cat_subset.to_dataset_dict(
    xarray_open_kwargs={"consolidated": True, "decode_times": True, "use_cftime": True}
)
Which then fails with this error:
ESMDataSourceError: Failed to load dataset with key='f.ScenarioMIP.EC-Earth-Consortium.EC-Earth3.ssp370.r6i1p1f1.3hrPt.atmos.3hr.huss.gr.v20200201'
You can use cat['f.ScenarioMIP.EC-Earth-Consortium.EC-Earth3.ssp370.r6i1p1f1.3hrPt.atmos.3hr.huss.gr.v20200201'].df
to inspect the assets/files for this key.
Why does it say '3hrPt.atmos.3hr'? That is not what I was expecting. The actual path is
/g/data/oi10/replicas/CMIP6/ScenarioMIP/EC-Earth-Consortium/EC-Earth3/ssp370/r1i1p1f1/3hr/huss/gr/v20200310
Can you suggest how to get this to work in a useful way? At the end of the day, I want a list of paths similar to the above, for all the models with 3hr data in the three_hr_data list. Many thanks.
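For reference, here is a minimal sketch of the kind of result I am after. It works on the catalog table alone (the `.df` mentioned in the error message) and never opens any files. It assumes the table has `source_id`, `table_id`, `variable_id` and `path` columns, which I have not confirmed for this catalog. The helper `dirs_by_model` and the example rows are hypothetical, with made-up file names and placeholder paths.

```python
from collections import defaultdict
from posixpath import dirname


def dirs_by_model(records, variables, table_id="3hr"):
    """Group dataset directories by model for one table and a set of variables.

    `records` is an iterable of dicts with the (assumed) keys 'source_id',
    'table_id', 'variable_id' and 'path', for example the result of
    cat_subset.df.to_dict("records").
    """
    wanted = set(variables)
    found = defaultdict(set)
    for row in records:
        if row["table_id"] == table_id and row["variable_id"] in wanted:
            # Keep the directory that holds the file, not the file itself.
            found[row["source_id"]].add(dirname(row["path"]))
    return {model: sorted(paths) for model, paths in sorted(found.items())}


# Hypothetical rows standing in for the catalog table; file names are invented.
base = ("/g/data/oi10/replicas/CMIP6/ScenarioMIP/EC-Earth-Consortium/"
        "EC-Earth3/ssp370/r1i1p1f1/3hr/huss/gr/v20200310")
example_rows = [
    {"source_id": "EC-Earth3", "table_id": "3hr", "variable_id": "huss",
     "path": base + "/file_a.nc"},
    {"source_id": "EC-Earth3", "table_id": "3hr", "variable_id": "huss",
     "path": base + "/file_b.nc"},
    {"source_id": "EC-Earth3", "table_id": "day", "variable_id": "huss",
     "path": "/placeholder/day/huss/file_c.nc"},
    {"source_id": "MIROC6", "table_id": "3hr", "variable_id": "tas",
     "path": "/placeholder/MIROC6/3hr/tas/gn/file_d.nc"},
]

result = dirs_by_model(example_rows, ["huss", "tas"])
for model, paths in result.items():
    print(model, paths)
```

With the real catalog, the rows would presumably come from something like `cmip6.search(table_id="3hr", variable_id=three_hr_data).df.to_dict("records")`, so `to_dataset_dict` would not be needed just to list paths.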