How to configure ERA5 monthly dataset block for ESMValTool on Gadi

Description of request:

I’m trying to run ERA5 through ESMValTool in a CMIP6-like way on Gadi. I started from the daily ERA5 cmorizer recipe and tried to adapt it to use monthly ERA5 instead, but the recipe failed (other recipes run fine, so it’s not a general setup issue). Could you please explain why the monthly version is not working, and clarify what each of these fields (family, typeid, level, nci_type, tres) should be for ERA5 monthly data on NCI (/g/data/rt52/era5)? This would help me configure my recipes correctly for comparing ERA5 monthly means with CMIP6 (Amon). Cheers.

Environment:

NCI Gadi login node
Conda env: analysis3-24.04 (esmvaltool-workflow)

What executed:

Following cmorizers/recipe_daily_era5.yml, I ran esmvaltool run cmorizers/recipe_daily_era5.yml in the terminal, with the recipe's datasets section set to:
datasets:

  - {dataset: ERA5, project: native6, type: reanaly, version: v1, tier: 3,
    tres: 1H, start_year: 1990, end_year: 1991}

Actual results:

2025-09-05 04:23:50,146 UTC [3379656] ERROR Could not create all tasks
2025-09-05 04:23:50,146 UTC [3379656] ERROR Dataset key ‘nci-type’ must be specified for {‘mip’: ‘E1hr’, ‘era5_name’: ‘total_cloud_cover’, ‘era5_freq’: ‘hourly’, ‘preprocessor’: ‘daily_mean’, ‘dataset’: ‘ERA5’, ‘project’: ‘native6’, ‘type’: ‘reanaly’, ‘version’: ‘v1’, ‘tier’: 3, ‘short_name’: ‘clt’, ‘timerange’: ‘1990/1991’, ‘variable_group’: ‘clt’, ‘diagnostic’: ‘daily’, ‘recipe_dataset_index’: 0, ‘alias’: ‘ERA5’, ‘automatic_regrid’: True, ‘family’: ‘E5’, ‘typeid’: ‘00’, ‘level’: ‘sf’, ‘grib_id’: ‘164’, ‘tres’: ‘1H’, ‘original_short_name’: ‘clt’, ‘standard_name’: ‘cloud_area_fraction’, ‘long_name’: ‘Total Cloud Cover Percentage’, ‘units’: ‘%’, ‘modeling_realm’: [‘atmos’], ‘frequency’: ‘1hrPt’}, check your recipe entry
2025-09-05 04:23:50,146 UTC [3379656] ERROR Dataset key ‘level’ must be specified for {‘mip’: ‘E1hr’, ‘era5_name’: ‘evaporation’, ‘era5_freq’: ‘hourly’, ‘preprocessor’: ‘daily_mean’, ‘dataset’: ‘ERA5’, ‘project’: ‘native6’, ‘type’: ‘reanaly’, ‘version’: ‘v1’, ‘tier’: 3, ‘short_name’: ‘evspsbl’, ‘timerange’: ‘1990/1991’, ‘variable_group’: ‘evspsbl’, ‘diagnostic’: ‘daily’, ‘recipe_dataset_index’: 0, ‘alias’: ‘ERA5’, ‘automatic_regrid’: True, ‘family’: ‘E5’, ‘typeid’: ‘00’, ‘tres’: ‘1H’, ‘original_short_name’: ‘evspsbl’, ‘standard_name’: ‘water_evapotranspiration_flux’, ‘long_name’: ‘Evaporation Including Sublimation and Transpiration’, ‘units’: ‘kg m-2 s-1’, ‘modeling_realm’: [‘atmos’], ‘frequency’: ‘1hrPt’}, check your recipe entry
2025-09-05 04:23:50,146 UTC [3379656] ERROR Dataset key ‘level’ must be specified for {‘mip’: ‘E1hr’, ‘era5_name’: ‘potential_evaporation’, ‘era5_freq’: ‘hourly’, ‘preprocessor’: ‘daily_mean’, ‘dataset’: ‘ERA5’, ‘project’: ‘native6’, ‘type’: ‘reanaly’, ‘version’: ‘v1’, ‘tier’: 3, ‘short_name’: ‘evspsblpot’, ‘timerange’: ‘1990/1991’, ‘variable_group’: ‘evspsblpot’, ‘diagnostic’: ‘daily’, ‘recipe_dataset_index’: 0, ‘alias’: ‘ERA5’, ‘automatic_regrid’: True, ‘family’: ‘E5’, ‘typeid’: ‘00’, ‘tres’: ‘1H’, ‘original_short_name’: ‘evspsblpot’, ‘standard_name’: ‘water_potential_evaporation_flux’, ‘long_name’: ‘Potential Evapotranspiration’, ‘units’: ‘kg m-2 s-1’, ‘modeling_realm’: [‘land’], ‘frequency’: ‘1hrPt’}, check your recipe entry

Hello Xinhui,

Thanks for posting! It might take a bit to dig into this, as I haven’t tried using the ERA5 monthly data for a while. But there are a few things to look at that will hopefully help you sooner.

  • Do you know where your config files for ESMValTool are, and whether they have been edited?

Check config-user (did you run esmvaltool config get_config_user?) and config-developer, if you are not using the default.

  • The default on NCI for the project native6 in the config-developer file is:
native6:
  cmor_strict: false
  input_dir:
    #default: 'Tier{tier}/{dataset}/{version}/{frequency}/{short_name}'
    default: '{level}/{nci-type}/{era5-shortname}/*'
  input_file:
    default: '*.nc'
  output_file: '{project}_{dataset}_{type}_{version}_{mip}_{short_name}'
  cmor_type: 'CMIP6'
  cmor_default_table_prefix: 'CMIP6_'

This is how ESMValTool finds the files, so ‘level’ and ‘nci-type’ correspond to folders under /g/data/rt52/era5.
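To make the folder mapping concrete, here is a minimal sketch of how an input_dir template like the one above could be expanded into a search pattern. It is only an illustration, not the ESMValCore implementation: the helper expand_input_dir is hypothetical, and the facet values are taken from paths that appear elsewhere in this thread.

```python
import re
from pathlib import PurePosixPath


def expand_input_dir(template, facets):
    """Replace each {key} in the template with the matching facet value.

    Hypothetical helper for illustration only; ESMValCore's own
    path-building logic is more involved.
    """
    def substitute(match):
        key = match.group(1)
        if key not in facets:
            # Mirrors the kind of error shown in the log above.
            raise KeyError(f"Dataset key '{key}' must be specified")
        return str(facets[key])

    return re.sub(r"\{([^{}]+)\}", substitute, template)


ROOT = "/g/data/rt52/era5"                            # rootpath for native6
TEMPLATE = "{level}/{nci-type}/{era5-shortname}/*"    # input_dir default

# Example facet values, taken from file paths seen in this thread.
facets = {
    "level": "single-levels",
    "nci-type": "monthly-averaged",
    "era5-shortname": "msl",
}

pattern = str(PurePosixPath(ROOT) / expand_input_dir(TEMPLATE, facets))
print(pattern)  # /g/data/rt52/era5/single-levels/monthly-averaged/msl/*
```

If any of the three keys is missing from the facets, the sketch raises an error naming that key, which is the same kind of failure as the "must be specified" messages in the log.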

  • Also, you may need an ‘extra facets’ file if it’s not already pointing to this file: /g/data/xp65/public/apps/esmvaltool/config/extra_facets/native6-era5.yml

I have a setting like this in my config-user.yml file for my own extra_facets files:

extra_facets_dir: /home/189/fc6164/.esmvaltool/extra_facets

Then, when esmvaltool searches for the data, it can fill in nci-type and level from the mappings in native6-era5.yml. There is some more info in the docs.
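As a rough sketch of what "filling in" means here: facets missing from the recipe entry are taken from the extra facets mapping. Everything below is illustrative. The mapping layout and its contents are hypothetical (the real native6-era5.yml may be structured differently), the augment helper is made up for this example, and the sketch assumes that values given in the recipe take precedence over the mapping.

```python
def augment(recipe_facets, extra_facets):
    """Return a copy of recipe_facets with missing keys filled from extra_facets.

    Hypothetical helper: keys already present in the recipe are left untouched.
    """
    merged = dict(recipe_facets)
    for key, value in extra_facets.items():
        merged.setdefault(key, value)
    return merged


# Hypothetical mapping keyed by short_name; values reuse names seen in this thread.
EXTRA_FACETS = {
    "psl": {
        "level": "single-levels",
        "nci-type": "monthly-averaged",
        "era5-shortname": "msl",
    },
}

recipe_entry = {
    "dataset": "ERA5",
    "project": "native6",
    "mip": "Amon",
    "short_name": "psl",
}

filled = augment(recipe_entry, EXTRA_FACETS[recipe_entry["short_name"]])
print(filled["level"], filled["nci-type"], filled["era5-shortname"])
# single-levels monthly-averaged msl
```

With all three keys filled, the input_dir template from config-developer has everything it needs to build a path.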

So I may need to check and fix our configuration files that are default for all users.
Let me know if that makes sense.

Thanks for your help and reply! I have been trying for a long time but still haven’t figured it out.
Previously I had only changed the output_dir in my own config_user.yml, so I have now added extra_facets_dir, and my config_user.yml is:

# Site-specific entries: NCI ACCESS-NRI
# Uncomment the lines below to locate data on NCI ACCESS-NRI.
rootpath:
  CMIP6: [
  /g/data/oi10/replicas/CMIP6, 
  /g/data/fs38/publications/CMIP6, /g/data/xp65/public/apps/esmvaltool/replicas/CMIP6,
  /g/data/zv30/cmip/CMIP6
  ]
  CMIP5: [/g/data/r87/DRSv3/CMIP5, /g/data/al33/replicas/CMIP5/combined, /g/data/rr3/publications/CMIP5/output1, /g/data/xp65/public/apps/esmvaltool/replicas/cmip5/output1]
  CMIP3: /g/data/r87/DRSv3/CMIP3
  OBS: [/g/data/ct11/access-nri/replicas/esmvaltool/obsdata-v2]
  OBS6: [/g/data/ct11/access-nri/replicas/esmvaltool/obsdata-v2, /g/data/ct11/access-nri/era5-derived, /g/data/nf33/public/data/ESMValTool/obsdata]
  obs4MIPs: [/g/data/ct11/access-nri/replicas/esmvaltool/obsdata-v2]
  ana4mips: [/g/data/ct11/access-nri/replicas/esmvaltool/obsdata-v2]
  native6: [/g/data/rt52/era5]
  ACCESS: /g/data/p73/archive/non-CMIP
  ZV30: /g/data/zv30/cmip/CMIP6
  CORDEX-CMIP6: 
    - /g/data/hq89/CCAM/output/
    - /g/data/py18/BARPA/output/
    - /g/data/zz63/NARCliM2-0/output/
    - /g/data/ig45/QldFCP-2/output/
  BARRA2: 
    - /g/data/ob53/BARRA2/output

drs:
  CMIP6: NCI
  CMIP5: NCI
  CMIP3: NCI
  CORDEX: ESGF
  obs4MIPs: default
  ana4mips: default
  ACCESS: default
  native6: default
  ZV30: NCI
  CORDEX-CMIP6: NCI
  BARRA2: NCI

extra_facets_dir: "/g/data/xp65/public/apps/esmvaltool/config/extra_facets"

I ran this yml again:


datasets:
  - {dataset: ACCESS-CM2, project: CMIP6, exp: historical, ensemble: r1i1p1f1, grid: gn}
  - {dataset: ERA5, project: native6, level: single-levels, nci-type: monthly-averaged} # wrong
  


preprocessors:

  preprocessor_1:
    custom_order: true
    regrid: {target_grid: 2x2, scheme: linear}
    

  preprocessor_2:
    # extract_time: {start_year: ${START_Y}, end_year: ${END_Y}}
    custom_order: true
    regrid:
      target_grid: 2x2
      scheme: linear
    extract_levels:
      levels: {cmor_table: CMIP6, coordinate: plev19}
      coordinate: air_pressure
      scheme: linear
      


diagnostics:
  diagnostic_1:
    description: Download 4type data from ACCESS
    variables:
      tas: {mip: Amon, preprocessor: preprocessor_1, start_year: 1980, end_year: 1980,}        
          # 2-D
      psl: {mip: Amon, preprocessor: preprocessor_1,  start_year: 1980, end_year: 2011} 
    scripts: null
    

It seems the ERA5 monthly dataset was found successfully:

PreprocessingTask: diagnostic_1/psl
order: ['regrid', 'remove_supplementary_variables', 'save']
PreprocessorFile: /g/data/ng72/xw6141/PhD/year2/ch2_projection/data/processed/esmvaltool/recipe_preprocessor_access_data_regrid_era5_20250908_120424/preproc/diagnostic_1/psl/native6_ERA5_an_v1_Amon_psl_1980-2011.nc
input files: [LocalFile('/g/data/rt52/era5/single-levels/monthly-averaged/msl/1980/msl_era5_moda_sfc_19800101-19800131.nc'),
 LocalFile('/g/data/rt52/era5/single-levels/monthly-averaged/msl/1980/msl_era5_moda_sfc_19800201-19800229.nc'),
 LocalFile('/g/data/rt52/era5/single-levels/monthly-averaged/msl/1980/msl_era5_moda_sfc_19800301-19800331.nc'),

However, an error is raised when inferring the frequency of the ERA5 monthly dataset:

  File "/g/data/xp65/public/apps/med_conda/envs/analysis3-25.08/lib/python3.11/site-packages/esmvalcore/cmor/_fixes/native6/era5.py", line 533, in _fix_coordinates
    self._fix_monthly_time_coord(cube)
  File "/g/data/xp65/public/apps/med_conda/envs/analysis3-25.08/lib/python3.11/site-packages/esmvalcore/cmor/_fixes/native6/era5.py", line 550, in _fix_monthly_time_coord
    if get_frequency(cube) == "monthly":
       ^^^^^^^^^^^^^^^^^^^
  File "/g/data/xp65/public/apps/med_conda/envs/analysis3-25.08/lib/python3.11/site-packages/esmvalcore/cmor/_fixes/native6/era5.py", line 36, in get_frequency
    raise ValueError(
ValueError: Unable to infer frequency of cube with length 1 time dimension: air_temperature / (K)               (time: 1; latitude: 721; longitude: 1440)
    Dimension coordinates:
        time                             x            -               -
        latitude                         -            x               -
        longitude                        -            -               x
    Scalar coordinates:
        height                      2.0 m
    Attributes:
        Conventions                 'CF-1.6'
        license                     'Licence to use Copernicus Products: https://apps.ecmwf.int/datasets/li ...'
        source_file                 '/g/data/rt52/era5/single-levels/monthly-averaged/2t/1980/2t_era5_moda_ ...'
        summary                     'ERA5 is the fifth generation ECMWF atmospheric reanalysis of the global ...'
        title                       'ERA5 single-levels monthly-averaged 2m_temperature 19800101-19800131'

I don’t know if this means that finding ERA5 worked but processing the ERA5 dataset has some issue that needs to be solved:…

I’m sorry you have spent so long on this!
I think I have seen something like this before. It looks like the code in ESMValCore expects the native6 ERA5 data to have more than one timestep per file, but the monthly data in rt52 are stored as one netCDF file per month.
If we can’t change the data structure in rt52 (for example, by concatenating files somewhere), I think we would have to patch or change the esmvalcore package.
I can look into it a bit more and get back to you.
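To illustrate why a single-timestep file is a problem, here is a simplified sketch of interval-based frequency inference. It is not the ESMValCore code: the function infer_frequency and its thresholds are assumptions for illustration, and time is represented as a plain list of day offsets rather than an iris cube coordinate.

```python
def infer_frequency(time_points_in_days):
    """Guess the data frequency from the gap between the first two time points.

    Simplified illustration: with only one time point there is no gap to
    measure, so the frequency cannot be inferred.
    """
    if len(time_points_in_days) < 2:
        raise ValueError(
            "Unable to infer frequency from a length 1 time dimension"
        )
    interval = time_points_in_days[1] - time_points_in_days[0]
    if abs(interval - 1 / 24) < 1e-4:
        return "hourly"
    if abs(interval - 1.0) < 1e-4:
        return "daily"
    return "monthly"


# Two monthly means (1 January and 1 February) give a 31-day gap.
print(infer_frequency([0.0, 31.0]))  # monthly

# One file per month means a single time point, so inference fails.
try:
    infer_frequency([0.0])
except ValueError as err:
    print(err)  # Unable to infer frequency from a length 1 time dimension
```

This is why either concatenating the monthly files or changing how the fix determines the frequency would avoid the error.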

Thanks so much! I guess this is now the main problem with using the monthly ERA5 dataset. Looking forward to hearing from you :)

We are looking to fix this limitation in esmvalcore and are working on it in a PR (for reference). The fix will then have to be released with esmvalcore and then updated in the xp65 environment. Do you have a time frame you’re working with? If you need something specific