Building intake catalog: "Parser returns no valid assets" error

Hi there, I am trying to build an intake datastore from someone else’s existing ESM1.5 experiments, in the hope of building on their existing code. However, I get the error below, and it also fails with the same error when I run it through the terminal, as suggested by Charles here. Could I please check whether I’m misunderstanding something or whether something is missing? Thank you!!

%%time

from access_nri_intake.source.builders import AccessEsm15Builder

builder = AccessEsm15Builder(
    path="/g/data/e14/afp599/access-esm/fs38_processed",
    ensemble=False,  # We could use this to pass multiple paths for different ensemble members, e.g. "/g/data/e14/afp599/access-esm/post_processed_mw2"
).build()
--------------------------------------------------------------------------
ParserError                               Traceback (most recent call last)
File <timed exec>:4

File /g/data/xp65/public/apps/med_conda/envs/analysis3-25.08/lib/python3.11/site-packages/access_nri_intake/source/builders.py:203, in BaseBuilder.build(self)
    198 def build(self):
    199     """
    200     Builds a datastore from a list of netCDF files or zarr stores.
    201     """
--> 203     self.get_assets().validate_parser().parse().clean_dataframe()
    205     return self

File /g/data/xp65/public/apps/med_conda/envs/analysis3-25.08/lib/python3.11/site-packages/access_nri_intake/source/builders.py:191, in BaseBuilder.validate_parser(self)
    188         validate_against_schema(info, ESM_JSONSCHEMA)
    189         return self
--> 191 raise ParserError(
    192     f"""Parser returns no valid assets.
    193     Try parsing a single file with Builder.parser(file)
    194     Last failed asset: {asset}
    195     Asset parser return: {info}"""
    196 )

ParserError: Parser returns no valid assets.
            Try parsing a single file with Builder.parser(file)
            Last failed asset: /g/data/e14/afp599/access-esm/fs38_processed/wfo_Omon_ACCESS-ESM1-5_ssp585_r9i1p1f1_2015-2100_r360x180.nc
            Asset parser return: {'INVALID_ASSET': '/g/data/e14/afp599/access-esm/fs38_processed/wfo_Omon_ACCESS-ESM1-5_ssp585_r9i1p1f1_2015-2100_r360x180.nc', 'TRACEBACK': 'Traceback (most recent call last):\n  File "/g/data/xp65/public/apps/med_conda/envs/analysis3-25.08/lib/python3.11/site-packages/access_nri_intake/source/builders.py", line 663, in parser\n    match_groups = re.match(r".*/([^/]*)/history/([^/]*)/.*\\.nc", file).groups()\n                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\nAttributeError: \'NoneType\' object has no attribute \'groups\'\n'}

Hi Ellie, I’m today’s triager. I’ll try to find a suitable helper for you.

Hi Ellie,

Typically this error results from files being named in ways that the builder doesn’t expect. I’ve requested to join e14 & I’ll update when I can check exactly what the issue is.

Okay, I’ve taken a look. These are all CMIP-formatted data, and the ESM1.5 Builder doesn’t recognise their naming patterns; the builder falls back on the filenames to work out things like frequencies.

I think we can add a builder that handles CMIP-formatted data pretty straightforwardly; it’s a fairly minimal job now. I’m a bit tied up today - @joshuatorrance are you able to take a look at this?

Also cc @dougiesquire: if you think adding a CmipBuilder is a terrible idea for any particular reason, can you let us know?

I think adding this builder would be something of a stopgap measure - we’re aiming to completely break the filename dependency - but I don’t see any harm in having a builder tailored to this format of output in the meantime.

My only thought is that you may want to make it a Cmip6Builder as I think the file structure is different across different CMIP eras.

I think Charles has the gist of it.

The builder is expecting to see a path that looks something like this one:
/g/data/p73/archive/non-CMIP/ACCESS-ESM1-5/PI-GWL-B2035/history/atm/netCDF/*.nc
The builder’s regex is looking at the directories before and after history to determine the experiment_id and realm for each file.
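To make this concrete, here is a small sketch that reuses the regex quoted in the traceback (builders.py, line 663) on the example path above and on one of Ellie's files. The filename example.nc is a placeholder; everything else comes from this thread. The second path has no history directory, so re.match returns None, and calling .groups() on that None is exactly the AttributeError in the traceback.

```python
import re

# Regex quoted in the traceback from access_nri_intake/source/builders.py (line 663).
ESM15_PATTERN = r".*/([^/]*)/history/([^/]*)/.*\.nc"


def esm15_groups(path):
    """Return (experiment_id, realm) as the ESM1.5 regex sees them, or None if it doesn't match."""
    match = re.match(ESM15_PATTERN, path)
    return match.groups() if match else None


# Expected layout (example.nc is a placeholder filename).
print(esm15_groups(
    "/g/data/p73/archive/non-CMIP/ACCESS-ESM1-5/PI-GWL-B2035/history/atm/netCDF/example.nc"
))  # ('PI-GWL-B2035', 'atm')

# CMIP-formatted file from this thread: no "history" directory, so no match.
print(esm15_groups(
    "/g/data/e14/afp599/access-esm/fs38_processed/"
    "wfo_Omon_ACCESS-ESM1-5_ssp585_r9i1p1f1_2015-2100_r360x180.nc"
))  # None
```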

Since this is CMIP data, we can presumably pull the info we need straight out of each file’s metadata. Adding a Builder for CMIP is probably worthwhile and shouldn’t be difficult.
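As a rough illustration of what reading the metadata could look like (this is not the Cmip6Builder's actual code), the sketch below picks a handful of required CMIP6 global attributes out of a file's attribute dictionary. cmip6_info_from_attrs is a hypothetical helper, and the attribute values are guessed from the wfo filename above rather than read from the file. With xarray, the real dictionary would come from xarray.open_dataset(path).attrs.

```python
# Hypothetical helper, not part of access_nri_intake: pick out the CMIP6
# global attributes a datastore needs, instead of parsing paths.
CMIP6_KEYS = (
    "source_id",
    "experiment_id",
    "variant_label",
    "table_id",
    "variable_id",
    "frequency",
    "realm",
)


def cmip6_info_from_attrs(attrs):
    """Return the CMIP6 fields from a global-attribute dict, failing loudly if any are absent."""
    missing = [key for key in CMIP6_KEYS if key not in attrs]
    if missing:
        raise KeyError(f"missing CMIP6 global attributes: {missing}")
    return {key: attrs[key] for key in CMIP6_KEYS}


# Illustrative stand-in for xarray.open_dataset(path).attrs; values are guessed
# from the filename, not read from the actual file.
example_attrs = {
    "source_id": "ACCESS-ESM1-5",
    "experiment_id": "ssp585",
    "variant_label": "r9i1p1f1",
    "table_id": "Omon",
    "variable_id": "wfo",
    "frequency": "mon",
    "realm": "ocean",
    "title": "placeholder",  # extra attributes are simply ignored
}

info = cmip6_info_from_attrs(example_attrs)
print(info["experiment_id"], info["variant_label"])  # ssp585 r9i1p1f1
```

Failing on missing attributes, rather than guessing, keeps non-CMIP files from slipping into the datastore with half-filled columns.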

I’ve applied for e14 too; I’ll start on a CMIP/CMIP6 builder. EDIT: Charles beat me to it!

The builder is done (Josh made some handy changes a few weeks back that made it very fast to implement).

We’ll try to get it released and into the conda/analysis3 environment ASAP.

Hi Ellie,

The builder (Cmip6Builder) is now available in the conda/analysis3-25.10 environment.

Give it a crack and let us know how you go!

Thank you Charles, I am able to make a datastore now!

I would like to clarify some best-practice things, though. Here the datastore (fs38_processed_datastore.search(variable="thetao", frequency='1mon').df) has picked up many files, because the nc files are saved separately for each ensemble member. But when I do fs38_processed_datastore.search(variable="thetao", frequency='1mon').to_dask(), I only get one field, and I don’t know which ensemble member is being loaded. Am I just not supposed to store different ensemble members in the same folder when using the builders?
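One quick way to check which ensemble members a search result actually contains is to count the CMIP6 variant labels (r<N>i<N>p<N>f<N>) in the file names. This is a sketch: count_members is a hypothetical helper, the second path below is made up for illustration, and feeding it the datastore's .df["path"] column assumes the file paths live in a column called path.

```python
import re
from collections import Counter

# CMIP6 variant labels look like r<realization>i<initialization>p<physics>f<forcing>.
VARIANT_RE = re.compile(r"_(r\d+i\d+p\d+f\d+)_")


def count_members(paths):
    """Count files per ensemble member, using the variant label in each filename."""
    counts = Counter()
    for path in paths:
        filename = path.rsplit("/", 1)[-1]
        match = VARIANT_RE.search(filename)
        counts[match.group(1) if match else "unknown"] += 1
    return counts


paths = [
    "/g/data/e14/afp599/access-esm/fs38_processed/wfo_Omon_ACCESS-ESM1-5_ssp585_r9i1p1f1_2015-2100_r360x180.nc",
    # Hypothetical second member, for illustration only.
    "/g/data/e14/afp599/access-esm/fs38_processed/wfo_Omon_ACCESS-ESM1-5_ssp585_r1i1p1f1_2015-2100_r360x180.nc",
]
print(count_members(paths))
```

On a real search this would be something like count_members(fs38_processed_datastore.search(variable="thetao", frequency="1mon").df["path"]). Several members here alongside a single field from .to_dask() is the mismatch described above.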

Perhaps a better example directory for the Cmip6Builder would also have been /g/data/fs38/publications/CMIP6/ScenarioMIP/CSIRO/ACCESS-ESM1-5/ssp585/, since it follows the structure of many ensemble members filed separately? I think most CSIRO-maintained CMIP data is stored like this. I tried the Cmip6Builder on this directory as below and ended up with this error: ValueError: asset list provided is None. Please run `.get_assets()` first

import os

from access_nri_intake.source.builders import Cmip6Builder

base = "/g/data/fs38/publications/CMIP6/ScenarioMIP/CSIRO/ACCESS-ESM1-5/ssp585"
# One path per ensemble-member directory under the experiment
member_paths = [os.path.join(base, member) for member in sorted(os.listdir(base))]
builder = Cmip6Builder(
    path=member_paths,
    ensemble=True,
).build()
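For comparison, the fs38 path above looks like the standard CMIP6 Data Reference Syntax (DRS) directory layout, mip_era/activity_id/institution_id/source_id/experiment_id/member_id/table_id/variable_id/grid_label/version, with one directory per ensemble member under the experiment. The sketch below reads those fields back out of a file path; parse_drs is a hypothetical helper, not the Cmip6Builder's implementation, and everything below ssp585 in the example path (member, table, grid, version and filename) is made up for illustration.

```python
from pathlib import PurePosixPath

# CMIP6 DRS directory levels, starting at the "CMIP6" directory.
DRS_FIELDS = (
    "mip_era", "activity_id", "institution_id", "source_id", "experiment_id",
    "member_id", "table_id", "variable_id", "grid_label", "version",
)


def parse_drs(path):
    """Map the directories from 'CMIP6' downwards onto CMIP6 DRS field names."""
    parts = PurePosixPath(path).parts
    if "CMIP6" not in parts:
        raise ValueError(f"no CMIP6 directory in {path}")
    start = parts.index("CMIP6")
    # Require a file below the version directory, not just the directories.
    if len(parts) <= start + len(DRS_FIELDS):
        raise ValueError(f"path is too shallow for the CMIP6 DRS: {path}")
    return dict(zip(DRS_FIELDS, parts[start:start + len(DRS_FIELDS)]))


# Hypothetical file below the real ssp585 directory (illustrative only).
example = (
    "/g/data/fs38/publications/CMIP6/ScenarioMIP/CSIRO/ACCESS-ESM1-5/ssp585/"
    "r1i1p1f1/Omon/thetao/gn/v20191115/"
    "thetao_Omon_ACCESS-ESM1-5_ssp585_r1i1p1f1_gn_201501-210012.nc"
)
fields = parse_drs(example)
print(fields["experiment_id"], fields["member_id"], fields["version"])
```

Because member_id is its own directory level in this layout, each ensemble member is already separated on disk, which is why pointing the builder at the per-member directories seems natural here.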

Can you give me the command you used to build the datastore - is it the same as the one in your last post but with the old path? I’ll rebuild one in my scratch space and see what’s wrong.

I think the most likely issue is that I forgot to add the ensemble keyword to the Cmip6Builder, so that argument is being ignored.

Assuming this is the case, we can fix this & push a bugfix release pretty quickly - probably even today.

%%time

from access_nri_intake.source.builders import Cmip6Builder

builder = Cmip6Builder(
    path="/g/data/e14/afp599/access-esm/fs38_processed",
    ensemble=False,  # We could use this to pass multiple paths for different ensemble members, e.g. "/g/data/e14/afp599/access-esm/post_processed_mw2"
).build()

I realise ensemble=False here, but based on this I thought the ensemble members had to be in different folders.

I think we’ll want to change ensemble to True - I don’t think it’ll matter too much right now.

That documentation isn’t too clear - I’ll update it.

I’m taking a look at building that datastore now - I’ll let you know how it goes.

Turns out I jumped the gun a little on this - we’ll need to do a bugfix release.

In the meantime, you should be able to use the test datastore I built, which is here (you should have read access, but let me know if I’ve got the permissions wrong): /scratch/tm70/ct1163/ellie_cmip6/experiment_datastore_ensemble.json

Thank you! I’ve asked to join tm70, but if that’s not allowed, the datastore could probably go on e14? Cheers

I’ve copied the datastore into /scratch/e14/ct1163/ellie-ds/ - lemme know if it works for you!

It does work, and with the correct ensemble members. Thank you!!

Hey Ellie,

Could you mark this as resolved if it is (I think I’m right in saying that)?

I’m at the limit of how many topics I can be assigned and need to close out some old ones :sweat_smile:

Cheers!