Nci_era5grib no longer working

Hi Team,

I’m trying to run a suite that has previously worked for me… using ERA5 as the driving model over the Antarctic coastline (I run it twice - once on an un-rotated grid to generate the ERA5 files, then on a rotated grid for the model runs).

The nci_era5grib task now runs for about 50 minutes and then crashes with no helpful error message (to me at least):

[FAIL] (module use /g/data/hh5/public/modules; module load conda; nci_era5grib.py --mask $MASK --output $OUTDIR --start $START --count $COUNT --freq $FREQ --era5land $ERA5LAND --polar $POLAR) # return-code=1

Does anyone know if anything has changed within this workflow that would cause it to stop running as it used to?

Cheers,
Sonya

To be clear, I’m running the UM RNS RAL3.1 plus some bug fixes.

Try using the previous conda environment; it may be that an upgrade has broken something.

Hi @sonyafiddes. Thanks for letting us know. There is an issue with CDO in the current conda/analysis3 environment (Error using '-t ecmwf' - CDO - Project Management Service). Unfortunately, this was only discovered after the environment became the stable one; it was resolved in the unstable environment yesterday. There is also a separate issue with the newer analysis environments and the underlying era5grib package. Switching back to analysis3/23.01 resolves it for now. @Paola-CMS is looking at refactoring the era5grib package, which should clear up this issue as well as the CDO dependency.
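For context, the step that breaks is era5grib handing its temporary netCDF file to CDO to convert to GRIB, which is roughly an invocation of this form (the filenames here are placeholders, and the exact options may differ from what the package actually runs):

cdo -t ecmwf -f grb copy era5_fields.nc era5_fields.grib

The -t ecmwf option selects the ECMWF parameter table, and that is the part that currently errors in the stable environment.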

Thanks @Scott and @dale.roberts. I’ve changed the app/nci_era5grib/rose-app.conf file to module load conda/analysis3-23.01, but it’s still not producing any grib files (it’s been running for about 20 minutes with nothing - it should only take a few minutes per file…) - have I missed something?

I also changed site/nci-gadi/suite-adds.rc to module load conda/analysis3/23.01 too…
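For reference, the command line in rose-app.conf now looks something along these lines (paraphrasing from the failing command above, so the exact layout may differ):

[command]
default=module use /g/data/hh5/public/modules; module load conda/analysis3-23.01; nci_era5grib.py --mask $MASK --output $OUTDIR --start $START --count $COUNT --freq $FREQ --era5land $ERA5LAND --polar $POLAR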

Hi @sonyafiddes. I’ve noticed when using nci_era5grib that the runtime can be quite variable. I think the main cause of this is that a) era5grib writes the dataset as netCDF to $TMPDIR in order to have cdo convert it to grib format, and b) cylc overrides $TMPDIR such that these temporary netCDF files end up on /scratch or /g/data. I’ve found that if I reset TMPDIR to $PBS_JOBFS in the task’s [[[environment]]] section, the runtime is much more consistent.
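As a rough sketch, that looks something like the following in the suite’s runtime config (the task name here is just a stand-in for whatever the era5grib task is called in your suite):

[runtime]
    [[nci_era5grib]]
        [[[environment]]]
            TMPDIR = $PBS_JOBFS

With that in place, the temporary netCDF files land on the node-local jobfs disk instead of /scratch or /g/data.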

Hi @dale.roberts, thanks for this.

I’ve set TMPDIR=$PBS_JOBFS under # TASK RUNTIME ENVIRONMENT: in the job file (I then just qsub it manually), but it has still only created two grib files in two hours. Have I done something wrong here? Or is there another way to speed this up?

Hi @sonyafiddes. At this stage the only thing that comes to mind is the interpreter line in app/nci_era5grib/bin/nci_era5grib.py. If the first line of nci_era5grib.py looks something like this:

#!/g/data/hh5/public/apps/miniconda3/envs/analysis3/bin/python

then it’ll be using the default analysis3 env regardless of what module is loaded. If you change that to

#!/usr/bin/env python

then it’ll use the loaded analysis environment instead. If it’s already set to that, then that’s not the problem.
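A quick way to double-check which interpreter the script will pick up once the module is loaded (run from the suite directory; the module path is the same one the suite already uses):

module use /g/data/hh5/public/modules
module load conda/analysis3-23.01
which python
head -1 app/nci_era5grib/bin/nci_era5grib.py

which python shows the Python from the environment that was just loaded, and head -1 prints the script’s current interpreter line.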

That’s done it! Thanks so much Dale! (PS - it was set to the unstable environment… so maybe an issue for the upcoming one too?)

No worries. Yeah, we’re aware it’s an issue in everything after 23.04. I think it’s a Dask problem, based on the logging I can see when the job runs, but it’s hard to tell. We’ll keep working on it; being stuck on analysis3-23.01 isn’t sustainable in the long run.

Might be worth checking whether Dask has matured enough that the climtas I/O is no longer required; that goes into the guts of Dask, so it’s fragile to updates.