I’m trying to run a suite that has previously worked for me, using ERA5 as the driving model over the Antarctic coastline (I run it twice: once on an un-rotated grid to generate the ERA5 files, then on a rotated grid for the model runs).
The nci_era5grib task now runs for about 50 minutes and then crashes with no helpful error message (to me, at least).
Hi @sonyafiddes. Thanks for letting us know. There is an issue with CDO in the current conda/analysis3 environment (Error using '-t ecmwf' - CDO - Project Management Service). Unfortunately this was only discovered after the environment became stable; it was resolved in the unstable environment yesterday. There is also some other issue with the newer analysis environments and the underlying era5grib package. Switching back to analysis3/23.01 resolves the issue for now. @Paola-CMS is looking at refactoring the era5grib package, which should clear up this issue as well as the CDO dependency.
Thanks @Scott and @dale.roberts. I’ve changed the app/nci_era5grib/rose-app.conf file to now module load conda/analysis3-23.01, but it’s still not producing any grib files (it’s been running for about 20 minutes with nothing, and it should only take a few minutes per file). Have I missed something?
I also changed site/nci-gadi/suite-adds.rc to use module load conda/analysis3-23.01 too…
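For reference, the change in both files amounts to something like this (the surrounding context in your own suite may look different, this is just the module line I edited):

```diff
- module load conda/analysis3
+ module load conda/analysis3-23.01
```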
Hi @sonyafiddes. I’ve noticed when using nci_era5grib that the runtime can be quite variable. I think the main cause of this is that (a) era5grib writes the dataset as netCDF to $TMPDIR in order to have cdo convert it to grib format, and (b) cylc overrides $TMPDIR such that these temporary netCDF files end up on /scratch or /g/data. I’ve found that if I reset TMPDIR to $PBS_JOBFS in the task’s [[[environment]]] section, the runtime is much more consistent.
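As a sketch, in the suite definition that would look something like this (the task name and surrounding sections are illustrative; put it wherever your suite defines the era5grib task's runtime):

```
[runtime]
    [[nci_era5grib]]
        [[[environment]]]
            TMPDIR = $PBS_JOBFS
```

This way the temporary netCDF files land on the node-local jobfs disk rather than the Lustre filesystems.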
I’ve set TMPDIR=$PBS_JOBFS under # TASK RUNTIME ENVIRONMENT: in the job file (I then just qsub it manually), but it still only created two grib files in two hours. Have I done something wrong here? Or is there another way to speed this up?
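To be concrete, the lines I added in the generated job script look something like this (the section header is copied from the job file; the export line is my addition):

```shell
# TASK RUNTIME ENVIRONMENT:
export TMPDIR=$PBS_JOBFS
```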
Hi @sonyafiddes. At this stage the only thing that comes to mind is the interpreter line in app/nci_era5grib/bin/nci_era5grib.py. If the first line of nci_era5grib.py looks something like this:
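(The exact path below is illustrative, not copied from your suite; the point is a hardcoded interpreter path rather than one resolved from the loaded module:)

```python
#!/g/data/hh5/public/apps/miniconda3/envs/analysis3/bin/python
```

then the script is pinned to that specific environment, and the module load change in rose-app.conf won’t take effect. Changing it to `#!/usr/bin/env python` should make it pick up whichever analysis3 environment the task has loaded.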
No worries. Yeah, we’re aware it’s an issue in everything after 23.04. I think it’s a dask problem, based on the logging I can see when the job runs, but it’s hard to tell. We’ll keep working on it; being stuck on analysis3-23.01 isn’t sustainable in the long run.