UM Nesting Suite - can't run nci_era5grib


Hi Team,

I’m running the RNS of RAL3p2 over Davis, Antarctica, with ERA5 driving the model. The suite fails at the nci_era5grib task, with the following output in job.err and job.out.

job.err

Using the cylc session localhost

Loading cylc7/24.03
Loading requirement: mosrs-setup/2.0.1
Using the cylc session localhost

Loading cylc7/24.03
Loading requirement: mosrs-setup/2.0.1
Using the cylc session localhost

Loading cylc7/24.03
Loading requirement: mosrs-setup/2.0.1
Traceback (most recent call last):
  File "/home/581/zp1400/cylc-run/u-cu870/app/nci_era5grib/bin/nci_era5grib.py", line 15, in <module>
    import era5grib
  File "/g/data/hh5/public/apps/miniconda3/envs/analysis3-23.01/lib/python3.9/site-packages/era5grib/__init__.py", line 17, in <module>
    from .era5grib import *
  File "/g/data/hh5/public/apps/miniconda3/envs/analysis3-23.01/lib/python3.9/site-packages/era5grib/era5grib.py", line 24, in <module>
    from climtas.regrid import Regridder
  File "/g/data/hh5/public/apps/miniconda3/envs/analysis3-23.01/lib/python3.9/site-packages/climtas/__init__.py", line 6, in <module>
    from . import event
  File "/g/data/hh5/public/apps/miniconda3/envs/analysis3-23.01/lib/python3.9/site-packages/climtas/event.py", line 23, in <module>
    from dask.dataframe.core import new_dd_object
  File "/g/data/hh5/public/apps/miniconda3/envs/analysis3-23.01/lib/python3.9/site-packages/dask/dataframe/__init__.py", line 4, in <module>
    from dask.dataframe import backends, dispatch, rolling
  File "/g/data/hh5/public/apps/miniconda3/envs/analysis3-23.01/lib/python3.9/site-packages/dask/dataframe/backends.py", line 18, in <module>
    from dask.array.core import Array
  File "/g/data/hh5/public/apps/miniconda3/envs/analysis3-23.01/lib/python3.9/site-packages/dask/array/__init__.py", line 2, in <module>
    from dask.array import backends, fft, lib, linalg, ma, overlap, random
  File "/g/data/hh5/public/apps/miniconda3/envs/analysis3-23.01/lib/python3.9/site-packages/dask/array/backends.py", line 6, in <module>
    from dask.array.core import Array
  File "/g/data/hh5/public/apps/miniconda3/envs/analysis3-23.01/lib/python3.9/site-packages/dask/array/core.py", line 37, in <module>
    from dask.array.chunk_types import is_valid_array_chunk, is_valid_chunk_type
  File "/g/data/hh5/public/apps/miniconda3/envs/analysis3-23.01/lib/python3.9/site-packages/dask/array/chunk_types.py", line 122, in <module>
    import sparse
  File "/g/data/hh5/public/apps/miniconda3/envs/analysis3-23.01/lib/python3.9/site-packages/sparse/__init__.py", line 1, in <module>
    from ._coo import COO, as_coo
  File "/g/data/hh5/public/apps/miniconda3/envs/analysis3-23.01/lib/python3.9/site-packages/sparse/_coo/__init__.py", line 1, in <module>
    from .core import COO, as_coo
  File "/g/data/hh5/public/apps/miniconda3/envs/analysis3-23.01/lib/python3.9/site-packages/sparse/_coo/core.py", line 9, in <module>
    import numba
  File "/g/data/hh5/public/apps/miniconda3/envs/analysis3-23.01/lib/python3.9/site-packages/numba/__init__.py", line 42, in <module>
    from numba.np.ufunc import (vectorize, guvectorize, threading_layer,
  File "/g/data/hh5/public/apps/miniconda3/envs/analysis3-23.01/lib/python3.9/site-packages/numba/np/ufunc/__init__.py", line 3, in <module>
    from numba.np.ufunc.decorators import Vectorize, GUVectorize, vectorize, guvectorize
  File "/g/data/hh5/public/apps/miniconda3/envs/analysis3-23.01/lib/python3.9/site-packages/numba/np/ufunc/decorators.py", line 3, in <module>
    from numba.np.ufunc import _internal
SystemError: initialization of _internal failed without raising an exception
[FAIL] (module use /g/data/hh5/public/modules; module load conda/analysis3-23.01; nci_era5grib.py --mask $MASK --output $OUTDIR --start $START --count $COUNT --freq $FREQ --era5land $ERA5LAND --polar $POLAR) # return-code=1
Using the cylc session localhost

Loading cylc7/24.03
Loading requirement: mosrs-setup/2.0.1
2024-05-22T06:21:33Z CRITICAL - failed/EXIT

job.out

Suite : u-cu870
Task Job : 20190117T1200Z/nci_era5grib/02 (try 2)
User@Host: zp1400@gadi-cpu-clx-0568.gadi.nci.org.au

2024-05-22T06:21:17Z INFO - started
[INFO] Configuration: /home/581/zp1400/cylc-run/u-cu870/app/nci_era5grib/
[INFO] file: rose-app.conf
[INFO] optional key: (nci-gadi)
[INFO] export ERA5LAND=False
[INFO] export PATH=/home/581/zp1400/cylc-run/u-cu870/app/nci_era5grib/bin:/g/data/hr22/apps/cylc7/rose_2019.01.8/bin:/home/581/zp1400/cylc-run/u-cu870/bin:/home/581/zp1400/cylc-run/u-cu870/bin:/home/581/zp1400/cylc-run/u-cu870/share/fcm_make_surf/build/bin:/g/data/hr22/apps/cylc7/24.03/bin:/g/data/hr22/apps/mosrs-setup/2.0.1/bin:/apps/python2/2.7.16/bin:/apps/openmpi/wrapper/fortran:/apps/openmpi/wrapper:/apps/openmpi/4.0.1/bin:/g/data/hr22/apps/cylc7/cylc_7.9.9/…/24.03/bin:/g/data/hr22/apps/cylc7/cylc_7.9.9/bin:/opt/pbs/default/bin:/opt/nci/bin:/opt/bin:/opt/Modules/v4.3.0/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/local/pbs/bin:/local/pbs/bin
[INFO] export POLAR=True
[INFO] unchanged: /g/data/jk72/zp1400/UM_reg/ERA5_grib_Davis_RAMP2
[INFO] command: (module use /g/data/hh5/public/modules; module load conda/analysis3-23.01; nci_era5grib.py --mask $MASK --output $OUTDIR --start $START --count $COUNT --freq $FREQ --era5land $ERA5LAND --polar $POLAR)

It seems the job fails at a very early step and doesn’t give me much useful information.

I’ve already made the changes described in a previous topic, “Nci_era5grib no longer working”, but it still fails. However, the suite works for @sonyafiddes, who copied it and only changed the domain details.

I’d really appreciate it if anyone could help with this issue!

Zhangcheng

Hi Zhangcheng,
This seems to point to an issue with your own environment. I just tried the import that fails for you,

from numba.np.ufunc import _internal

with only analysis3-23.01 loaded, and it works fine.

Maybe you have a conflicting numba or numpy package in your .local or one of the other paths. This error happens with a combination of numba 0.56.3/0.56.4 and numpy 1.24. See: Error on import with numpy HEAD · Issue #8615 · numba/numba · GitHub

analysis3-23.01 has numba 0.56.4 and numpy 1.23.5.
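To see which numpy and numba an interpreter would actually pick up, the following is a minimal sketch rather than an official diagnostic. It looks the packages up without importing them, so it still works when the import itself crashes. The helper name `locate` is hypothetical; run it with the same `python` that the failing task uses.

```python
import importlib.metadata
import importlib.util


def locate(package):
    """Return (version, origin) for a package without importing it.

    Either value is None when the package cannot be found.
    """
    try:
        # Version of the first matching distribution found on sys.path
        version = importlib.metadata.version(package)
    except importlib.metadata.PackageNotFoundError:
        version = None
    # find_spec locates a top-level package without executing its code
    spec = importlib.util.find_spec(package)
    origin = spec.origin if spec is not None else None
    return version, origin


if __name__ == "__main__":
    for name in ("numpy", "numba"):
        version, origin = locate(name)
        print(f"{name}: version={version} origin={origin}")
```

If an origin points somewhere other than the analysis3-23.01 environment, for example under a home directory, that copy is the one being imported.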

Paola


Hi Paola,

Thanks for your suggestions!

I’ve done a bunch of module purges and checked my local environment, but I still have the problem.

These are the modules loaded by default when I log in to Gadi. I’ve tried unloading all of them, loading only cylc7, and then running the suite, but it still failed.

[zp1400@gadi-login-03 ~]$ module list
Currently Loaded Modulefiles:

  1) pbs   3) openmpi/4.1.4(default)   5) ncview/2.1.7
  2) dot   4) cdo/2.0.5                6) netcdf/4.7.3(default)

Do you have any idea how to solve this problem?

Thanks very much!

Zhangcheng

Sorry, I can’t really suggest much else. era5grib works in that environment, so clearly the problem comes from elsewhere. Without access to your environment, it’s impossible to work out what you are doing differently from Sonya that is causing this.
The only other thing I can suggest is to avoid loading modules by default in your .bashrc, as this can cause unexpected issues, even if they might not be interfering in this particular case.


Hi Zhangcheng,

Could you please try running the following commands in your terminal

module use /g/data/hh5/public/modules/
module load conda/analysis3-23.01
which python
python -I -c 'from numba.np.ufunc import _internal'
python -c 'from numba.np.ufunc import _internal'

The command with -I makes sure that no other packages are being combined with the hh5 environment. Hopefully that one is working for you, and presumably the second python command will break.
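The same comparison can be scripted. This is a minimal sketch, assuming it is run with the interpreter from the environment under test; the helper name `try_import` is hypothetical. The `-I` flag puts Python in isolated mode, which ignores `PYTHON*` environment variables and the per-user site-packages directory.

```python
import subprocess
import sys


def try_import(statement, isolated):
    """Run a statement in a fresh interpreter; return (ok, last_stderr_line)."""
    cmd = [sys.executable]
    if isolated:
        # Isolated mode: no PYTHON* variables, no user site directory
        cmd.append("-I")
    cmd += ["-c", statement]
    result = subprocess.run(cmd, capture_output=True, text=True)
    lines = result.stderr.strip().splitlines()
    return result.returncode == 0, (lines[-1] if lines else "")


if __name__ == "__main__":
    statement = "from numba.np.ufunc import _internal"
    for isolated in (True, False):
        ok, message = try_import(statement, isolated)
        print(f"isolated={isolated}: ok={ok} {message}")
```

If the isolated run succeeds and the normal run fails, something outside the conda environment is being added to the import path.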


Hi Scott,

It’s exactly as you said.

[zp1400@um ~]$ module use /g/data/hh5/public/modules/
[zp1400@um ~]$ module load conda/analysis3-23.01
[zp1400@um ~]$ which python
/g/data/hh5/public/apps/miniconda3/envs/analysis3-23.01/bin/python
[zp1400@um ~]$ python -I -c 'from numba.np.ufunc import _internal'
[zp1400@um ~]$ python -c 'from numba.np.ufunc import _internal'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/g/data/hh5/public/apps/miniconda3/envs/analysis3-23.01/lib/python3.9/site-packages/numba/__init__.py", line 42, in <module>
    from numba.np.ufunc import (vectorize, guvectorize, threading_layer,
  File "/g/data/hh5/public/apps/miniconda3/envs/analysis3-23.01/lib/python3.9/site-packages/numba/np/ufunc/__init__.py", line 3, in <module>
    from numba.np.ufunc.decorators import Vectorize, GUVectorize, vectorize, guvectorize
  File "/g/data/hh5/public/apps/miniconda3/envs/analysis3-23.01/lib/python3.9/site-packages/numba/np/ufunc/decorators.py", line 3, in <module>
    from numba.np.ufunc import _internal
SystemError: initialization of _internal failed without raising an exception

It seems I am loading other packages that conflict with conda/analysis3-23.01. Do you have a feeling for where the problem might be? Here’s what my .bash_profile and .bashrc look like. I deleted some module loads from .bash_profile, but it still doesn’t work.

# .bash_profile
  
# Get the aliases and functions
if [ -f ~/.bashrc ]; then
        . ~/.bashrc
fi

# User specific environment and startup programs
module use /g/data/hr22/modulefiles
module use /projects/access/modules/

# .bashrc

# Source global definitions (Required for modules)
if [ -f /etc/bashrc ]; then
        . /etc/bashrc
fi

if in_interactive_shell; then
    # This is where you put settings that you'd like in
    # interactive shells. E.g. prompts, or aliases
    # The 'module' command offers path manipulation that
    # will only modify the path if the entry to be added
    # is not already present. Use these functions instead of e.g.
    # PATH=${HOME}/bin:$PATH

    prepend_path PATH ${HOME}/bin
    prepend_path PATH ${HOME}/.local/bin

    if in_login_shell; then
        # This is where you place things that should only
        # run when you login. If you'd like to run a
        # command that displays the status of something, or
        # load a module, or change directory, this is the
        # place to put it
        module load pbs
        # cd /scratch/${PROJECT}/${USER}
    fi

fi

# Anything here will run whenever a new shell is launched, which
# includes when running commands like 'less'. Commands that
# produce output should not be placed in this section.
#
# If you need different behaviour depending on what machine you're
# using to connect to Gadi, you can use the following test:
#
# if [[ $SSH_CLIENT =~ 11.22.33.44 ]]; then
#     Do something when I connect from the IP 11.22.33.44
# fi
#
# If you want different behaviour when entering a PBS job (e.g.
# a default set of modules), test on the $in_pbs_job variable.
# This will run when any new shell is launched in a PBS job,
# so it should not produce output
#
# if in_pbs_job; then
#      module load openmpi/4.0.1
# fi
# gadi-cylc-setup: DO NOT EDIT BETWEEN HERE AND END
function setup_cylc7_rose {
  module use /g/data/hr22/modulefiles
  module unload cylc7-rose
  module load cylc7-rose/7.9.7_2019.01.7
  module unload mosrs-setup
  module load mosrs-setup/0.9.2
}
if in_interactive_shell; then
  if in_login_shell; then
   # setup_cylc7_rose
    module list
  fi
fi
# gadi-cylc-setup: END

# mosrs-setup gpg_agent_script: DO NOT EDIT BETWEEN HERE AND END
function export_gpg_environ {
    export GPG_TTY=$(tty)
    export GPG_AGENT_INFO="$(gpgconf --list-dirs agent-socket):0:1"
}
function start_gpg_agent {
    mkdir -p -m u=rwx,go=--- $HOME/.gnupg
    gpg-connect-agent /bye
    export_gpg_environ
}
if in_interactive_shell; then
    if in_login_shell; then
        start_gpg_agent
    fi
fi
# mosrs-setup gpg_agent_script: END

Many thanks,
Zhangcheng

As a first check, try renaming your ~/.local directory - this is where packages are installed by pip install --user and these can conflict with conda environments.

If that doesn’t help, check the output of

python -c 'import sys; print(sys.path)'

which will show all the locations Python can import packages from.
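On a long `sys.path` it can help to filter for entries that live under the home directory, since that is where `pip install --user` puts packages. The following is a minimal sketch; the helper name `entries_under` is hypothetical, and treating the home directory as the only suspect location is an assumption.

```python
import os
import site
import sys


def entries_under(paths, root):
    """Return the entries of `paths` that lie inside the directory `root`."""
    root = os.path.abspath(root)
    found = []
    for entry in paths:
        if not entry:
            continue  # '' means the current directory; skipped in this sketch
        full = os.path.abspath(entry)
        if full == root or full.startswith(root + os.sep):
            found.append(entry)
    return found


if __name__ == "__main__":
    print("user site enabled:", site.ENABLE_USER_SITE)
    print("user site dir:    ", site.getusersitepackages())
    for entry in entries_under(sys.path, os.path.expanduser("~")):
        print("from home dir:    ", entry)
```

Any entry printed under "from home dir" is searched in addition to the conda environment and can shadow the packages it provides.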

nci_era5grib started working after I renamed my ~/.local directory :blush:.

I really appreciate your advice, Scott and Paola!
