xp65 on ARE VDI - unable to call /opt/conda/*

Hi all,

I am experiencing an issue when trying to run xp65 analysis3 within ARE VDI.

I have followed the helpful instructions (including adding gdata/xp65 under Storage), but when I attempt to execute python from the VDI terminal I get a No such file or directory error.

It seems that /opt/conda/* is not available for me to call. Have I overlooked a step that makes this available on ARE VDI (it works fine on ARE JupyterLab)?

Thanks for your help.

It’s a technical issue: VDI runs inside a container, and the xp65 environment is also delivered as a container. You can’t run a container from inside another container.

The simple fix is to exit the VDI container first, which can be done with ssh localhost. You should then be able to use conda inside the ssh session.
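For example, a minimal terminal session (a sketch; the module lines follow the usual xp65 setup instructions, so adjust the module name to match your analysis3 release):

ssh localhost
module use /g/data/xp65/public/modules
module load conda/analysis3
python    # now resolves to the xp65 environment's python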

Thanks @Scott

Sounds like something we should add to the docs.

Would you have time to make an issue for this @mblack?

done, thanks

No, thank you!

For anyone following along at home:

At the risk of derailing the useful discussion above, I think I have a similar or near-identical issue.

When I try to create a dask PBS cluster from a Jupyter notebook in ARE, the cluster errors and exits prematurely. The dask-worker.e… log file says:

/local/spool/pbs/mom_priv/jobs/140245218.gadi-pbs.SC: line 12: /g/data/xp65/public/apps/med_conda/envs/analysis3-24.07/bin/python: No such file or directory

While Scott’s suggestion above may be applicable here, I am not sure how I’d exit the (parent?) container in a situation like this.

I’m initialising the PBS cluster like this in my notebook (storages line snipped):

from dask_jobqueue import PBSCluster
from dask.distributed import Client

cores = 2
memory = "9GB"
processes = 2
walltime = "1:00:00"
storages = "gdata/xp65+ ... <snipped> ... +gdata/ia39"

cluster = PBSCluster(
    walltime=walltime,
    cores=cores,
    memory=memory,
    processes=processes,
    job_extra_directives=[
        "-q normalbw",
        "-l ncpus=" + str(cores),
        "-l mem=" + memory,
        "-l storage=" + storages,
        "-l jobfs=10GB",
        "-P ai05",
    ],
    job_directives_skip=["select"],
)

cluster.scale(jobs=1)  # Scale the resource to this many nodes
client = Client(cluster)
print(f"Dask Client started. Dashboard URL: {client.dashboard_link}")

This worked in hh5 so I presume it’s roughly correct.

Thanks in advance for suggestions.

Regards,
Aurel.

Try setting the path to python, as described in Using dask_jobqueue in the new xp65 environment - Technical - ACCESS Hive Community Forum
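A minimal sketch of what that might look like, using dask_jobqueue's python keyword (the interpreter used to launch the workers, which otherwise defaults to sys.executable from inside the ARE container). The path below is a placeholder, not the real one; take the exact launcher path for your analysis3 release from the linked post:

from dask_jobqueue import PBSCluster

# python= tells the PBS job which interpreter to start the workers with,
# instead of sys.executable (which points inside the JupyterLab container
# and doesn't exist on the compute node).
# Illustrative placeholder path only; see the linked post for the real one.
cluster = PBSCluster(
    walltime="1:00:00",
    cores=2,
    memory="9GB",
    processes=2,
    python="/g/data/xp65/public/apps/.../analysis3.../bin/python",  # placeholder
    job_extra_directives=["-q normalbw", "-l storage=gdata/xp65", "-P ai05"],
    job_directives_skip=["select"],
)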
