I have been using a tool called mdssdiff (https://github.com/coecms/mdssdiff) intermittently to check whether files have been moved into the mass data storage correctly when I’m archiving things.
This tool was available in the analysis3-24.11 environment, but now I am getting the following error when trying to use that environment:
FATAL: while loading overlay images: failed to open overlay image /g/data/xp65/public/apps/med_conda/envs/analysis3-24.11.sqsh: failed to retrieve path for /g/data/xp65/public/apps/med_conda/envs/analysis3-24.11.sqsh: lstat /g/data/xp65/public/apps/med_conda/envs/analysis3-24.11.sqsh: no such file or directory
I’ve tried more recent modules (e.g. conda/analysis3-26.01), but the mdssdiff tool is not found in them.
Can you advise how I might be able to access the tool? How can I check which versions of the conda/analysis3 environment it is in? Or, if it is no longer available, can you suggest another way that I can check transfer of my files to mass data?
Environment:
NCI project xp65 conda/analysis3 module
What executed:
module use /g/data3/xp65/public/modules
module load conda/analysis3-24.11
mdssdiff -p BIOS3_forcing CNRM-ESM2-1
Actual results:
FATAL: while loading overlay images: failed to open overlay image /g/data/xp65/public/apps/med_conda/envs/analysis3-24.11.sqsh: failed to retrieve path for /g/data/xp65/public/apps/med_conda/envs/analysis3-24.11.sqsh: lstat /g/data/xp65/public/apps/med_conda/envs/analysis3-24.11.sqsh: no such file or directory
Expected results:
I expect this to look inside the directory CNRM-ESM2-1 (relative to where I run the tool) and the corresponding directory on massdata, examine information about the files they contain (i.e. filename, size, modification date) and report any differences (i.e. missing files, files of different size, different modification dates).
Additional info:
Aidan
(Aidan Heerdegen, ACCESS-NRI Release Team Lead)
The conda/analysis3-24.11 environment has been deprecated and is no longer available.
You can list all the packages in a conda environment using conda list, and use grep to see if a particular package is present.
For example:
$ module use /g/data/xp65/public/modules/
$ module load conda/analysis3-26.01
Loading conda/analysis3-26.01
Loading requirement: singularity
$ conda list | grep mdssdiff
$
I can tell you, though, that mdssdiff is not in any currently released conda environment.
The installed packages are defined in this file
If you clone the repo and search through the history of that file, you can see that the mdssdiff package was removed and never put back.
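Searching the history of a file for a package name can be sketched with git's "pickaxe" option. This is a minimal illustration only: the thread does not name the repository or the file, so the helper below (`package_history`, a hypothetical name) takes both as arguments rather than assuming them.

```shell
# package_history <repo_dir> <file> <package>   (hypothetical helper)
# List the commits in which <package> was added to or removed from <file>.
package_history() {
    # -S keeps only commits that change the number of occurrences of the
    # string in the file, i.e. the commits that added or removed it.
    git -C "$1" log --oneline -S"$3" -- "$2"
}
```

Called as `package_history path/to/clone path/to/environment-file mdssdiff` (both paths are placeholders), the most recent commit listed is typically the one that removed the package.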
I wrote mdssdiff, when I was part of the CMS team, but it is no longer maintained.
Unless someone else decides to pick it up, maintain it and publish a conda package, your best path is to copy the approach I suggested for others who wanted to use another unmaintained package I wrote: create your own virtual environment on top of conda/analysis3.
Just chiming in to say that if ACCESS-NRI or someone else wants to maintain mdssdiff or similar massdata tools, that would be amazing. It’s always so painful to do massdata transfers and tools like mdssdiff have been super useful at making it less painful.
Aidan
(Aidan Heerdegen, ACCESS-NRI Release Team Lead)
First port of call is to ask help@nci.org.au if mdssdiff, or a tool with similar functionality, could be officially supported. If it’s a clear user need then they should provide appropriate tools, and they are the ones who are best placed to do so.
It’s not something that is a natural fit for any specific team at ACCESS-NRI, so I can’t suggest someone to talk to about it.
You could discuss it at your working group meeting to find out whether there is a wider need for something like this and, if so, kick it up to SAC.
I actually asked help@nci.org.au about this tool first and they sent me to ACCESS-NRI. I guess because it was inside one of the conda environments that you support.
It’s strange that it was quietly removed, without anyone getting upset, but then maybe no one really noticed because it is used so intermittently.
Anyway, good idea. I’ll bring it up at the Land Working Group meeting tomorrow to see what the process is to get it supported (if indeed that is possible).
But I’m still stuck with the problem of determining whether my transfers to massdata have been successful.
Any ideas what other people do?
Aidan
(Aidan Heerdegen, ACCESS-NRI Release Team Lead)
Yes, and I can see from a support point of view that makes total sense in that it was a tool that was provided by someone else.
If you followed up with something like “This is required functionality that is no longer supported (I checked). Can NCI provide this or similar functionality?” it might get a bit further.
Check out my reply above. You can make a virtual environment and install the tool yourself. I tested it and it works:
$ module use /g/data/xp65/public/modules/
$ module load conda/analysis3
Loading conda/analysis3-26.03
Loading requirement: singularity
$ python3 -m venv mdssdiff --system-site-packages
$ source mdssdiff/bin/activate
((mdssdiff) ) $ pip install git+https://github.com/coecms/mdssdiff.git
Collecting git+https://github.com/coecms/mdssdiff.git
Cloning https://github.com/coecms/mdssdiff.git to /scratch/t.../tmp/pip-req-build-7ol990gq
Running command git clone --filter=blob:none --quiet https://github.com/coecms/mdssdiff.git /scratch/.../tmp/pip-req-build-7ol990gq
Resolved https://github.com/coecms/mdssdiff.git to commit 0486a757aecdff9696c3c61d2de6f54ef74c1893
Preparing metadata (setup.py) ... done
Building wheels for collected packages: mdssdiff
Building wheel for mdssdiff (setup.py) ... done
Created wheel for mdssdiff: filename=mdssdiff-0.1.12.dev3-py3-none-any.whl size=13066 sha256=c367a9068d129627f6d276f4895df56f92bbeb4fd85dcdfcf9366e1b6696014a
Stored in directory: /scratch/.../tmp/pip-ephem-wheel-cache-d9ptzumb/wheels/51/6c/c9/aff07514ed316f01e6856f40fdd57bf1e3434eb94a4c8377fc
Successfully built mdssdiff
Installing collected packages: mdssdiff
Successfully installed mdssdiff-0.1.12.dev3
[notice] A new release of pip is available: 25.0.1 -> 26.1.1
[notice] To update, run: pip3.12 install --upgrade pip
((mdssdiff) ) $ mdssdiff --help
usage: mdssdiff [-h] [-v] [-P PROJECT] [-p PATHPREFIX] [-r] [-m MATCH] [-cr | -cl] [-f] inputs [inputs ...]

Compare local directories and those on mdss. Report differences

positional arguments:
  inputs                netCDF files or directories (-r must be specified to recursively descend directories)

options:
  -h, --help            show this help message and exit
  -v, --verbose         Increase verbosity
  -P PROJECT, --project PROJECT
                        Project code for mdss (default to $PROJECT)
  -p PATHPREFIX, --pathprefix PATHPREFIX
                        Prefix for mdss path
  -r, --recursive       Recursively descend directories (default False)
  -m MATCH, --match MATCH
                        Operate only on files matching filter
  -cr, --copyremote     Copy files from local filesyste to mdss that are missing (False)
  -cl, --copylocal      Copy files from mdss to local filesystem that are missing (False)
  -f, --force           Force copying of different, following --copyremote or --copylocal (False)
((mdssdiff) ) $
Aidan
(Aidan Heerdegen, ACCESS-NRI Release Team Lead)
Note the suggestion above does require activating the virtual environment to use the mdssdiff tool.
You can make a single virtual environment containing multiple additional tools if that is easier and more efficient for you. In general this will only work for Python packages without external binary dependencies.
Aidan
(Aidan Heerdegen, ACCESS-NRI Release Team Lead)
If this solved your problem @AlisonBennett please feel free to mark a solution and I’ll close the topic.
Hi Aidan, thanks for reminding me about this. No, the problem is not really solved - I’m not keen on using virtual environments for this, but I did raise it at the Land Working Group and there was general agreement that it is a valuable tool and that having NCI support it (or something similar) would be valuable. So, I have logged a request with NCI to support it.
I will follow up with them - so you can close this for now. I will let you (or Claire) know if something comes of it.
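As a stopgap that needs no virtual environment, the name-and-size part of the comparison described earlier in the thread can be approximated by diffing two plain-text listings. The sketch below is not mdssdiff and ignores modification dates. The function names `local_listing` and `compare_listings` are hypothetical, `local_listing` assumes GNU find, and producing the massdata listing in the same "size path" format is left to the reader because it depends on the output of the mdss command, which is not shown in this thread.

```shell
# local_listing <dir>: print "<size-in-bytes> <relative-path>" for each file.
# Assumes GNU find (for -printf).
local_listing() {
    find "$1" -type f -printf '%s %P\n'
}

# compare_listings <local_list> <remote_list>
# Both inputs hold one "<size-in-bytes> <relative-path>" line per file.
# Prints files present on only one side and files whose sizes differ.
compare_listings() {
    awk '
        # Split each line into its size and the rest of the line (the path).
        { size = $1; sub(/^[^ ]+ /, "") }
        # Lines from the first file: remember the local size of each path.
        FILENAME == ARGV[1] { lsize[$0] = size; next }
        # Lines from the second file: compare against the local record.
        {
            seen[$0] = 1
            if (!($0 in lsize))         print "only remote: " $0
            else if (lsize[$0] != size) print "size differs: " $0
        }
        END {
            for (path in lsize)
                if (!(path in seen)) print "only local: " path
        }
    ' "$1" "$2" | LC_ALL=C sort
}
```

For example, `local_listing CNRM-ESM2-1 > local.txt` followed by `compare_listings local.txt remote.txt`, where remote.txt is a listing of the massdata copy in the same format. No output means every file name and size matched.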