Using nccompress without the hh5 conda environment

Good gracious! It took me a few minutes to track it down and I wrote it! :slight_smile:

It is part of the coecms/nccompress package:

The nccompress package has been written to facilitate compressing netCDF files. Although nccompress can work on single files, it is particularly useful for compressing all uncompressed files under whole directory trees. This allows users to compress files regularly using the same script each time.

The nccompress package consists of three Python programs: ncfind, nc2nc and nccompress. nc2nc can copy netCDF files with compression and an optimised chunking strategy that has reasonable performance for many datasets. It has two main limitations: it is slower than some other programs, and it can only compress netCDF3 or netCDF4 classic format. There is more detail in the following sections.

The convenience utility ncvarinfo is also included, and though it has no direct relevance to compression, it is a convenient way to get a summary of the contents of a netCDF file.

Thanks Aidan, got hold of it.
Cheers

For those following along at home: can I find this on gadi (and if so, where?), or do I need to install it myself?

I think an install into a virtual environment is your best bet. The code is not currently maintained, so I can’t see a use case to put it into a shared conda environment until that status changes.

Is there a NetCDF compression tool equivalent to nccompress in the new xp65 space? I was routinely running it on my files but I can see it is no longer available. I have no idea what installing to a virtual environment means :sweat_smile: I know python has compression options but I also use other languages to work with NetCDFs. Thanks :slight_smile:

I am glad you found nccompress useful @aukkola. Assuming that is what you’re referring to?

It hasn’t had substantial changes in over 6 years, so I can’t guarantee it is still fit for purpose.

There may well be better options available now, e.g. nco or cdo (though the cdo documentation seems to be publicly inaccessible currently).
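As a rough, untested sketch of what that could look like: the helper below only prints a deflate-compression command for NCO (`ncks`) or CDO, and runs it only if `DRY_RUN=0` is set. The function name `compress_one`, the file names and the `DRY_RUN` switch are made up for illustration. The flags (`ncks -4 -L <level>`, `cdo -f nc4 -z zip_<level> copy`) are the commonly documented ones, but check them against the versions installed on your system.

```shell
# Sketch only: build a deflate-compression command for one netCDF file.
# compress_one, DRY_RUN and the file names are hypothetical placeholders.
set -eu

compress_one() {
    tool=$1; infile=$2; outfile=$3; level=${4:-5}
    case "$tool" in
        # NCO: write netCDF4 output (-4) with deflate level -L
        nco) set -- ncks -4 -L "$level" "$infile" "$outfile" ;;
        # CDO: write netCDF4 output with zip (deflate) compression
        cdo) set -- cdo -f nc4 -z "zip_$level" copy "$infile" "$outfile" ;;
        *) echo "unknown tool: $tool" >&2; return 1 ;;
    esac
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"      # default: just show what would be run
    else
        "$@"           # DRY_RUN=0: actually run the command
    fi
}

compress_one nco input.nc output_nco.nc 5
compress_one cdo input.nc output_cdo.nc 5
```

Unlike nccompress, neither command walks directory trees or skips already-compressed files, so that logic would need to be added around it.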

Thanks Aidan. It’s a shame nccompress is no longer maintained; it was such a no-brainer to run it on files to save storage space. I’ll see if I can find something similar using NCO/CDO :slight_smile:

@Aidan and @aukkola I also regularly used nccompress until the hh5 environment was decommissioned. If there’s any chance it could be added to xp65, that would be great. I’ve got a bunch of files I need to compress (should’ve done it properly when I created them…), and nccompress would’ve worked really well.

I think there is very little likelihood of it being added to the conda/analysis environments. However, you can load that environment and then install nccompress into a virtual environment.

Following the most excellent instructions from the CLEX CMS blog, this worked for me:

$ module use /g/data/xp65/public/modules/
$ module load conda/analysis3
Loading conda/analysis3-26.03
  Loading requirement: singularity
$ python3 -m venv nccompress --system-site-packages
$ source nccompress/bin/activate
((nccompress) ) $ pip install git+https://github.com/coecms/nccompress.git@master
Collecting git+https://github.com/coecms/nccompress.git@master
  Cloning https://github.com/coecms/nccompress.git (to revision master) to /scratch/tm70/aph502/tmp/pip-req-build-v7l7uii7
  Running command git clone --filter=blob:none --quiet https://github.com/coecms/nccompress.git /scratch/tm70/aph502/tmp/pip-req-build-v7l7uii7
  Resolved https://github.com/coecms/nccompress.git to commit ce3f3b67be30b6a6838ff1b3c936f9cb537b45ec
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: nccompress
  Building wheel for nccompress (setup.py) ... done
  Created wheel for nccompress: filename=nccompress-0.2.2.dev1-py3-none-any.whl size=27807 sha256=70f60700638be8e60682d189b385e296d8483af8a72976d0a2444fb7f323588c
  Stored in directory: /scratch/tm70/aph502/tmp/pip-ephem-wheel-cache-mbmghn2e/wheels/8b/c8/7a/9ce85d0a8ece792f8348bd0aa343903e30f23359bf5692a890
Successfully built nccompress
Installing collected packages: nccompress
Successfully installed nccompress-0.2.2.dev1

[notice] A new release of pip is available: 25.0.1 -> 26.0.1
[notice] To update, run: pip3.12 install --upgrade pip
((nccompress) ) $ which nccompress
~/nccompress/bin/nccompress
((nccompress) ) $ nccompress -h
usage: nccompress [-h] [-d {1-9}] [-n] [-s CHUNKSIZE] [-b BUFFERSIZE] [-t TMPDIR] [-v] [-r] [-o] [-m MAXCOMPRESS] [-p] [-f] [-c] [-pa] [-np NUMPROC]
                  [-ff FROMFILE] [--nccopy] [--timing]
                  [inputs ...]

Run nc2nc (or nccopy) on a number of netCDF files

positional arguments:
  inputs                netCDF files or directories (-r must be specified to recursively descend directories). Can accept piped arguments.

options:
  -h, --help            show this help message and exit
  -d {1-9}, --dlevel {1-9}
                        Set deflate level. Valid values 0-9 (default=5)
  -n, --noshuffle       Don't shuffle on deflation (default is to shuffle)
  -s CHUNKSIZE, --chunksize CHUNKSIZE
                        Set chunksize - total size of one chunk in KiB (default=64), nc2nc only
  -b BUFFERSIZE, --buffersize BUFFERSIZE
                        Set size of copy buffer in MiB (default=500), nc2nc only
  -t TMPDIR, --tmpdir TMPDIR
                        Specify temporary directory to save compressed files
  -v, --verbose         Verbose output
  -r, --recursive       Recursively descend directories compressing all netCDF files (default False)
  -o, --overwrite       Overwrite original files with compressed versions (default is to not overwrite)
  -m MAXCOMPRESS, --maxcompress MAXCOMPRESS
                        Set a maximum compression as a paranoid check on success of nccopy (default is 10, set to zero for no check)
  -p, --paranoid        Paranoid check : run nco ndiff on the resulting file ensure no data has been altered
  -f, --force           Force compression, even if input file is already compressed (default False)
  -c, --clean           Clean tmpdir by removing existing compressed files before starting (default False)
  -pa, --parallel       Compress files in parallel
  -np NUMPROC, --numproc NUMPROC
                        Specify the number of processes to use in parallel operation
  -ff FROMFILE, --fromfile FROMFILE
                        Read files to be compressed from a text file
  --nccopy              Use nccopy instead of nc2nc (default False)
  --timing              Collect timing statistics when compressing each file (default False)
((nccompress) ) $ deactivate
$ 

DO THIS AT YOUR OWN RISK!

There are absolutely no guarantees this still works, so definitely do a test before you point it at your precious research outputs.
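One way to do such a test, as a sketch only: copy a single file somewhere disposable and run nccompress on the copy with the verbose and paranoid flags from the help text above. The snippet prints the commands rather than running them. `SAMPLE`, `WORKDIR` and `trial_cmd` are placeholder names, and it assumes the NCO tooling that `-p` relies on is available in your environment.

```shell
# Sketch only: print the commands for a trial run on a throwaway copy.
# SAMPLE, WORKDIR and trial_cmd are hypothetical names; the flags come from `nccompress -h`.
set -eu

trial_cmd() {
    # No -o here, so the input file should not be overwritten;
    # -t names the directory for the compressed output, -p asks for the paranoid check.
    sample_copy=$1; tmpdir=$2
    echo "nccompress -v -p -t $tmpdir $sample_copy"
}

SAMPLE=${SAMPLE:-sample.nc}
WORKDIR=${WORKDIR:-nccompress_trial}

echo "mkdir -p $WORKDIR && cp $SAMPLE $WORKDIR/"
trial_cmd "$WORKDIR/$(basename "$SAMPLE")" "$WORKDIR/compressed"
```

After running the printed commands by hand, check where the compressed file actually ended up, compare its size with the copy, and open it with your usual tools before trusting it on anything else.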