I see CMS has developed a new containerised (singularity) conda environment.
@dale.roberts is this basically the same approach @Scott has used for his containerised environments? If not, I’d be interested in how they differ from a technological point of view, and why.
Note for others: singularity is a type of container technology that has been renamed to apptainer. It can be a bit confusing if you don’t know this and see references to apptainer, or sources that seem to imply the singularity project has ceased.
I’m sure many of you will have seen the new analysis environment email that just went out. The main idea behind ‘containerising’ the conda environments is to reduce file count. However, running everything strictly inside singularity (Sylabs, not apptainer) has its disadvantages (e.g. setgid binaries no longer work, loss of access to PBS, unknown operating system). This approach attempts to ameliorate that in the way that SHPC does, by only entering the container when absolutely necessary (e.g. running python3).
What I’ve not seen before is anyone taking advantage of singularity’s ability to manage squashfs file systems for application purposes. Each conda env lives in its own squashfs, and can be mounted into the container as needed. The container itself is a series of directories and symlinks that just bind-mount in most of Gadi’s OS, bypassing the need to maintain a separate OS within the container itself. Using an individual squashfs for each conda env means the container never has to be updated unless there are major changes to Gadi’s OS layout.
More details can be found here: Introducing the new hh5 conda environment (CLEX CMS Blog), and more technical info can be found here: Conda hh5 environment setup (CLEX CMS Wiki). The wiki page is slightly out of date, but that should be fixed soon.
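To make the "only enter the container when necessary" idea concrete, here is a minimal Python sketch of how such a wrapper might assemble its command line. It is not the actual hh5 implementation: the paths, the list of bind-mounted host directories, and the set of commands routed through the container are all hypothetical, and it assumes each environment's squashfs is attached with singularity's `--overlay` option. The function only builds the argument list and does not run anything.

```python
import shlex
from pathlib import Path

# Hypothetical locations, for illustration only; the real layout may differ.
CONTAINER = Path("/path/to/base_container.sif")
ENV_DIR = Path("/path/to/envs")

# Assumed subset of host directories bind-mounted so the container reuses the host OS.
HOST_BINDS = ["/etc", "/usr", "/opt"]

# Assumed set of commands that live inside the conda env and so need the container.
IN_CONTAINER = {"python", "python3", "jupyter"}


def build_command(env_name, argv):
    """Return the argument list to execute for argv under the given conda env.

    Commands outside IN_CONTAINER are returned unchanged so they run directly
    on the host (keeping access to PBS, setuid/setgid helpers and so on).
    """
    if not argv:
        raise ValueError("argv must contain at least the command name")
    if Path(argv[0]).name not in IN_CONTAINER:
        return list(argv)

    cmd = ["singularity", "exec"]
    for directory in HOST_BINDS:
        # --bind with a single path mounts it at the same location in the container.
        cmd += ["--bind", directory]
    # One squashfs image per conda environment, attached as a read-only overlay.
    cmd += ["--overlay", str(ENV_DIR / f"{env_name}.sqsh")]
    cmd.append(str(CONTAINER))
    cmd += list(argv)
    return cmd


if __name__ == "__main__":
    print(shlex.join(build_command("analysis3", ["python3", "-c", "print('hello')"])))
    print(shlex.join(build_command("analysis3", ["qsub", "job.sh"])))
```

Because the environment is selected purely by which squashfs is attached, adding or replacing an environment in this sketch never touches the base container image, which matches the design goal described above.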
Thanks Scott. I probably shouldn’t have listed setgid binaries first. The main issue I have with singularity is that, in an HPC context, you’re eschewing the advantages of a well-configured, specialised environment for some generic Ubuntu installation. Gadi has a particularly large OS by HPC standards, and new OS packages get added fairly frequently. You lose that by running solely inside the container, and you’ll spend a great deal of time trying to align the environment with the host system (just ask the ARE team how that’s going) so that everything inside the container works almost as well as everything outside of it. You also lose out by trying to guess MPI configurations and the like that have already been specifically configured and tuned for the system. This way you can have conda/analysis3 in your environment, and also have Gadi-configured OpenMPI.
Back to setgid though: believe it or not, you run one every time you log into Gadi, namely nfnewgrp. The other common one is ssh-keysign, which is used for host-based auth between Gadi nodes.
Edited to fix typos and to add: by setgid, I meant setgid and setuid.
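For anyone who wants to check whether a given binary is one of these, the setuid and setgid flags are just bits in the file mode. The following is a small Python sketch using the standard `stat` module; the function names are made up for illustration.

```python
import os
import stat
import sys


def special_bits(mode):
    """Return which of the setuid/setgid bits are set in a numeric file mode."""
    bits = []
    if mode & stat.S_ISUID:
        bits.append("setuid")
    if mode & stat.S_ISGID:
        bits.append("setgid")
    return bits


def special_bits_of(path):
    """Look up a file's mode on disk and report its setuid/setgid bits."""
    return special_bits(os.stat(path).st_mode)


if __name__ == "__main__":
    # Usage: python check_bits.py /path/to/binary [more paths...]
    for path in sys.argv[1:]:
        print(path, special_bits_of(path) or "no setuid/setgid bits")
```

A binary that reports either bit relies on elevated privileges at exec time, which is the behaviour that stops working when everything runs strictly inside the container.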
Apptainer: open source Singularity, recently renamed and hosted by the Linux Foundation. As of Fall 2022 all three Apptainer/Singularity versions are compatible and practically the same, but have different roadmaps. There is hope that in the future they will join forces, but this is not currently the case. To understand how this came to be you can read the Singularity history on Wikipedia.
Apptainer is the ‘original’; it was called ‘Singularity’ until 2021. Sylabs did a lot of the work, then forked it as a commercial product but kept calling it ‘Singularity’. I don’t know what the differences are. I’ve not used apptainer, so I can’t comment on whether it has the same limitations or not. This project was targeted at Gadi, and Gadi has Sylabs Singularity 3.7.0.
The irony is that containerisation, in this case, is a necessary evil. All I really need from singularity is its ability to create a private mount namespace and mount some kind of read-only file system within it. I’d much rather do that the same way NCI does with its -lstorage flags, but as an unprivileged user I can’t place setuid binaries on the system. The majority of the development work was working around containerisation. Just being able to go
Heh, don’t mind my griping, they’re absolutely right to not allow that kind of thing. NCI already employs fuse-overlayfs in certain circumstances, but it’s not something they’re willing to enable for normal users. I’m with them on that: there is no reason for users to be allowed to arbitrarily mount file systems, and setting it up wrong is a huge security risk. Best to just eliminate that attack surface entirely. This is why NCI took so long to adopt singularity in the first place. It runs with elevated privileges, and they need to be damn sure it’s safe to deploy in a way that isn’t going to compromise the system. Whilst that is annoying for users who see a lot of their applications distributed in containers, it would have been far worse if NCI were an early adopter and then got hit with something like CVE-2019-11328 : An issue was discovered in Singularity 3.1.0 to 3.2.0-rc2, a malicious user with local/network access to the host system. As a national facility, they’re a big target, and have to assume the worst.