I have been managing dx2 (and other datasets) for a number of years and I have a few tips to share. I have been putting this off because I wanted it to be comprehensive, but that just means it never gets done.
Managing an NCI project (or multiple projects)
There are three aspects to the management of an NCI project: compute, disk (and tape), and users (people).
Compute
nci_account is your friend. nci_account -v even more so. This utility lists the users and the amount of compute that they have consumed. This is essential if you want to find where all your allocation is going and to catch a user who might be ‘accidentally’ consuming more than you expected. (‘Accidentally’ because there are numerous reasons for this to happen: wrong queue, too many CPUs, a forgotten (or lost) job, wrong project. More on that later.) It would be wise to have a regular job that writes this report to a dated file once a week.
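A minimal sketch of such a weekly job, assuming Python is available on the login node. The function name write_dated_report and the output directory are hypothetical, and the command to run is passed in rather than hard-coded, so nothing here depends on the exact flags of nci_account.

```python
import datetime
import subprocess
from pathlib import Path


def write_dated_report(command, out_dir, today=None):
    """Run `command` and save its stdout to <out_dir>/<program>_<YYYY-MM-DD>.txt."""
    today = today or datetime.date.today()
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # Name the file after the program being run, e.g. nci_account_2024-07-01.txt
    program = Path(command[0]).name
    out_path = out_dir / f"{program}_{today.isoformat()}.txt"
    # check=True raises if the command fails, so a broken run is not saved silently
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    out_path.write_text(result.stdout)
    return out_path


# Example call (hypothetical directory), to be scheduled once a week:
# write_dated_report(["nci_account", "-v"], Path.home() / "nci_reports")
```

Keeping one file per date makes it easy to diff two weeks and see whose usage jumped.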
There is a helpful tab on mancini called ‘Accounting’. This gives a comparison of current compute usage against allocation. It is an instantaneous calculation and you could do it in your head from the nci_account figures, but it can be helpful in messaging.
Disk (and Tape)
nci-files-report is your friend. nci-files-report --project {project} even more so. This lists all the files under your /g/data/{project} directory that you are using. You can also use nci-files-report --project {project} --filesystem gdata if you want to distinguish gdata from scratch. This report will identify all the users and the disk space that they are consuming. It will also tell you about users who have files under your project directory but are not in the project group. Files that are not in the group mess with quota allocation and will cause you problems (more later).
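To see which individual files carry the wrong group, something like the following sketch can help. It is not a replacement for nci-files-report, and walking a large /g/data tree this way will be slow; the function name files_outside_group is hypothetical.

```python
import os


def files_outside_group(root, expected_gid):
    """Yield (path, uid, gid) for entries under root whose group is not expected_gid."""
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # lstat: report on symlinks themselves, do not follow them
            except OSError:
                continue  # unreadable or vanished entry; skip it
            if st.st_gid != expected_gid:
                yield path, st.st_uid, st.st_gid


# Example (assumes the Unix group is named after the project code):
# import grp
# gid = grp.getgrnam("dx2").gr_gid
# for path, uid, gid_found in files_outside_group("/g/data/dx2", gid):
#     print(path, uid, gid_found)
```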
The alternate report is nci-files-report --group {project} --filesystem gdata and this is valuable in determining who stole your allocation. These are files under another project’s /g/data directory that you cannot see, but are using your allocation. You can identify the ‘offending’ user this way. This is the major reason that the du (disk usage) sum is different to the project report. The other reason is that the report is run periodically, so there is a delay in reporting.
There are few tools to manage tape (MDSS), but mdss ls -l will list the size of all the files under a directory, which is very helpful.
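If you want a total rather than a list, the sizes can be summed from the listing text. This sketch assumes the output of mdss ls -l follows the usual ls -l column layout (size in the fifth column), which you should check against your own output; the function name total_bytes is hypothetical.

```python
def total_bytes(listing):
    """Sum the size column of `ls -l` style text, counting regular files only."""
    total = 0
    for line in listing.splitlines():
        fields = line.split()
        # A regular-file line looks like:
        # -rw-r----- 1 user group 1234 Jan  1 12:00 name
        if len(fields) >= 9 and fields[0].startswith("-") and fields[4].isdigit():
            total += int(fields[4])
    return total
```

Directories and the leading "total" line are skipped, so the result is the sum of the files directly listed.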
In summary, the --project report tells you the usage under the project, and the --group report tells you where your quota went.
Users (People or Wetware)
This is the hardest part.
Who are they?
There is a Linux command, getent passwd {user}, which will give you the name of the user associated with the userid. You can also look them up on mancini.
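When you have a whole list of userids from a report, the lookup is easy to script. A minimal sketch, assuming the full name sits in the standard fifth (GECOS) field of the passwd entry; the function names are hypothetical.

```python
import subprocess


def parse_passwd_line(line):
    """Return (userid, full_name) from one passwd-format line."""
    fields = line.rstrip("\n").split(":")
    # passwd format: name:password:uid:gid:gecos:home:shell
    gecos = fields[4] if len(fields) > 4 else ""
    # The GECOS field may hold comma-separated extras; the name comes first.
    return fields[0], gecos.split(",")[0]


def lookup_user(userid):
    """Run `getent passwd <userid>`; return the full name, or None if unknown."""
    result = subprocess.run(
        ["getent", "passwd", userid], capture_output=True, text=True
    )
    if result.returncode != 0 or not result.stdout.strip():
        return None
    return parse_passwd_line(result.stdout.splitlines()[0])[1]
```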
Communicating with users
There is an envelope icon on mancini that lets you send messages to all users. This can be sub-optimal as you don’t get read receipts, but as a first point of contact it is excellent. All users in your project have an email address against their account and you can get this through mancini. I have in the past scripted a scrape of these emails for more direct interaction.
Why are they in the project?
Obviously, most users will be in your project because you approved them, but not always. You may be taking over a project that you are unfamiliar with. From a management point of view, knowing why a user is on the project is pretty important, especially if they are using resources. Some people have joined the project to be able to access the data on the project and they are not expected to be using the compute or disk resources, so if they ‘accidentally’ do use the resources, you need to be able to respond.
Are they still on the project?
People will leave the project and you should remove their access. This is probably the most tedious part of managing the project. You usually cannot contact them to get them to remove their files, and you may want to keep the files anyway. Emails to NCI Help are really the only way to address this, and the simplest message you can send is that the user has left the project and you wish their files to be allocated to your user account. If the user has files under your project directory that belong to another project’s group, the same message applies, with a request to please allocate the files and directories to the containing project.
Periodic review. You do need to ask users, periodically, to validate their continued use of the project. This is where a spreadsheet can help.
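One way to make the spreadsheet do some of the work is to export it as CSV and flag who is overdue. This is only a sketch: the column names userid, reason and last_confirmed, the ISO date format and the 180-day threshold are all assumptions for illustration.

```python
import csv
import datetime
import io


def users_due_for_review(csv_text, today, max_age_days=180):
    """Return userids whose last confirmation is missing or older than max_age_days."""
    due = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        confirmed = row["last_confirmed"].strip()
        if not confirmed:
            # Never confirmed: always due for review
            due.append(row["userid"])
            continue
        age = today - datetime.date.fromisoformat(confirmed)
        if age.days > max_age_days:
            due.append(row["userid"])
    return due
```

Recording the reason each user joined alongside the date keeps the answer to "why are they in the project" in the same place.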
Contact me
I am very happy to share code snippets and advice. Please feel free to discuss this with me online and offline.