NCI Project(s) Management Tips and Tricks

I have been managing dx2 (and other datasets) for a number of years and I have a few tips to share. I have been putting this off because I wanted it to be comprehensive, but that just means it never gets done.

Managing an NCI project (or multiple projects)

There are three aspects to the management of an NCI project: compute, disk (and tape), and users (people).

Compute

nci_account is your friend. nci_account -v even more so. This utility lists the users and the amount of compute that they have consumed. This is essential if you want to find where all your allocation is going and to catch a user who might be ‘accidentally’ consuming more than you expected (‘accidentally’ because there are numerous reasons for this to happen: wrong queue, too many CPUs, a forgotten (or lost) job, wrong project; more on that later). It would be wise to have a regular job that writes this report to a dated file once a week.
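A minimal sketch of such a weekly job, in Python. It assumes that nci_account is on the PATH and that plain nci_account -v reports on the project you care about. The file naming scheme and the report directory are made up for illustration, and the scheduling itself (cron or similar) is left to you.

```python
import datetime
import pathlib
import subprocess


def dated_report_path(directory, today=None):
    """Return <directory>/nci_account_YYYY-MM-DD.txt for the given date."""
    today = today or datetime.date.today()
    return pathlib.Path(directory) / f"nci_account_{today.isoformat()}.txt"


def write_report(directory, command=("nci_account", "-v"), today=None):
    """Run the command and save its standard output to a dated file."""
    path = dated_report_path(directory, today)
    path.parent.mkdir(parents=True, exist_ok=True)
    # check=True raises if the command fails, so a broken run is not
    # silently saved as an empty report.
    result = subprocess.run(list(command), capture_output=True, text=True, check=True)
    path.write_text(result.stdout)
    return path


# Example call (the directory name is hypothetical):
# write_report(pathlib.Path.home() / "nci_reports")
```

Keeping one file per run means you can diff two weeks to see who moved.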

There is a helpful tab on mancini called ‘Accounting’. This gives a comparison of current compute usage against allocation. It is an instantaneous calculation and you could do it in your head from the nci_account figures, but it can be helpful in messaging.
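One way to do the "in your head" version is to compare the fraction of the grant used with the fraction of the allocation period that has elapsed. The sketch below is only that arithmetic. It is not a description of how the Accounting tab computes its figure, and the numbers and dates in the example are invented.

```python
import datetime


def usage_vs_elapsed(used, grant, start, end, today):
    """Return (fraction of grant used, fraction of the period elapsed)."""
    used_fraction = used / grant
    elapsed_fraction = (today - start).days / (end - start).days
    return used_fraction, elapsed_fraction


# Invented figures: 600 of 1000 units used, halfway through the period.
used, elapsed = usage_vs_elapsed(
    used=600,
    grant=1000,
    start=datetime.date(2024, 1, 1),
    end=datetime.date(2024, 3, 31),
    today=datetime.date(2024, 2, 15),
)
ahead_of_pace = used > elapsed  # True here: 0.6 used against 0.5 elapsed
```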

Disk (and Tape)

nci-files-report is your friend. nci-files-report --project {project} even more so. This reports on all the files under your /g/data/{project} directory. You can also use nci-files-report --project {project} --filesystem gdata if you want to distinguish gdata from scratch. This report will identify all the users and the disk space that they are consuming. It will also tell you about the users who have files under your project directory but are not in the project group. Files that are not in the group mess with quota allocation and will cause you problems (more later).
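For a small subdirectory you can cross-check the report by hand. The following is a rough sketch that walks a directory tree and totals file sizes by (uid, gid), then picks out the entries that are not in the project group. It sums apparent file sizes rather than allocated blocks, so it will not match quota figures exactly, and walking a large /g/data tree this way is slow. The function names are illustrative.

```python
import collections
import os


def usage_by_owner_and_group(root):
    """Total apparent file size in bytes under root, keyed by (uid, gid)."""
    totals = collections.Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                info = os.lstat(os.path.join(dirpath, name))
            except OSError:
                continue  # file vanished or is unreadable; skip it
            totals[(info.st_uid, info.st_gid)] += info.st_size
    return totals


def outside_group(totals, project_gid):
    """Keep only the entries whose group is not the project group."""
    return {key: size for key, size in totals.items() if key[1] != project_gid}
```

The numeric gid of a project group can be found with getent group {project}.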

The alternate report is nci-files-report --group {project} --filesystem gdata and this is valuable in determining who stole your allocation. These are files under another project’s /g/data directory that you cannot see, but that are using your allocation. You can identify the ‘offending’ user this way. This is the major reason that the du (disk usage) sum is different to the project report. The other reason is that the report is run periodically and there is a delay in reporting.

There are few tools to manage tape (MDSS), but mdss ls -l will list the size of all the files under a directory, which is very helpful.

In summary, --project tells you the usage under the project and --group tells you where your quota went.

Users (People or Wetware)

This is the hardest part.

Who are they?

There is a Linux command, getent passwd {user}, that will give you the name of the user associated with the userid. You can also look them up on mancini.
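If you need names for a whole list of userids, the lookup can be scripted. This sketch runs getent passwd and pulls the name out of the fifth colon-separated field of the standard passwd entry. It assumes the name is stored there, which is the usual convention but is up to whoever maintains the accounts. The sample entry in the comment is invented.

```python
import subprocess


def gecos_name(passwd_line):
    """Return the name part of the fifth (GECOS) field of a passwd entry."""
    fields = passwd_line.rstrip("\n").split(":")
    if len(fields) < 7:
        raise ValueError(f"not a passwd entry: {passwd_line!r}")
    # GECOS may hold comma-separated extras (room, phone); keep the name only.
    return fields[4].split(",")[0]


def lookup_name(user):
    """Run getent passwd for one user; return None if there is no such user."""
    result = subprocess.run(["getent", "passwd", user], capture_output=True, text=True)
    if result.returncode != 0 or not result.stdout:
        return None
    return gecos_name(result.stdout.splitlines()[0])


# Invented entry, for illustration only:
# gecos_name("abc123:x:1000:1000:Jane Citizen,,,:/home/abc123:/bin/bash")
```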

Communicating with users

There is an envelope icon on mancini that lets you send messages to all users. This can be sub-optimal as you don’t get read receipts, but as a first point of contact it is excellent. All users in your project have an email against their account and you can get this through mancini. I have in the past scripted a scrape of these emails for more direct interaction.

Why are they in the project?

Obviously, most users will be in your project because you approved them, but not always. You may be taking over a project that you are unfamiliar with. From a management point of view, knowing why a user is on the project is pretty important, especially if they are using resources. Some people have joined the project to be able to access the data on the project and they are not expected to be using the compute or disk resources, so if they ‘accidentally’ do use the resources, you need to be able to respond.

Are they still on the project?

People will leave the project and you should remove their access. This is probably the most tedious part of managing the project. You usually cannot contact them and get them to remove files, and possibly you want to keep the files. Emails to NCI Help are really the only way to address this, and the simplest message you can send is that the user has left the project and you wish their files to be allocated to your user account. If the user has files that sit under your project directory but belong to another project, the same message applies, with a request to allocate the files and directories to the containing project.

Periodic review. You do need to ask users periodically to validate their continued use of the project. This is where a spreadsheet can help.
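As an illustration of the spreadsheet idea, here is a small sketch that reads a CSV export and lists the users who are due for a check-in. The column names (user, reason, last_confirmed), the 180-day threshold and the sample rows are all invented, so adapt them to whatever your own sheet records.

```python
import csv
import datetime
import io


def due_for_review(csv_text, today, max_age_days=180):
    """Return user IDs whose last confirmation is missing or too old."""
    due = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        confirmed = row["last_confirmed"].strip()
        if not confirmed:
            due.append(row["user"])  # never confirmed
            continue
        age = today - datetime.date.fromisoformat(confirmed)
        if age.days > max_age_days:
            due.append(row["user"])
    return due


# Invented sample data.
SAMPLE = """user,reason,last_confirmed
abc123,data access only,2024-01-10
xyz987,model runs,
def456,model runs,2023-03-01
"""
overdue = due_for_review(SAMPLE, today=datetime.date(2024, 3, 1))
```

Recording the reason alongside each user also covers the "why are they in the project" question when someone else takes over.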

Contact me

I am very happy to share code snippets and advice. Please feel free to discuss this with me online and offline.


@navidcy, @bethanwhite and @Aidan - FYI and review

Inspired by Difference between nci-files-report and du


Wonderful, thank you Griff :heart:
Also tagging @Paul.Gregory who is building a monitoring platform for the CoE projects.

If you are a Lead CI, Delegated Lead CI or Chief investigator you can find the registered email for a user in your project by going to https://my.nci.org.au/mancini/project/ab1/members/active (replace ab1 with project code) and clicking on a user name.


Yes, this is how I collected the username-to-email mapping: by scraping these pages. With about 100 users to get, clicking through by hand would have been overly tedious. I think I got your help with the wget and user credentials to do this.

The link in mancini goes to a URL like https://my.nci.org.au/mancini/project/dx2/member/99999 for each username (abc123) and you can get the list of users using https://my.nci.org.au/mancini/_search/lookup/project_membership/dx2?page=1 (where you iterate through the page numbers). You then have to filter the active users. A bit fiddly, but quicker than manually clicking through (for large numbers).
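The page iteration can be sketched roughly as below. The URL pattern is the one quoted above. Everything else is an assumption: fetch is a function you supply (for example a wrapper around wget with your credentials), the format of each response is not described here so the pages are returned unparsed, and stopping at the first empty response is a guess at how the end of the list shows up.

```python
LOOKUP_URL = (
    "https://my.nci.org.au/mancini/_search/lookup/"
    "project_membership/{project}?page={page}"
)


def page_url(project, page):
    """Build the lookup URL for one page of a project's membership list."""
    return LOOKUP_URL.format(project=project, page=page)


def collect_pages(project, fetch, max_pages=50):
    """Call fetch(url) for page 1, 2, ... until it returns nothing."""
    pages = []
    for page in range(1, max_pages + 1):
        body = fetch(page_url(project, page))
        if not body:  # assumed end-of-list signal
            break
        pages.append(body)
    return pages
```

The max_pages cap is there so a wrong stop condition cannot loop forever. Filtering for active users and following each member link would come after this step.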

Nice summary, @griff, and thank you for the add-ons @Scott @bethanwhite. Now I know who is managing their projects :) Here’s my two cents.

It is noteworthy that nci_account serves not only as a compute dashboard but also as the primary dashboard for your project within the shell environment. It provides a comprehensive summary of compute, cloud (experimental), and storage usage. The verbose version offers detailed information on compute and cloud usage per user and both compute and storage grants support distribution across stakeholders.

It is the most current record of the project, unlike nci-files-report, which can be 24 hours old.
