NCI systems folks might set up a regular dump of the NCI resource commands' output if you ask nicely (help@nci.org.au) - at least they have done so in the past. Otherwise you need a user who is a member of all the groups you want to capture, and a scheduled job running as that user to do the dump. This is what we're doing at ACCESS-NRI (from GitHub).
ACCESS-NRI is using a Grafana Cloud account (free tier) for the dashboarding and a PostgreSQL DB back-end on a NECTAR VM. Happy to share what we've done, but note we're going to refactor the part that updates the DB to use the Django server/apps @tmcadam and @CharlesTurner have developed.
If you have access to your scheme's info in mancini you can get CSV files of historical compute and storage use for all of the scheme's projects from the scheme dashboard. It's a bit of a hassle to set up authentication to download them automatically though - if you're interested I can provide a script.
We have some Python scripts that collect this, along with `nci_account` and `nci_files_report` output, and dump everything into a /g/data space that's mounted on a web server; a basic plotly dashboard then presents the data.
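For a rough sense of what that looks like, here's a minimal sketch - the file path and column names (`date`, `project`, `su_used`) are made up for illustration, not our actual schema:

```python
# Minimal plotly sketch: plot compute usage over time from a dumped CSV.
# The path and column names are placeholders, not our real layout.
import pandas as pd
import plotly.express as px

df = pd.read_csv("/g/data/xx00/usage/compute_usage.csv", parse_dates=["date"])
fig = px.line(df, x="date", y="su_used", color="project",
              title="Compute usage by project")
fig.write_html("usage_dashboard.html")  # serve this file from the web server
```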
We may at some stage look at using Power BI to set up a dashboard instead, since we have access to it through SharePoint.
You should be able to find the data source that plotly is displaying by looking at your browser's developer tools. You can authenticate with mancini in a Python session with:
```python
import os
from contextlib import contextmanager

import requests
from bs4 import BeautifulSoup as BS


@contextmanager
def mancini_session():
    """Log in to mancini and yield an authenticated requests session."""
    with requests.Session() as s:
        s.headers['Origin'] = 'https://my.nci.org.au'
        r = s.get('https://my.nci.org.au/mancini/login')
        r.raise_for_status()

        # Collect xsrf tokens required to log in
        soup = BS(r.text, 'html.parser')
        login_form = soup.find(id='login-form')
        form = {}
        for i in login_form.find_all('input'):
            if i.get('type', None) == 'submit':
                continue
            form[i['name']] = i.get('value', None)

        # Do the login (credentials come from the environment)
        form['username'] = os.environ['SCHEME_USER']
        form['password'] = os.environ['SCHEME_PASS']
        headers = {'Referer': 'https://my.nci.org.au/mancini/login'}
        r = s.post('https://my.nci.org.au/mancini/login', data=form, headers=headers)
        r.raise_for_status()

        yield s


with mancini_session() as s:
    # Replace with whatever URL you have access to;
    # 'scheme' is your scheme's identifier
    r = s.get(f'https://my.nci.org.au/mancini/scheme/{scheme}/compute/csv')
```
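Assuming the endpoint returns plain CSV text as above, you can read it straight into pandas without saving to disk (the column names depend on the scheme, so inspect the frame first):

```python
import io
import pandas as pd

# r.text is the CSV body from the request above
df = pd.read_csv(io.StringIO(r.text))
print(df.head())
```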
It seems access to those .csv resources will take a while.
In the interim, I was going to write a simple cron script to run the various NCI command-line resource tools and redirect their output to text files, e.g. `lquota -v` and `nci_account -P <project> -v` for each project, then build some logic to parse the output into a pandas DataFrame and plot it. That's easy for me and within my wheelhouse.
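Roughly what I have in mind - a sketch only, with placeholder project codes and dump directory:

```python
# Sketch of the cron job body: run each resource command and dump its
# output to a timestamped text file for later parsing. Project codes
# and the output directory are placeholders.
import subprocess
from datetime import datetime
from pathlib import Path

PROJECTS = ["aa00", "bb11"]            # placeholder project codes
OUTDIR = Path("/g/data/aa00/usage")    # placeholder dump location
stamp = datetime.now().strftime("%Y%m%dT%H%M")

commands = {"lquota": ["lquota", "-v"]}
for proj in PROJECTS:
    commands[f"nci_account_{proj}"] = ["nci_account", "-P", proj, "-v"]

OUTDIR.mkdir(parents=True, exist_ok=True)
for name, cmd in commands.items():
    result = subprocess.run(cmd, capture_output=True, text=True)
    (OUTDIR / f"{name}_{stamp}.txt").write_text(result.stdout)
```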
However, you seem to know more about web-scraping than I do, and you mentioned you have some scripts which already collate and process this data?
There's no need to parse nci_account's output - nci_account reads data from that URL (it returns JSON) and just does some formatting around it, e.g. try printing r.json() to see the result. nci-files-report and nqstat both work similarly, calling an API and formatting the JSON output. Note that these APIs are only accessible from Gadi itself.
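For example, reusing `mancini_session` from the snippet above (a sketch - the JSON structure isn't documented here, so inspect it before deciding how to flatten it):

```python
# Inspect the raw JSON that the CLI tools format for display.
# The structure isn't documented here, so look at the keys first;
# pandas.json_normalize is one option for flattening once you know the shape.
with mancini_session() as s:
    r = s.get(f'https://my.nci.org.au/mancini/scheme/{scheme}/compute/csv')
    data = r.json()
    print(data)
```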
lquota is a bit different: it hooks into the Lustre library directly. NCI's Python interface for doing this is at /opt/nci/lquota/lustre.py.
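If you want to poke at it from your own scripts on Gadi, you can load that module by path - a sketch only, and I'm not making any claims about what the module actually exposes, hence the dir() at the end:

```python
# Load NCI's lustre.py by file path (it isn't on the default sys.path)
# and list its public names to see what's available.
import importlib.util

spec = importlib.util.spec_from_file_location("lustre", "/opt/nci/lquota/lustre.py")
lustre = importlib.util.module_from_spec(spec)
spec.loader.exec_module(lustre)
print([name for name in dir(lustre) if not name.startswith("_")])
```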