How ACCESS-NRI could support ML/AI

The FPWG welcomes your feedback on this document. Please provide comments/input by using the reply function, or by editing directly in the Google Doc version (using track changes).

Machine learning (ML) infrastructure is the software and hardware foundation required for the development and deployment of ML models.

Although ML models and infrastructure implementations differ widely depending on the application, there are core components of ML infrastructure that can be effectively supported at every stage of any ML workflow, i.e.:

  • Model selection and model construction
  • Data ingestion and preparation
  • Visualisation and monitoring
  • Model testing
  • Model deployment
  • Evaluation and analysis
  • Model/experiment versioning and traceability
  • ML pipeline automation (pipelines can encapsulate part or all of the model development, training and inference process, from data preparation, to model training and inference, to data post-processing, monitoring, and evaluation)
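
As a concrete illustration of how these stages fit together, the sketch below chains data preparation, model construction, training, testing, and experiment versioning in a minimal Python script. The synthetic dataset, model choice, and file names are illustrative assumptions only, not part of any existing ACCESS-NRI tooling.

```python
# Minimal illustrative pipeline covering several of the stages listed above.
# The synthetic dataset, model choice and file names are placeholders only.
import hashlib
import json

import joblib
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Data ingestion and preparation (synthetic stand-in for real training inputs)
rng = np.random.default_rng(seed=0)
X = rng.normal(size=(1000, 5))
y = 2.0 * X[:, 0] + np.sin(X[:, 1]) + rng.normal(scale=0.1, size=1000)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Model selection and construction
model = RandomForestRegressor(n_estimators=100, random_state=0)

# Model training
model.fit(X_train, y_train)

# Model testing and evaluation
rmse = float(mean_squared_error(y_test, model.predict(X_test)) ** 0.5)
print(f"test RMSE: {rmse:.3f}")

# Model/experiment versioning and traceability: store the trained model together
# with a metadata record (hyperparameters, a hash of the training data, scores)
joblib.dump(model, "model.joblib")
data_hash = hashlib.sha256(X_train.tobytes()).hexdigest()[:12]
with open("experiment.json", "w") as f:
    json.dump({"model": "RandomForestRegressor", "n_estimators": 100,
               "train_data_sha256": data_hash, "test_rmse": rmse}, f, indent=2)
```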

Below we outline how the ACCESS-NRI could potentially support ML infrastructure for the benefit of the Australian research community. The NRI is well-positioned to take a leading role in building a connected community in Australia to progress advancements in ML for weather and climate applications, and stimulate collaboration across organisations. This is an opportune time given the rapidly advancing nature of the field. It is clear that ML will be a key part of the future of our weather and climate systems.

Technical Infrastructure

  • Support, extend and develop tools for use at every stage of the ML workflow to enable the community to
    • Use training pipelines and lower the effort for model training
    • Ingest, prepare and manipulate data required for training
    • Visualise and monitor at every stage of the ML workflow
    • Perform integration tests and error-checking at various stages of the workflow
    • Access evaluation tools to, for example, lower the effort to produce first-glance scorecards of performance
    • Ensure appropriate version control and traceability of model pipelines, weights, and data pre- and post-processing
  • Maintain reference implementations of pre-trained ML models, i.e., neural earth-system models (pure-ML weather/climate models) and other ML models (e.g., for downscaling, climate driver index prediction, image segmentation, and object identification and tracking). Potentially also maintain user-trainable reference ML model implementations.
  • Support infrastructure frameworks required for implementing ML emulators in dynamical model parameterisations (a toy emulator workflow is sketched after this list).
  • Provide support and advice for scientists and project teams on planning the HPC hardware needed for ML, which can have a huge impact on performance and cost (e.g., use of GPUs).
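
As a toy illustration of the emulator bullet above, the sketch below trains a small neural network to reproduce a stand-in "parameterisation" and then calls it as a fast surrogate. The scheme, variable dimensions, and network size are hypothetical; a real emulator would be trained on output from an actual physics scheme and coupled into the dynamical model's time stepping.

```python
# Toy illustration only: training a small MLP to emulate a stand-in parameterisation.
# The "scheme", variable dimensions and network size are hypothetical placeholders.
import numpy as np
from sklearn.neural_network import MLPRegressor

def toy_parameterisation(state):
    """Stand-in for an expensive physics scheme: maps a column state to a tendency."""
    return np.tanh(state @ np.array([0.5, -1.0, 0.25])) + 0.1 * state[:, 2]

# Generate training pairs offline by sampling the existing scheme
rng = np.random.default_rng(1)
states = rng.normal(size=(5000, 3))          # e.g. proxies for temperature, humidity, wind
tendencies = toy_parameterisation(states)

emulator = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=1000, random_state=1)
emulator.fit(states, tendencies)

# Inside the dynamical model, the emulator would be called in place of the scheme
new_states = rng.normal(size=(10, 3))
ml_tendency = emulator.predict(new_states)       # fast surrogate
ref_tendency = toy_parameterisation(new_states)  # reference, to check fidelity
print("max abs error vs scheme:", float(np.abs(ml_tendency - ref_tendency).max()))
```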

Documentation

  • Documentation and user-guides of reference model implementations
  • Documentation and user-guides for supported tools and infrastructure
  • “Getting started” primer information and various guides for using ML
  • Collections of ML use-case examples
  • Collections of applicable papers
  • Documentation for computational and data scientists getting started with scientific ML, and likewise for physical scientists getting started with ML

Data

  • Support and promote key datasets used for ML training and application
  • Where appropriate, make input and output data transparent, open and accessible
  • Advice on appropriate use of data storage (both input and output)
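
As one concrete example of such storage advice, ML training jobs typically read many small batches, so archive-oriented NetCDF files are often rechunked into an analysis-friendly store such as Zarr. The snippet below is a sketch assuming xarray with dask; the file paths, variable collection, and chunk sizes are illustrative only.

```python
# Hedged sketch: rechunking NetCDF files into a Zarr store suited to ML training.
# File paths, the variable collection and chunk sizes are illustrative assumptions.
import xarray as xr

# Lazily open a (hypothetical) collection of NetCDF files with dask-backed chunks
ds = xr.open_mfdataset("/g/data/example/era5_t2m_*.nc", combine="by_coords",
                       chunks={"time": 24})

# Chunk along the sample (time) dimension so training batches read contiguous blocks,
# while keeping full spatial fields within each chunk
ds = ds.chunk({"time": 24, "latitude": -1, "longitude": -1})

# Write a consolidated Zarr store that training jobs can stream efficiently
ds.to_zarr("/g/data/example/era5_t2m.zarr", mode="w", consolidated=True)
```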

Research support and leadership

  • Provide advice and guidance, e.g., on the application and use of various ML architectures for different applications, on common pitfalls or misuse of ML tools and methods, and on methods for the interpretation and improvement/refinement of ML models
  • Maintain a benchmarking web page for popular and high-profile ML models, reporting scores and use cases of particular interest to the Australian region and community (two commonly used scores are sketched after this list)
  • Support the development of community ML weather and/or climate models
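
To make the notion of "scores" concrete, the sketch below computes two metrics commonly reported when benchmarking data-driven weather models, root-mean-square error (RMSE) and the anomaly correlation coefficient (ACC), against a climatology. The arrays are synthetic placeholders; a real scorecard would use observed or reanalysis fields, area weighting, and lead-time stratification.

```python
# Sketch of two headline scores commonly used to benchmark forecast models:
# root-mean-square error (RMSE) and the anomaly correlation coefficient (ACC).
# The fields below are synthetic placeholders.
import numpy as np

def rmse(forecast, observed):
    """Root-mean-square error over all grid points."""
    return float(np.sqrt(np.mean((forecast - observed) ** 2)))

def acc(forecast, observed, climatology):
    """Anomaly correlation coefficient of forecast vs observed anomalies from climatology."""
    f_anom = forecast - climatology
    o_anom = observed - climatology
    return float(np.sum(f_anom * o_anom) /
                 np.sqrt(np.sum(f_anom ** 2) * np.sum(o_anom ** 2)))

rng = np.random.default_rng(2)
clim = rng.normal(size=(181, 360))                   # e.g. a 1-degree climatology field
obs = clim + rng.normal(scale=1.0, size=clim.shape)  # "observed" field
fcst = obs + rng.normal(scale=0.5, size=clim.shape)  # a forecast correlated with obs

print(f"RMSE: {rmse(fcst, obs):.3f}  ACC: {acc(fcst, obs, clim):.3f}")
```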

Community support and development

  • Support and instigate community events (e.g., hackathons, workshops)
  • Facilitate regular working group meetings to share information and results related to ML architectures, experimental design, evaluation, challenges and opportunities
  • Support training for new science and data science graduates/ECRs to enable them to blend ML and geosciences in their work (build the future workforce)
  • Promote career pathways for physical scientists adopting ML, and for computational and data scientists considering a science pathway
  • Provide a platform (Hive) for communication and collaboration
  • Provide an HPC hardware environment (NCI project) to perform experiments of interest to the ACCESS-NRI working group community

While I applaud the ambitious nature of the document, I am concerned that the scope is so broad and all-encompassing as to be of little use in developing a framework.

For example, the generic use of the term ML is pretty unhelpful. ML encompasses everything from Neural Networks, Random Forests, Gaussian state-space models, entropic AI, Bayesian inference, Neural ODEs, … the list goes on.

Applications range from subgrid modelling to parameter estimation, prediction, etc.; however, the application will determine the appropriate methodology and training-data requirements.

The document seems to suggest that the ACCESS-NRI will train people in data science, which, I would argue, is not its role; nor does it have the capacity or expertise to do so. While there are certainly fairly simple, straightforward-to-learn ML methods (random forests, for example), many require considerable background understanding if one is to do more than simply run a pre-canned tool.

I would suggest a more carefully planned and staged approach:
1: Identify and coordinate a community of practice. The data science community is large and diverse (UDASH, CSIRO Environment & Data61, and universities are all active in this space), with many of us already actively engaged in problems in the earth sciences. The data assimilation community is a natural ally in this space and is actively engaging with the ML community.
2: Make training data readily searchable and available; this would be a huge step for the wider community.
3: The bullet points around infrastructure support and HPC for ML practitioners would be highly useful, especially regarding GPU implementation.

The wider aspirational goals would naturally follow from having an engaged community, with a code base etc. built over time.
