The FPWG welcomes your feedback on this document. Please provide comments and input using the reply function, or by editing the Google Doc version directly (with track changes on).
How ACCESS-NRI could support ML/AI
Machine learning (ML) infrastructure is the software and hardware foundation required for the development and deployment of ML models.
Although ML models and infrastructure implementations differ widely depending on the application, most ML workflows share a core set of components, each of which can be effectively supported by common infrastructure:
- Model selection and model construction
- Data ingestion and preparation
- Visualisation and monitoring
- Model testing
- Model deployment
- Evaluation and analysis
- Model/experiment versioning and traceability
- ML pipeline automation (a pipeline can encapsulate part or all of the model development, training and inference process: data preparation, model training and inference, data post-processing, monitoring and evaluation)
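As a rough illustration of what "pipeline automation" means in practice, the sketch below chains three of the stages listed above (data preparation, training, evaluation) into a single reproducible run. It is a minimal toy, not a proposed ACCESS-NRI design: the stage functions, the `run_pipeline` helper and the one-parameter "model" are all hypothetical names invented for this example.

```python
from typing import Any, Callable, Dict, List, Tuple

State = Dict[str, Any]
Stage = Callable[[State], State]


def prepare(state: State) -> State:
    """Data preparation: drop samples with a missing input or target."""
    clean = [(x, y) for x, y in state["raw"] if x is not None and y is not None]
    return {**state, "data": clean}


def train(state: State) -> State:
    """Training: least-squares slope of a line through the origin (toy model)."""
    sxy = sum(x * y for x, y in state["data"])
    sxx = sum(x * x for x, _ in state["data"])
    return {**state, "slope": sxy / sxx}


def evaluate(state: State) -> State:
    """Evaluation: mean absolute error of the fitted model on the prepared data."""
    errors = [abs(y - state["slope"] * x) for x, y in state["data"]]
    return {**state, "mae": sum(errors) / len(errors)}


def run_pipeline(stages: List[Tuple[str, Stage]], state: State) -> Tuple[State, List[str]]:
    """Run each named stage in order and record which stages were executed."""
    stages_run = []
    for name, stage in stages:
        state = stage(state)
        stages_run.append(name)
    return state, stages_run


if __name__ == "__main__":
    pipeline = [("prepare", prepare), ("train", train), ("evaluate", evaluate)]
    raw = {"raw": [(1.0, 2.0), (2.0, 4.0), (None, 1.0), (3.0, 6.0)]}
    final_state, stages_run = run_pipeline(pipeline, raw)
    print(stages_run, final_state["slope"], final_state["mae"])
```

Because every stage takes and returns the same state dictionary, a stage can be swapped, tested or monitored on its own, which is the property shared tooling would aim to provide at scale.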
Below we outline how ACCESS-NRI could support ML infrastructure for the benefit of the Australian research community. ACCESS-NRI is well positioned to take a leading role in building a connected Australian community that advances ML for weather and climate applications and stimulates collaboration across organisations. The timing is opportune, given how rapidly the field is advancing. ML is likely to be a key part of future weather and climate modelling systems.
Technical Infrastructure
- Support, extend and develop tools for use at every stage of the ML workflow, to enable the community to:
  - Use training pipelines and lower the effort for model training
  - Ingest, prepare and manipulate data required for training
  - Visualise and monitor at every stage of the ML workflow
  - Perform integration tests and error-checking at various stages of the workflow
  - Access evaluation tools to, for example, lower the effort to produce first-glance scorecards of performance
  - Ensure appropriate version control and traceability of model pipelines, weights, and data pre- and post-processing
- Maintain reference implementations of pre-trained ML models, i.e., neural Earth system models (pure ML weather/climate models) and other ML models (e.g., for downscaling, climate driver index prediction, image segmentation, and object identification and tracking). Potentially also maintain user-trainable reference ML model implementations.
- Support infrastructure frameworks required for implementing ML emulators into dynamical model parameterisations.
- Provide support and advice to scientists and project teams on planning the HPC hardware needed for ML, which can have a large impact on performance and cost (e.g., use of GPUs)
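To make the "first-glance scorecard" idea above concrete, here is a minimal sketch that computes RMSE and mean bias per variable for a forecast against observations. It is an illustration only: the function names, the variable key `t2m` and the sample values are assumptions made up for this example, and a real evaluation tool would also handle gridded data, area weighting and lead times.

```python
import math
from typing import Dict, List, Sequence


def rmse(forecast: Sequence[float], observed: Sequence[float]) -> float:
    """Root-mean-square error between paired forecast and observed values."""
    if len(forecast) != len(observed) or len(forecast) == 0:
        raise ValueError("forecast and observed must be non-empty and equal length")
    squared = [(f - o) ** 2 for f, o in zip(forecast, observed)]
    return math.sqrt(sum(squared) / len(squared))


def bias(forecast: Sequence[float], observed: Sequence[float]) -> float:
    """Mean error (forecast minus observed); positive means over-prediction."""
    if len(forecast) != len(observed) or len(forecast) == 0:
        raise ValueError("forecast and observed must be non-empty and equal length")
    return sum(f - o for f, o in zip(forecast, observed)) / len(forecast)


def scorecard(
    forecasts: Dict[str, List[float]],
    observations: Dict[str, List[float]],
) -> Dict[str, Dict[str, float]]:
    """Build a per-variable table of scores for variables present in both inputs."""
    return {
        var: {
            "rmse": rmse(forecasts[var], observations[var]),
            "bias": bias(forecasts[var], observations[var]),
        }
        for var in forecasts
        if var in observations
    }


if __name__ == "__main__":
    # Hypothetical 2 m temperature values (degrees C), for illustration only.
    fc = {"t2m": [21.0, 23.5, 19.0]}
    ob = {"t2m": [20.0, 24.0, 21.0]}
    for var, scores in scorecard(fc, ob).items():
        print(var, {name: round(value, 3) for name, value in scores.items()})
```

Sharing even a small, agreed set of scores like this across groups is what would make results from different models directly comparable.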
Documentation
- Documentation and user guides for reference model implementations
- Documentation and user guides for supported tools and infrastructure
- “Getting started” primer information and various guides for using ML
- Collections of ML use-case examples
- Collections of applicable papers
- Documentation for computational and data scientists getting started with scientific ML, and for physical scientists getting started with ML
Data
- Support and promote key datasets used for ML training and application
- Where appropriate, make input and output data transparent, open and accessible
- Provide advice on appropriate use of data storage (for both input and output data)
Research support and leadership
- Provide advice and guidance, e.g., on the application and use of various ML architectures for different applications, on common pitfalls or misuse of ML tools and methods, and on methods for the interpretation and improvement/refinement of ML models
- Maintain a benchmarking web page for popular and high-profile ML models in Australia, reporting scores and use cases of particular interest to the Australian region and community
- Support the development of community ML weather and/or climate models
Community support and development
- Support and instigate community events (e.g., hackathons, workshops)
- Facilitate regular working group meetings to share information and results related to ML architectures, experimental design, evaluation, challenges and opportunities
- Support training for new science and data science graduates/ECRs to enable them to blend ML and geosciences in their work (build the future workforce)
- Promote career pathways for physical scientists adopting ML, and for computational and data scientists considering a science pathway
- Provide a platform (Hive) for communication and collaboration
- Provide an HPC hardware environment (NCI project) to perform experiments of interest to the ACCESS-NRI working group community