PyEarthTools Training Day

This post holds the schedule for the ACCESS-NRI 2025 training day, which uses PyEarthTools to introduce people to machine learning for climate and weather. It will be updated progressively as the event approaches.

If you want to get started ahead of time, the tutorial will be based on these linked notebooks: https://github.com/ACCESS-Community-Hub/PyEarthTools/tree/develop/notebooks/tutorial/ENSO_Tutorial

For convenience, a copy of these tutorials is on disk at /scratch/nf33/pyearthtools_notebooks which should be accessible to logged-in users. Make a copy of these notebooks in a directory of your choosing (such as your home directory, or your user-specific directory under /scratch/nf33) and work from there.
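
If you would rather make the copy from Python (for example from a notebook cell), a minimal sketch could look like the following. The function name `copy_notebooks` is made up for illustration; only the source path comes from this post, and a plain `cp -r` in a terminal does the same job.

```python
import shutil
from pathlib import Path

# Shared copy of the tutorial notebooks, as given in this post.
SHARED_NOTEBOOKS = Path("/scratch/nf33/pyearthtools_notebooks")


def copy_notebooks(dest, src=SHARED_NOTEBOOKS):
    """Copy the notebook directory at `src` into a new directory `dest`.

    Raises FileExistsError if `dest` already exists, so earlier work is
    never silently overwritten.
    """
    dest = Path(dest).expanduser()  # allow "~" in the destination
    shutil.copytree(src, dest)
    return dest
```

For example, `copy_notebooks("~/pyearthtools_notebooks")` would place a working copy in your home directory.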

You will need to specify the following information when launching a Jupyter session from https://are.nci.org.au/:

Walltime (hours): 4

Queue: normalbw

Compute Size: large

Storage: gdata/dk92+gdata/rt52

Extra arguments:

Module directories: /g/data/dk92/apps/Modules/modulefiles

Modules: pet/2025.08

Jobfs size: 100GB

1.45 – 3.15: Connecting to data with PyEarthTools (PET), problem overview, XGBoost example

3.30 – 5pm: Neural networks with PET

Session 1 – Setup, data, XGBoost

1.45 – 2.15pm: Get everyone online, connected, set up and joined to the required projects. Run through the basic plotting notebook to confirm that everything works.

2.15 – 2.35: Introduction to ENSO: why it is important and what prior ML work exists

2.35 – 2.50: Introduction to data accessors in PET

2.50 – 3.15: XGBoost model of ENSO with PET:

  • Data overview
  • Data preparation
  • Test/train/validate
  • Connection to XGBoost
  • Introduction to PyEarthTools pipeline features
  • If time permits, how to add new features using additional data accessors
    • Try changing the bounding box
    • Try adding an additional data source (e.g. satellite or another model variable)
    • Other next steps from the notebooks as time allows
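
As a rough illustration of the data preparation and test/train/validate steps above, here is a minimal NumPy sketch. It is not the code from the tutorial notebooks and does not use the PyEarthTools API: the helper names `box_mean` and `chronological_split`, the synthetic data and the split fractions are all assumptions made for this example. The box used is the commonly quoted Niño 3.4 region (5°S–5°N, 170°W–120°W); the notebooks may use a different region, and changing the bounds is the "bounding box" exercise.

```python
import numpy as np


def box_mean(field, lats, lons, lat_bounds, lon_bounds):
    """Average `field` (time, lat, lon) over a latitude/longitude box.

    Grid cells are weighted by cos(latitude) to approximate cell area.
    """
    lat_mask = (lats >= lat_bounds[0]) & (lats <= lat_bounds[1])
    lon_mask = (lons >= lon_bounds[0]) & (lons <= lon_bounds[1])
    box = field[:, lat_mask][:, :, lon_mask]
    weights = np.cos(np.deg2rad(lats[lat_mask]))[None, :, None]
    weights = np.broadcast_to(weights, box.shape)
    return (box * weights).sum(axis=(1, 2)) / weights.sum(axis=(1, 2))


def chronological_split(n_samples, train_frac=0.7, val_frac=0.15):
    """Return index arrays for train, validation and test, in time order.

    Splitting by time (rather than shuffling) keeps future samples out of
    the training set.
    """
    n_train = round(n_samples * train_frac)
    n_val = round(n_samples * val_frac)
    idx = np.arange(n_samples)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


# Synthetic monthly "SST" on a coarse grid (placeholder data, not observations).
rng = np.random.default_rng(0)
lats = np.arange(-30.0, 31.0, 5.0)     # 13 latitudes
lons = np.arange(120.0, 291.0, 5.0)    # 35 longitudes, 0-360 convention
sst = rng.normal(size=(120, lats.size, lons.size))

# Nino 3.4 box: 5S-5N, 170W-120W, which is 190E-240E in 0-360 longitudes.
index = box_mean(sst, lats, lons, lat_bounds=(-5, 5), lon_bounds=(190, 240))

train_idx, val_idx, test_idx = chronological_split(index.size)
```

The resulting `index[train_idx]`, `index[val_idx]` and `index[test_idx]` arrays are the kind of tabular input a gradient-boosted model such as XGBoost would then be fitted and checked against.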

3.15 – 3.30: Afternoon tea and tech support

Session 2 – Neural Networks and Model Evaluation

3.30 – 4pm: Introduction to neural networks with PET

  • Separating the input pipeline from the target pipeline
  • Preserving geospatial structures
  • Sequence-to-sequence pipelines
  • Normalising (z-scaling) inside a pipeline across dimensions
  • Presenting data to PyTorch
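
To make the normalisation step concrete, here is a minimal NumPy sketch of z-scaling across chosen dimensions. It is an illustration only, not the PyEarthTools pipeline API: the function names, the `(time, variable, lat, lon)` array layout and the synthetic data are assumptions. The statistics are fitted on the training period only and then reused for the test period.

```python
import numpy as np


def fit_zscale(train, axes):
    """Compute mean and standard deviation over `axes` of the training data."""
    mean = train.mean(axis=axes, keepdims=True)
    std = train.std(axis=axes, keepdims=True)
    std = np.where(std == 0, 1.0, std)  # avoid dividing by zero for constant fields
    return mean, std


def apply_zscale(data, mean, std):
    """Scale data to zero mean and unit variance using fitted statistics."""
    return (data - mean) / std


def invert_zscale(scaled, mean, std):
    """Map scaled values (e.g. model predictions) back to physical units."""
    return scaled * std + mean


# Synthetic data with layout (time, variable, lat, lon); two variables with
# very different magnitudes (placeholder values, not observations).
rng = np.random.default_rng(0)
offsets = np.array([290.0, 0.0])[None, :, None, None]
spreads = np.array([5.0, 2.0])[None, :, None, None]
data = rng.normal(size=(100, 2, 8, 16)) * spreads + offsets

train, test = data[:80], data[80:]

# Reduce over time, lat and lon so each variable gets its own mean and std.
mean, std = fit_zscale(train, axes=(0, 2, 3))
train_scaled = apply_zscale(train, mean, std)
test_scaled = apply_zscale(test, mean, std)

# float32 array in (sample, channel, lat, lon) order, which keeps the
# geospatial structure intact for a convolutional network.
model_input = train_scaled.astype(np.float32)
```

An array like `model_input` can then be handed to PyTorch, for example with `torch.from_numpy(model_input)`.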

4pm – 4.30pm: Model training and evaluation

  • Training larger models on GPUs
  • Exploring loss functions and training strategies

4.30pm – 5pm: Neural network architectures for ENSO

  • Introduction to CNNs and ResNet