Reference Datasets FY23-24: Atmosphere WG

Introduction

ACCESS-NRI would like to know the needs of the Working Groups for Reference Datasets. We would like to know what datasets of significance for the working group you would like to have hosted in a published data collection at NCI within the financial year 2023/2024. We need to create a list of likely datasets that will be ready within this timeframe so we can start the data management process for these datasets with you and NCI.

What is a Reference Dataset?

A Reference Dataset is data that is of significance to the community and will be used by a range of users. The following information is used to describe one:

  • Dataset name
  • Dataset URL
  • Contact to liaise with publishing data (name and organisation)
  • Dataset details, in particular to demonstrate importance to the community, e.g. temporal and spatial coverage, nature of dataset (observations, reanalysis …), static or regularly updated
  • Current required storage in GB, projection for next year if not a static dataset

What do we need from you?

Proposing datasets

If you have a dataset that you think:

  1. Can be considered as a Reference Dataset
  2. Will be ready for publication before June 2024

Please reply to this topic with:

  • Dataset name
  • Contact to liaise with publishing data (name and organisation)
  • Dataset details to show its importance to the community
  • Required storage in GB

Voting on datasets

This topic has post voting enabled. You can make short comments on replies but you can also vote on replies. This is an experiment to crowd-source information about which datasets you value.

Up-vote datasets that are important to your work, or you think are important for Atmospheric modelling at NCI. You can down-vote datasets you think should not be included. Feel free to comment as well to provide context on why you voted a certain way.

Keep up to date

If you are a member of the Atmosphere Working Group you should have received this topic as an email, as we have changed the defaults so Working Group members always get notified of new topics in the Working Group category. This is to ensure you don’t miss important information.

Consider watching this topic if you want to stay updated on what is happening with datasets for the working group.

Summary of proposed Reference Datasets

Below is a table, which I will keep updated, summarising the datasets proposed by the Atmosphere Working Group:

| Dataset | Proposer | Contact | Storage (in GB) |
| --- | --- | --- | --- |
| SST/SIC Ancillary files | Sonya Fiddes | ---- | ~5 GB, grow / year |
| ERA5, processed for UM nudging | Matt Woodhouse | ---- | 11,200 GB full dataset at N96 (1940-2022), grow 135 GB / year |
| BARRA reanalysis dataset | Chun-Hsu Su | ---- | |
| GPM-IMERG precipitation dataset | Paola Petrelli, CLEX/UTAS | ---- | ~50 TB for periods related to AUS2200 sims to date |

Dataset name: SST/SIC Ancillary files

Contact to liaise with publishing data (name and organisation): Sonya Fiddes, UTAS
Dataset details to show its importance to the community: This data is the sea ice and sea surface temperature ancillary files used by ACCESS-AM2 runs (AMIP style config). It starts in 1870, is based on the Hurrell et al. (2008) merged dataset (Merged Hadley-NOAA/OI Sea Surface Temperature & Sea-Ice Concentration, Climate Data Guide), and is updated routinely.

Once downloaded, it is converted to the appropriate grid with the ACCESS land-sea mask using this code: https://code.metoffice.gov.uk/trac/ancil/wiki/CMIP6/ForcingData/SstSeaIce. I have been updating it every so often (it currently runs to 2020, so an update is needed).
Required storage in GB: Approx 5 GB, but will incrementally increase each year.
Other notes: This data needs a better home than my p66 folder, and needs to be maintained so it can be updated once a year (from memory, the code requires a full year of data).


Dataset name
ERA5, processed for UM nudging

Contact to liaise with publishing data (name and organisation)
Matt Woodhouse, CSIRO

Dataset details to show its importance to the community
A subset of the ERA5 reanalysis (specifically U, V and theta variables) is needed as input data to enable atmospheric nudging in ACCESS-CM2/AM2, and subsequent / related configurations. Nudging is particularly useful in atmospheric composition simulations, and can also be applied in other studies.

Required storage in GB
The full dataset from 1940 to 2022 would occupy 11,200 GB at N96, and would grow at 135 GB per year. It may be possible to reduce this requirement by a factor of four by storing the data at N48. N96 is preferred however.
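As a rough illustration of the figures quoted above, the sketch below projects the storage requirement forward. It assumes growth stays linear at 135 GB per year after 2022, and that the factor-of-four saving at N48 applies equally to the yearly growth; neither assumption is confirmed in this thread, and the function and constant names are made up for the example.

```python
# Figures quoted in the post above (N96 grid, data for 1940-2022).
BASE_GB = 11_200            # full dataset at N96 through 2022
BASE_END_YEAR = 2022
GROWTH_GB_PER_YEAR = 135    # quoted yearly growth at N96
N48_REDUCTION_FACTOR = 4    # "factor of four" saving; assumed to apply to growth too


def projected_storage_gb(end_year: int, reduction_factor: int = 1) -> float:
    """Estimate storage (GB) for data from 1940 through end_year.

    Hypothetical helper: assumes linear growth after BASE_END_YEAR.
    """
    if end_year < BASE_END_YEAR:
        raise ValueError("only projections from 2022 onwards are supported")
    extra_years = end_year - BASE_END_YEAR
    return (BASE_GB + GROWTH_GB_PER_YEAR * extra_years) / reduction_factor


if __name__ == "__main__":
    for year in (2022, 2024, 2030):
        n96 = projected_storage_gb(year)
        n48 = projected_storage_gb(year, N48_REDUCTION_FACTOR)
        print(f"{year}: N96 ~{n96:,.0f} GB, N48 ~{n48:,.0f} GB")
```

As a consistency check on the quoted numbers, 83 years (1940 to 2022 inclusive) at 135 GB per year comes to 11,205 GB, which matches the 11,200 GB total.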

Support required
Generating the full dataset would require some scripting input from NRI. We already have the basic tools to gather and format the data, but they could do with some automation / improvement.

Ongoing support to generate new years of data as they become available would be valuable.


Dataset name:
BARRA reanalysis dataset
Contact to liaise with publishing data:
Chun-Hsu Su, Bureau of Meteorology
Dataset details to show its importance to the community:
High-resolution regional reanalysis dataset that provides invaluable information for analysing and understanding Australian weather and climate. It is also frequently used to initialise regional weather models such as AUS2200.
Other notes:
BARRA version 1: already published in the NCI data collection under cj37 and listed in the NCI Data Catalogue. Some of the BARRA v1 data, particularly the 1.5 km downscaled data, is unique to v1 and is not provided by BARRA version 2.
BARRA version 2: will be published in the NCI data collection under ob53. The storage has been covered by the Bureau's projects for the next few years. This replaces BARRA v1 for the regional-scale (12 km) reanalysis, and supplements BARRA v1 by providing the downscaled km-scale (4.4 km) reanalysis over the whole of Australia. The regional-scale reanalysis BARRA-R2 will be published well before June 2024, and some of the km-scale data (BARRA-C2) will be published before then.


Dataset name :
GPM-IMERG precipitation dataset
Contact to liaise with publishing data :
Paola Petrelli, CLEX/UTAS
Dataset details to show its importance to the community :
The Integrated Multi-satellitE Retrievals for GPM (IMERG) algorithm combines information from the GPM satellite constellation to estimate precipitation over the majority of the Earth’s surface. It is a gridded dataset with a spatiotemporal resolution of 0.1 degree and 30 minutes. The dataset is widely used for surface precipitation monitoring and analysis, as well as weather and climate model evaluation.
Required storage:
Around 50 TB for periods related to AUS2200 simulations to date.
Other notes :
GPM-IMERG V07 (the new version) was released in the last couple of weeks. V07 covers the period from 2000 to the present. The CLEX CMS team is in the process of updating the archived data with the new version under project ia39.


Dataset name :
Australian Gridded Climate Data (AGCD)
Proposer :
Yi Huang, UniMelb
Dataset details to show its importance to the community :
Australian Gridded Climate Data (AGCD) is the Bureau of Meteorology’s official dataset for monthly gridded rainfall analysis. AGCD combines available rainfall data with state-of-the-art statistical modelling and the latest scientific techniques to provide accurate information on monthly, seasonal and annual rainfall conditions across the country. The dataset is widely used in the Australian community for surface precipitation monitoring and analysis, as well as weather and climate model evaluation.
Required storage :
Around 283 GB for 1900 to 2022.
Other notes :
AGCD is published at NCI under project zv2, covering the period from 1900 to 2022. This includes both v1 (i.e. AWAP) and v2 (the new higher-resolution 0.01 deg product for monthly precipitation).


Dataset name :
Australian Operational Weather Radar dataset
Proposers :
Hooman Ayat, Claire Vincent and Yi Huang, UniMelb
Dataset details to show its importance to the community :
The dataset is obtained from 76 radars across Australia with polarimetric radars included, and many sites have more than 20 years of data. In addition to the level 1 radar variables, level 2 data provides several retrievals including precipitation rate estimate, convective/stratiform classification, hail severity estimate and echo top heights with a spatial resolution of 1 km out to a range of 150 km from the radar location.
Required storage :
The whole project (including level 1 and level 2 data for 76 weather radars across the country) is around 200 TB.
Other notes :
The Bureau’s precipitation fields product (merged gauge+radar) is published under rq0. Joshua Soderholm at the Bureau is the contact person for this product. Darwin CPOL data is published under hj10. The contact people are Valentin Louf and Alain Protat at the Bureau.


Dataset name :
Bureau of Meteorology Weather Station dataset
Proposers :
Claire Vincent and Yi Huang, UniMelb
Dataset details to show its importance to the community :
The Bureau’s weather stations (including AWS) record a variety of weather phenomena, including temperature, humidity, rainfall, pressure, sunshine, wind, cloud and visibility. The monthly, daily, 3 hourly, half hourly, and minute frequency options represent the most typical reporting schedules within broader reporting ranges. The sub-daily data records, which are currently not publicly available, are particularly useful for studying hazardous and extreme weather events from regional to local scales, and for evaluating high-resolution regional models.
Other notes :
All weather data recorded at a station are stored in the Bureau’s Australian Data Archive for Meteorology (ADAM).


Dataset name :
Global Radiosonde Data (IGRA)
Proposers :
Chun-Hsu Su, Bureau of Meteorology
Dataset details to show its importance to the community :
These are quality-controlled observations that have been subsampled or regridded to standard pressure levels. As a result, the vertical resolution of the data can be far coarser than what is available from the Bureau’s operational observation archive. The standard-pressure data may or may not be enough, depending on the application.
Required storage :
Around 30 GB for the global data, which goes back to 1905, is updated daily, and is well documented by NCEI-NOAA.


Dataset name :
Himawari-8/9 geostationary satellite dataset
Proposers :
Yi Huang, UniMelb
Dataset details to show its importance to the community :
This dataset provides Brightness Temperature values for the Full Disk (FLDK) observations of Himawari-AHI. The data are provided for each IR band at the native spatial resolution, including band 07 (3.9 µm), band 08 (6.2 µm), band 09 (6.9 µm), band 10 (7.3 µm), band 11 (8.6 µm), band 12 (9.6 µm), band 13 (10.4 µm), band 14 (11.2 µm), band 15 (12.4 µm) and band 16 (13.3 µm). The high-resolution Brightness Temperature data are particularly useful for evaluating simulated clouds and convection, including their physical properties.
Other notes :
The dataset is already published at NCI under project ‘ra22’ and is maintained by the Bureau of Meteorology. Contact: Satellite Science Team, Research, Bureau of Meteorology (SI-Research-SS@bom.gov.au).