ACCESS-rAM3 Release 1.0 Feedback

About

This topic is a catch-all location for feedback for the ACCESS-rAM3 Release 1.0.

Please reply to this topic if you have feedback on the ACCESS-rAM3 release. Feedback can be to point out problems encountered, or positive to highlight what worked well.

If your feedback is involved and will require specific help feel free to
make a topic on the ACCESS-Hive Forum.

If you’re not sure, reply here and your query can be moved to a separate topic if required.

Thanks NRI for the new release, congrats on the improvements implemented. I’ve just successfully completed a test of the latest branch access_rel_1 for the ancil and running suites.

One thing I hadn’t fully understood the implications was the new netcdf functionality (as I had been working from my own branch). I know this is a complex issue, and we asked for netcdf support, but I think there is scope to improve some of this. For example:

1. Variable name issue

Previously loading files with iris on inspection of an output file (e.g umnsaa_pvera) we could see almost all variable names as simple text (e.g. “air_temperature”).
Now on inspection we only get a list of stashIDs (e.g. “STASH_m01s03i236”) as variable names. This makes it difficult to navigate data, especially for new users.

2. Dimension issue

Previously when loading a cube with iris and converting to xarray, metadata would come across clearly (e.g. the array would be called “air_temperature”, with dimensions [“time”, “latitude”, “longitude”].
Now metadata is converted with non-standard names (e.g. time is “VT1HR”, latitude is “grid_latitude_t” and longitude is “grid_longitude_t”).
Perhaps there are good reasons not to simply rename everything as “time” if they are defined differently (e.g. instantaneous on the hour, hour average, or max in the hour), but this is a change to previous functionality, so should be documented. Personally, I would prefer “time” as a standard coordinate name, with the method detail included in attributes.

3. Time profile issue

Previously where multiple time profiles were defined for a single variable (e.g. by default for air temperature in STASHPACK1 there is VTS1 and VT1HR: for first model timestep and instantanous on the hour), iris would concatenate these into a single cube.
Now with multiple time profile for a particular variable, multiple variables are created (e.g. for air temperature there is “STASH_m01s03i236” and “STASH_m01s03i236_2”), making it more difficult again to navigate. In general it is the _2 ones that I want to use (instantaneous on the hour, rather than the first timestep output).

I know there are no quick solutions to the above, and the UM systems don’t make it easy. For example… I don’t know why the standard UM STASHPACK1 (for as long as I can remember) has set a single value output at the first timestep for each variable. In free-running mode this means every cycle has to have that first timestep removed (or there are two values at the same time). Perhaps the Met Office included this because of some verification (i.e. with RES). But we could consider removing those single timestep outputs, seeing as they cause issue with the new netcdf functionality.

I also know that designing output workflows (e.g. for AUS2200 or BARRA or ESM) takes a huge amount of development effort (so thank you). I don’t know whether the standard STASHPACK1 is useful for the ACCESS community (would need discussion), but I think future efforts should focus on adapting the work from AUS2200 to make a useful set of outputs commonly used by researchers (as opposed to the NWP validation STASHPACKS that come from the Met Office).

Perhaps agreeing on better output workflows and/or stashpacks could be part of an ongoing discussion at the Atmosphere working groups, and a breakout session at the upcoming ACCESS annual workshop?

Examples of what I’m describing is included below:

import iris
import xarray as xr
import matplotlib.pyplot as plt

xr.set_options(display_max_rows=50)
datapath = '/scratch/fy29/mjl561/cylc-run/u-dg768/share/cycle/20220226T0000Z/Lismore/d0198/RAL3P2/um'

#### 1. VARIABLE NAME ISSUE

# PREVIOUSLY with iris
cbs = iris.load(f'{datapath}/umnsaa_pvera000')
print(cbs)
 
'''
0: m01s01i202 / (unknown)              (time: 25; latitude: 450; longitude: 450)
1: Turbulent mixing height after boundary layer / (m) (time: 25; latitude: 450; longitude: 450)
2: m01s03i253 / (unknown)              (time: 24; latitude: 450; longitude: 450)
3: Cumulus capped boundary layer indicator / (1) (time: 24; latitude: 450; longitude: 450)
4: air_temperature / (K)               (time: 25; latitude: 450; longitude: 450)
5: atmosphere_boundary_layer_thickness / (m) (time: 8; latitude: 450; longitude: 450)
6: dew_point_temperature / (K)         (time: 25; latitude: 450; longitude: 450)
7: fog_area_fraction / (1)             (time: 25; latitude: 450; longitude: 450)
8: land_binary_mask / (1)              (latitude: 450; longitude: 450)
9: relative_humidity / (%)             (time: 25; latitude: 450; longitude: 450)
10: sea_ice_area_fraction / (1)         (time: 24; latitude: 450; longitude: 450)
11: surface_air_pressure / (Pa)         (time: 8; latitude: 450; longitude: 450)
12: surface_altitude / (m)              (latitude: 450; longitude: 450)
13: surface_downwelling_longwave_flux_in_air / (W m-2) (time: 24; latitude: 450; longitude: 450)
14: surface_downwelling_shortwave_flux_in_air / (W m-2) (time: 24; latitude: 450; longitude: 450)
15: surface_net_downward_longwave_flux / (W m-2) (time: 24; latitude: 450; longitude: 450)
16: surface_snow_amount / (kg m-2)      (time: 24; latitude: 450; longitude: 450)
17: surface_temperature / (K)           (time: 25; latitude: 450; longitude: 450)
18: surface_upward_latent_heat_flux / (W m-2) (time: 24; latitude: 450; longitude: 450)
19: surface_upward_sensible_heat_flux / (W m-2) (time: 24; latitude: 450; longitude: 450)
20: toa_incoming_shortwave_flux / (W m-2) (time: 24; latitude: 450; longitude: 450)
21: toa_outgoing_longwave_flux / (W m-2) (time: 24; latitude: 450; longitude: 450)
22: toa_outgoing_shortwave_flux / (W m-2) (time: 25; latitude: 450; longitude: 450)
23: toa_outgoing_shortwave_flux / (W m-2) (time: 24; latitude: 450; longitude: 450)
24: visibility_in_air / (m)             (time: 25; latitude: 450; longitude: 450)
25: wind_speed_of_gust / (m s-1)        (time: 24; latitude: 450; longitude: 450)
26: x_wind / (m s-1)                    (time: 25; latitude: 451; longitude: 450)
27: y_wind / (m s-1)                    (time: 25; latitude: 451; longitude: 450)
'''


# NOW with netcdf
ds = xr.open_dataset(f'{datapath}/nc/umnsaa_pvera000.nc')
print(ds)

'''
<xarray.Dataset> Size: 506MB
Dimensions:                     (grid_longitude_t: 450, grid_latitude_t: 450,
                                 bounds2: 2, VT1HR: 12, VTS0_0: 1, VT3HR: 4,
                                 VTS1: 1, VTS1_rad_diag: 1, VT1HR_rad_diag: 11,
                                 grid_longitude_uv: 450, grid_latitude_uv: 451,
                                 height_10m: 1, height_1_5m: 1, VT1HRMAX: 12)
Coordinates:
  * grid_longitude_t            (grid_longitude_t) float64 4kB 148.8 ... 157.7
    longitude_t                 (grid_latitude_t, grid_longitude_t) float64 2MB ...
  * grid_latitude_t             (grid_latitude_t) float64 4kB -32.95 ... -24.06
    latitude_t                  (grid_latitude_t, grid_longitude_t) float64 2MB ...
  * VT1HR                       (VT1HR) datetime64[ns] 96B 2022-02-26T01:00:0...
  * VTS0_0                      (VTS0_0) datetime64[ns] 8B 2022-02-26
  * VT3HR                       (VT3HR) datetime64[ns] 32B 2022-02-26T03:00:0...
  * VTS1                        (VTS1) datetime64[ns] 8B 2022-02-26T00:01:00
  * VTS1_rad_diag               (VTS1_rad_diag) datetime64[ns] 8B 2022-02-26T...
  * VT1HR_rad_diag              (VT1HR_rad_diag) datetime64[ns] 88B 2022-02-2...
  * grid_longitude_uv           (grid_longitude_uv) float64 4kB 148.8 ... 157.7
    longitude_uv                (grid_latitude_uv, grid_longitude_uv) float64 2MB ...
  * grid_latitude_uv            (grid_latitude_uv) float64 4kB -32.96 ... -24.05
    latitude_uv                 (grid_latitude_uv, grid_longitude_uv) float64 2MB ...
  * height_10m                  (height_10m) float64 8B 10.0
  * height_1_5m                 (height_1_5m) float64 8B 1.5
  * VT1HRMAX                    (VT1HRMAX) datetime64[ns] 96B 2022-02-26T01:0...
Dimensions without coordinates: bounds2
Data variables:
    rotated_latitude_longitude  |S1 1B ...
    grid_longitude_t_bounds     (grid_longitude_t, bounds2) float64 7kB ...
    grid_latitude_t_bounds      (grid_latitude_t, bounds2) float64 7kB ...
    STASH_m01s00i023            (VT1HR, grid_latitude_t, grid_longitude_t) float64 19MB ...
    STASH_m01s00i024            (VTS0_0, grid_latitude_t, grid_longitude_t) float64 2MB ...
    STASH_m01s00i024_2          (VT1HR, grid_latitude_t, grid_longitude_t) float64 19MB ...
    STASH_m01s00i025            (VT3HR, grid_latitude_t, grid_longitude_t) float64 6MB ...
    STASH_m01s00i030            (VTS0_0, grid_latitude_t, grid_longitude_t) int32 810kB ...
    STASH_m01s00i031            (VT1HR, grid_latitude_t, grid_longitude_t) float64 19MB ...
    STASH_m01s00i033            (VTS0_0, grid_latitude_t, grid_longitude_t) float64 2MB ...
    STASH_m01s00i409            (VT3HR, grid_latitude_t, grid_longitude_t) float64 6MB ...
    STASH_m01s01i202            (VTS1, grid_latitude_t, grid_longitude_t) float64 2MB ...
    STASH_m01s01i202_2          (VT1HR, grid_latitude_t, grid_longitude_t) float64 19MB ...
    STASH_m01s01i205            (VTS1, grid_latitude_t, grid_longitude_t) float64 2MB ...
    STASH_m01s01i205_2          (VT1HR, grid_latitude_t, grid_longitude_t) float64 19MB ...
    STASH_m01s01i207_2          (VT1HR, grid_latitude_t, grid_longitude_t) float64 19MB ...
    STASH_m01s01i208            (VTS1_rad_diag, grid_latitude_t, grid_longitude_t) float64 2MB ...
    STASH_m01s01i208_2          (VT1HR_rad_diag, grid_latitude_t, grid_longitude_t) float64 18MB ...
    STASH_m01s01i235            (VTS1_rad_diag, grid_latitude_t, grid_longitude_t) float64 2MB ...
    STASH_m01s01i235_2          (VT1HR_rad_diag, grid_latitude_t, grid_longitude_t) float64 18MB ...
    STASH_m01s02i201            (VTS1_rad_diag, grid_latitude_t, grid_longitude_t) float64 2MB ...
    STASH_m01s02i201_2          (VT1HR_rad_diag, grid_latitude_t, grid_longitude_t) float64 18MB ...
    STASH_m01s02i205            (VTS1_rad_diag, grid_latitude_t, grid_longitude_t) float64 2MB ...
    STASH_m01s02i205_2          (VT1HR_rad_diag, grid_latitude_t, grid_longitude_t) float64 18MB ...
    STASH_m01s02i207            (VTS1_rad_diag, grid_latitude_t, grid_longitude_t) float64 2MB ...
    STASH_m01s02i207_2          (VT1HR_rad_diag, grid_latitude_t, grid_longitude_t) float64 18MB ...
    STASH_m01s03i217            (VT1HR, grid_latitude_t, grid_longitude_t) float64 19MB ...
    grid_longitude_uv_bounds    (grid_longitude_uv, bounds2) float64 7kB ...
    grid_latitude_uv_bounds     (grid_latitude_uv, bounds2) float64 7kB ...
    STASH_m01s03i225            (VTS1, height_10m, grid_latitude_uv, grid_longitude_uv) float64 2MB ...
    STASH_m01s03i225_2          (VT1HR, height_10m, grid_latitude_uv, grid_longitude_uv) float64 19MB ...
    STASH_m01s03i226            (VTS1, height_10m, grid_latitude_uv, grid_longitude_uv) float64 2MB ...
    STASH_m01s03i226_2          (VT1HR, height_10m, grid_latitude_uv, grid_longitude_uv) float64 19MB ...
    STASH_m01s03i234            (VT1HR, grid_latitude_t, grid_longitude_t) float64 19MB ...
    STASH_m01s03i236            (VTS1, height_1_5m, grid_latitude_t, grid_longitude_t) float64 2MB ...
    STASH_m01s03i236_2          (VT1HR, height_1_5m, grid_latitude_t, grid_longitude_t) float64 19MB ...
    STASH_m01s03i245            (VTS1, height_1_5m, grid_latitude_t, grid_longitude_t) float64 2MB ...
    STASH_m01s03i245_2          (VT1HR, height_1_5m, grid_latitude_t, grid_longitude_t) float64 19MB ...
    STASH_m01s03i248            (VTS1, height_1_5m, grid_latitude_t, grid_longitude_t) float64 2MB ...
    STASH_m01s03i248_2          (VT1HR, height_1_5m, grid_latitude_t, grid_longitude_t) float64 19MB ...
    STASH_m01s03i250            (VTS1, height_1_5m, grid_latitude_t, grid_longitude_t) float64 2MB ...
    STASH_m01s03i250_2          (VT1HR, height_1_5m, grid_latitude_t, grid_longitude_t) float64 19MB ...
    STASH_m01s03i253            (VT1HR, grid_latitude_t, grid_longitude_t) float64 19MB ...
    STASH_m01s03i281            (VTS1, height_1_5m, grid_latitude_t, grid_longitude_t) float64 2MB ...
    STASH_m01s03i281_2          (VT1HR, height_1_5m, grid_latitude_t, grid_longitude_t) float64 19MB ...
    STASH_m01s03i304            (VTS1, grid_latitude_t, grid_longitude_t) float64 2MB ...
    STASH_m01s03i304_2          (VT1HR, grid_latitude_t, grid_longitude_t) float64 19MB ...
    STASH_m01s03i310            (VT1HR, grid_latitude_t, grid_longitude_t) float64 19MB ...
    VT1HRMAX_bounds             (VT1HRMAX, bounds2) datetime64[ns] 192B ...
    STASH_m01s03i463            (VT1HRMAX, grid_latitude_t, grid_longitude_t) float64 19MB ...
Attributes:
    Conventions:  CF-1.6
    source:       Met Office Unified Model v13.0

'''        

##### 2. DEMENSION ISSUE #####
# PREVIOUSLY with iris

cb = iris.load_cube(f'{datapath}/umnsaa_pvera000', 'air_temperature')
da = xr.DataArray().from_iris(cb)
print(da)

'''
<xarray.DataArray 'air_temperature' (time: 25, latitude: 450, longitude: 450)> Size: 20MB
dask.array<filled, shape=(25, 450, 450), dtype=float32, chunksize=(1, 450, 450), chunktype=numpy.ndarray>
Coordinates:
  * time                     (time) datetime64[ns] 200B 2022-02-26T00:01:00 ....
  * latitude                 (latitude) float32 2kB -32.96 -32.94 ... -24.06
  * longitude                (longitude) float32 2kB 148.8 148.9 ... 157.7 157.7
    forecast_reference_time  datetime64[ns] 8B ...
    height                   float64 8B ...
    forecast_period          (time) timedelta64[ns] 200B ...
Attributes:
    standard_name:  air_temperature
    units:          K
    source:         Data from Met Office Unified Model
    um_version:     13.0
    STASH:          m01s03i236
'''

# NOW with netcdf
ds = xr.open_dataset(f'{datapath}/nc/umnsaa_pvera000.nc')
da = ds['STASH_m01s03i236_2'] # note the underscore because of two clashing time profiles
print(da)

'''
<xarray.DataArray 'STASH_m01s03i236_2' (VT1HR: 12, height_1_5m: 1,
                                        grid_latitude_t: 450,
                                        grid_longitude_t: 450)> Size: 19MB
[2430000 values with dtype=float64]
Coordinates:
  * grid_longitude_t  (grid_longitude_t) float64 4kB 148.8 148.9 ... 157.7 157.7
    longitude_t       (grid_latitude_t, grid_longitude_t) float64 2MB ...
  * grid_latitude_t   (grid_latitude_t) float64 4kB -32.95 -32.94 ... -24.06
    latitude_t        (grid_latitude_t, grid_longitude_t) float64 2MB ...
  * VT1HR             (VT1HR) datetime64[ns] 96B 2022-02-26T01:00:00 ... 2022...
  * height_1_5m       (height_1_5m) float64 8B 1.5
Attributes:
    long_name:          TEMPERATURE AT 1.5M
    standard_name:      air_temperature
    units:              K
    cell_methods:       VT1HR: point
    grid_mapping:       rotated_latitude_longitude
    um_version:         13.0
    um_stash_source:    m01s03i236
    packing_method:     quantization
    precision_measure:  binary
    precision_value:    -6
'''

##### 3. TIME PROFILE ISSUE #####

# PREVIOUSLY with iris, which concatenates the time profiles in a cycle with glob
cb = iris.load_cube(f'{datapath}/umnsaa_pvera*', 'air_temperature')

# NOW with netcdf, we need to know the stash name, and whether there is a clash in time profiles

ds1 = xr.open_dataset(f'{datapath}/nc/umnsaa_pvera000.nc')['STASH_m01s03i236_2']
ds2 = xr.open_dataset(f'{datapath}/nc/umnsaa_pvera012.nc')['STASH_m01s03i236_2']
ds = xr.concat([ds1, ds2], dim='VT1HR')

# and then to simplify, rename dimensions and drop the unimportant ones
ds = ds.rename({'VT1HR': 'time', 'grid_latitude_t': 'latitude', 'grid_longitude_t': 'longitude'})
ds = ds.drop_vars(['height_1_5m', 'longitude_t', 'latitude_t'])
2 Likes

Hi Mathew,
Thanks for the feedback on UM NetCDF. The NetCDF support is currently based on UM NetCDF as documented at UMDP C11. This uses a translation from STASH to NetCDF CF that was based on Metarelate, as documented in UM Trac ticket #3370. In that Trac ticket I comment on the fact that the mapping is out of date since ​Iris is now independent from Metarelate. This is one reason that we were asking for feedback on the NetCDF output. I will create a UM Trac ticket asking for the mappings in STASH_to_CF.txt to depend directly on IRIS.

This, of course will not directly address the variable name issue that you have reported. The problem is caused by the variable name being mapped to the STASH source um_stash_source rather than the standard_name. If you look at the output such as share/cycle/20220227T0000Z/Lismore/d1000/GAL9/um/umnsaa_pvera000.nc using ncdump -h you will see variables like:

	double STASH_m01s03i236_2(VT1HR, height_1_5m, grid_latitude_t, grid_longitude_t) ;
		STASH_m01s03i236_2:long_name = "TEMPERATURE AT 1.5M" ;
		STASH_m01s03i236_2:standard_name = "air_temperature" ;
		STASH_m01s03i236_2:units = "K" ;
		STASH_m01s03i236_2:coordinates = "longitude_t latitude_t" ;
		STASH_m01s03i236_2:cell_methods = "VT1HR: point" ;
		STASH_m01s03i236_2:grid_mapping = "rotated_latitude_longitude" ;
		STASH_m01s03i236_2:_FillValue = -1073741824. ;
		STASH_m01s03i236_2:um_version = "13.0" ;
		STASH_m01s03i236_2:um_stash_source = "m01s03i236" ;
		STASH_m01s03i236_2:packing_method = "quantization" ;
		STASH_m01s03i236_2:precision_measure = "binary" ;
		STASH_m01s03i236_2:precision_value = -6 ;

One problem with using the standard_name as the variable name is that it may not be unique in the file, so the variable name would need to be modified to distinguish variables. In this case, both STASH_m01s03i236 and STASH_m01s03i236_2 have standard_name "air_temperature". If standard_name was used instead, you might have (e.g.) variable names air_temperature and air_temperature_2 instead.

Perhaps you could suggest a better specification for the mapping from STASH fields to NetCDF CF variables than the one that is currently being used by UM NetCDF?

I would also be interested to know what happens when you use Iris instead of Xarray to load the NetCDF output files. Would that help?

In any case, should we also provide optional STASHpacks that revert to using the previous format?

1 Like