Thanks NRI for the new release, congrats on the improvements implemented. I’ve just successfully completed a test of the latest branch access_rel_1
for the ancil and running suites.
One thing I hadn’t fully understood the implications was the new netcdf functionality (as I had been working from my own branch). I know this is a complex issue, and we asked for netcdf support, but I think there is scope to improve some of this. For example:
1. Variable name issue
Previously loading files with iris on inspection of an output file (e.g umnsaa_pvera) we could see almost all variable names as simple text (e.g. “air_temperature”).
Now on inspection we only get a list of stashIDs (e.g. “STASH_m01s03i236”) as variable names. This makes it difficult to navigate data, especially for new users.
2. Dimension issue
Previously when loading a cube with iris and converting to xarray, metadata would come across clearly (e.g. the array would be called “air_temperature”, with dimensions [“time”, “latitude”, “longitude”].
Now metadata is converted with non-standard names (e.g. time is “VT1HR”, latitude is “grid_latitude_t” and longitude is “grid_longitude_t”).
Perhaps there are good reasons not to simply rename everything as “time” if they are defined differently (e.g. instantaneous on the hour, hour average, or max in the hour), but this is a change to previous functionality, so should be documented. Personally, I would prefer “time” as a standard coordinate name, with the method detail included in attributes.
3. Time profile issue
Previously where multiple time profiles were defined for a single variable (e.g. by default for air temperature in STASHPACK1 there is VTS1 and VT1HR: for first model timestep and instantanous on the hour), iris would concatenate these into a single cube.
Now with multiple time profile for a particular variable, multiple variables are created (e.g. for air temperature there is “STASH_m01s03i236” and “STASH_m01s03i236_2”), making it more difficult again to navigate. In general it is the _2 ones that I want to use (instantaneous on the hour, rather than the first timestep output).
I know there are no quick solutions to the above, and the UM systems don’t make it easy. For example… I don’t know why the standard UM STASHPACK1 (for as long as I can remember) has set a single value output at the first timestep for each variable. In free-running mode this means every cycle has to have that first timestep removed (or there are two values at the same time). Perhaps the Met Office included this because of some verification (i.e. with RES). But we could consider removing those single timestep outputs, seeing as they cause issue with the new netcdf functionality.
I also know that designing output workflows (e.g. for AUS2200 or BARRA or ESM) takes a huge amount of development effort (so thank you). I don’t know whether the standard STASHPACK1 is useful for the ACCESS community (would need discussion), but I think future efforts should focus on adapting the work from AUS2200 to make a useful set of outputs commonly used by researchers (as opposed to the NWP validation STASHPACKS that come from the Met Office).
Perhaps agreeing on better output workflows and/or stashpacks could be part of an ongoing discussion at the Atmosphere working groups, and a breakout session at the upcoming ACCESS annual workshop?
Examples of what I’m describing is included below:
import iris
import xarray as xr
import matplotlib.pyplot as plt
xr.set_options(display_max_rows=50)
datapath = '/scratch/fy29/mjl561/cylc-run/u-dg768/share/cycle/20220226T0000Z/Lismore/d0198/RAL3P2/um'
#### 1. VARIABLE NAME ISSUE
# PREVIOUSLY with iris
cbs = iris.load(f'{datapath}/umnsaa_pvera000')
print(cbs)
'''
0: m01s01i202 / (unknown) (time: 25; latitude: 450; longitude: 450)
1: Turbulent mixing height after boundary layer / (m) (time: 25; latitude: 450; longitude: 450)
2: m01s03i253 / (unknown) (time: 24; latitude: 450; longitude: 450)
3: Cumulus capped boundary layer indicator / (1) (time: 24; latitude: 450; longitude: 450)
4: air_temperature / (K) (time: 25; latitude: 450; longitude: 450)
5: atmosphere_boundary_layer_thickness / (m) (time: 8; latitude: 450; longitude: 450)
6: dew_point_temperature / (K) (time: 25; latitude: 450; longitude: 450)
7: fog_area_fraction / (1) (time: 25; latitude: 450; longitude: 450)
8: land_binary_mask / (1) (latitude: 450; longitude: 450)
9: relative_humidity / (%) (time: 25; latitude: 450; longitude: 450)
10: sea_ice_area_fraction / (1) (time: 24; latitude: 450; longitude: 450)
11: surface_air_pressure / (Pa) (time: 8; latitude: 450; longitude: 450)
12: surface_altitude / (m) (latitude: 450; longitude: 450)
13: surface_downwelling_longwave_flux_in_air / (W m-2) (time: 24; latitude: 450; longitude: 450)
14: surface_downwelling_shortwave_flux_in_air / (W m-2) (time: 24; latitude: 450; longitude: 450)
15: surface_net_downward_longwave_flux / (W m-2) (time: 24; latitude: 450; longitude: 450)
16: surface_snow_amount / (kg m-2) (time: 24; latitude: 450; longitude: 450)
17: surface_temperature / (K) (time: 25; latitude: 450; longitude: 450)
18: surface_upward_latent_heat_flux / (W m-2) (time: 24; latitude: 450; longitude: 450)
19: surface_upward_sensible_heat_flux / (W m-2) (time: 24; latitude: 450; longitude: 450)
20: toa_incoming_shortwave_flux / (W m-2) (time: 24; latitude: 450; longitude: 450)
21: toa_outgoing_longwave_flux / (W m-2) (time: 24; latitude: 450; longitude: 450)
22: toa_outgoing_shortwave_flux / (W m-2) (time: 25; latitude: 450; longitude: 450)
23: toa_outgoing_shortwave_flux / (W m-2) (time: 24; latitude: 450; longitude: 450)
24: visibility_in_air / (m) (time: 25; latitude: 450; longitude: 450)
25: wind_speed_of_gust / (m s-1) (time: 24; latitude: 450; longitude: 450)
26: x_wind / (m s-1) (time: 25; latitude: 451; longitude: 450)
27: y_wind / (m s-1) (time: 25; latitude: 451; longitude: 450)
'''
# NOW with netcdf
ds = xr.open_dataset(f'{datapath}/nc/umnsaa_pvera000.nc')
print(ds)
'''
<xarray.Dataset> Size: 506MB
Dimensions: (grid_longitude_t: 450, grid_latitude_t: 450,
bounds2: 2, VT1HR: 12, VTS0_0: 1, VT3HR: 4,
VTS1: 1, VTS1_rad_diag: 1, VT1HR_rad_diag: 11,
grid_longitude_uv: 450, grid_latitude_uv: 451,
height_10m: 1, height_1_5m: 1, VT1HRMAX: 12)
Coordinates:
* grid_longitude_t (grid_longitude_t) float64 4kB 148.8 ... 157.7
longitude_t (grid_latitude_t, grid_longitude_t) float64 2MB ...
* grid_latitude_t (grid_latitude_t) float64 4kB -32.95 ... -24.06
latitude_t (grid_latitude_t, grid_longitude_t) float64 2MB ...
* VT1HR (VT1HR) datetime64[ns] 96B 2022-02-26T01:00:0...
* VTS0_0 (VTS0_0) datetime64[ns] 8B 2022-02-26
* VT3HR (VT3HR) datetime64[ns] 32B 2022-02-26T03:00:0...
* VTS1 (VTS1) datetime64[ns] 8B 2022-02-26T00:01:00
* VTS1_rad_diag (VTS1_rad_diag) datetime64[ns] 8B 2022-02-26T...
* VT1HR_rad_diag (VT1HR_rad_diag) datetime64[ns] 88B 2022-02-2...
* grid_longitude_uv (grid_longitude_uv) float64 4kB 148.8 ... 157.7
longitude_uv (grid_latitude_uv, grid_longitude_uv) float64 2MB ...
* grid_latitude_uv (grid_latitude_uv) float64 4kB -32.96 ... -24.05
latitude_uv (grid_latitude_uv, grid_longitude_uv) float64 2MB ...
* height_10m (height_10m) float64 8B 10.0
* height_1_5m (height_1_5m) float64 8B 1.5
* VT1HRMAX (VT1HRMAX) datetime64[ns] 96B 2022-02-26T01:0...
Dimensions without coordinates: bounds2
Data variables:
rotated_latitude_longitude |S1 1B ...
grid_longitude_t_bounds (grid_longitude_t, bounds2) float64 7kB ...
grid_latitude_t_bounds (grid_latitude_t, bounds2) float64 7kB ...
STASH_m01s00i023 (VT1HR, grid_latitude_t, grid_longitude_t) float64 19MB ...
STASH_m01s00i024 (VTS0_0, grid_latitude_t, grid_longitude_t) float64 2MB ...
STASH_m01s00i024_2 (VT1HR, grid_latitude_t, grid_longitude_t) float64 19MB ...
STASH_m01s00i025 (VT3HR, grid_latitude_t, grid_longitude_t) float64 6MB ...
STASH_m01s00i030 (VTS0_0, grid_latitude_t, grid_longitude_t) int32 810kB ...
STASH_m01s00i031 (VT1HR, grid_latitude_t, grid_longitude_t) float64 19MB ...
STASH_m01s00i033 (VTS0_0, grid_latitude_t, grid_longitude_t) float64 2MB ...
STASH_m01s00i409 (VT3HR, grid_latitude_t, grid_longitude_t) float64 6MB ...
STASH_m01s01i202 (VTS1, grid_latitude_t, grid_longitude_t) float64 2MB ...
STASH_m01s01i202_2 (VT1HR, grid_latitude_t, grid_longitude_t) float64 19MB ...
STASH_m01s01i205 (VTS1, grid_latitude_t, grid_longitude_t) float64 2MB ...
STASH_m01s01i205_2 (VT1HR, grid_latitude_t, grid_longitude_t) float64 19MB ...
STASH_m01s01i207_2 (VT1HR, grid_latitude_t, grid_longitude_t) float64 19MB ...
STASH_m01s01i208 (VTS1_rad_diag, grid_latitude_t, grid_longitude_t) float64 2MB ...
STASH_m01s01i208_2 (VT1HR_rad_diag, grid_latitude_t, grid_longitude_t) float64 18MB ...
STASH_m01s01i235 (VTS1_rad_diag, grid_latitude_t, grid_longitude_t) float64 2MB ...
STASH_m01s01i235_2 (VT1HR_rad_diag, grid_latitude_t, grid_longitude_t) float64 18MB ...
STASH_m01s02i201 (VTS1_rad_diag, grid_latitude_t, grid_longitude_t) float64 2MB ...
STASH_m01s02i201_2 (VT1HR_rad_diag, grid_latitude_t, grid_longitude_t) float64 18MB ...
STASH_m01s02i205 (VTS1_rad_diag, grid_latitude_t, grid_longitude_t) float64 2MB ...
STASH_m01s02i205_2 (VT1HR_rad_diag, grid_latitude_t, grid_longitude_t) float64 18MB ...
STASH_m01s02i207 (VTS1_rad_diag, grid_latitude_t, grid_longitude_t) float64 2MB ...
STASH_m01s02i207_2 (VT1HR_rad_diag, grid_latitude_t, grid_longitude_t) float64 18MB ...
STASH_m01s03i217 (VT1HR, grid_latitude_t, grid_longitude_t) float64 19MB ...
grid_longitude_uv_bounds (grid_longitude_uv, bounds2) float64 7kB ...
grid_latitude_uv_bounds (grid_latitude_uv, bounds2) float64 7kB ...
STASH_m01s03i225 (VTS1, height_10m, grid_latitude_uv, grid_longitude_uv) float64 2MB ...
STASH_m01s03i225_2 (VT1HR, height_10m, grid_latitude_uv, grid_longitude_uv) float64 19MB ...
STASH_m01s03i226 (VTS1, height_10m, grid_latitude_uv, grid_longitude_uv) float64 2MB ...
STASH_m01s03i226_2 (VT1HR, height_10m, grid_latitude_uv, grid_longitude_uv) float64 19MB ...
STASH_m01s03i234 (VT1HR, grid_latitude_t, grid_longitude_t) float64 19MB ...
STASH_m01s03i236 (VTS1, height_1_5m, grid_latitude_t, grid_longitude_t) float64 2MB ...
STASH_m01s03i236_2 (VT1HR, height_1_5m, grid_latitude_t, grid_longitude_t) float64 19MB ...
STASH_m01s03i245 (VTS1, height_1_5m, grid_latitude_t, grid_longitude_t) float64 2MB ...
STASH_m01s03i245_2 (VT1HR, height_1_5m, grid_latitude_t, grid_longitude_t) float64 19MB ...
STASH_m01s03i248 (VTS1, height_1_5m, grid_latitude_t, grid_longitude_t) float64 2MB ...
STASH_m01s03i248_2 (VT1HR, height_1_5m, grid_latitude_t, grid_longitude_t) float64 19MB ...
STASH_m01s03i250 (VTS1, height_1_5m, grid_latitude_t, grid_longitude_t) float64 2MB ...
STASH_m01s03i250_2 (VT1HR, height_1_5m, grid_latitude_t, grid_longitude_t) float64 19MB ...
STASH_m01s03i253 (VT1HR, grid_latitude_t, grid_longitude_t) float64 19MB ...
STASH_m01s03i281 (VTS1, height_1_5m, grid_latitude_t, grid_longitude_t) float64 2MB ...
STASH_m01s03i281_2 (VT1HR, height_1_5m, grid_latitude_t, grid_longitude_t) float64 19MB ...
STASH_m01s03i304 (VTS1, grid_latitude_t, grid_longitude_t) float64 2MB ...
STASH_m01s03i304_2 (VT1HR, grid_latitude_t, grid_longitude_t) float64 19MB ...
STASH_m01s03i310 (VT1HR, grid_latitude_t, grid_longitude_t) float64 19MB ...
VT1HRMAX_bounds (VT1HRMAX, bounds2) datetime64[ns] 192B ...
STASH_m01s03i463 (VT1HRMAX, grid_latitude_t, grid_longitude_t) float64 19MB ...
Attributes:
Conventions: CF-1.6
source: Met Office Unified Model v13.0
'''
##### 2. DEMENSION ISSUE #####
# PREVIOUSLY with iris
cb = iris.load_cube(f'{datapath}/umnsaa_pvera000', 'air_temperature')
da = xr.DataArray().from_iris(cb)
print(da)
'''
<xarray.DataArray 'air_temperature' (time: 25, latitude: 450, longitude: 450)> Size: 20MB
dask.array<filled, shape=(25, 450, 450), dtype=float32, chunksize=(1, 450, 450), chunktype=numpy.ndarray>
Coordinates:
* time (time) datetime64[ns] 200B 2022-02-26T00:01:00 ....
* latitude (latitude) float32 2kB -32.96 -32.94 ... -24.06
* longitude (longitude) float32 2kB 148.8 148.9 ... 157.7 157.7
forecast_reference_time datetime64[ns] 8B ...
height float64 8B ...
forecast_period (time) timedelta64[ns] 200B ...
Attributes:
standard_name: air_temperature
units: K
source: Data from Met Office Unified Model
um_version: 13.0
STASH: m01s03i236
'''
# NOW with netcdf
ds = xr.open_dataset(f'{datapath}/nc/umnsaa_pvera000.nc')
da = ds['STASH_m01s03i236_2'] # note the underscore because of two clashing time profiles
print(da)
'''
<xarray.DataArray 'STASH_m01s03i236_2' (VT1HR: 12, height_1_5m: 1,
grid_latitude_t: 450,
grid_longitude_t: 450)> Size: 19MB
[2430000 values with dtype=float64]
Coordinates:
* grid_longitude_t (grid_longitude_t) float64 4kB 148.8 148.9 ... 157.7 157.7
longitude_t (grid_latitude_t, grid_longitude_t) float64 2MB ...
* grid_latitude_t (grid_latitude_t) float64 4kB -32.95 -32.94 ... -24.06
latitude_t (grid_latitude_t, grid_longitude_t) float64 2MB ...
* VT1HR (VT1HR) datetime64[ns] 96B 2022-02-26T01:00:00 ... 2022...
* height_1_5m (height_1_5m) float64 8B 1.5
Attributes:
long_name: TEMPERATURE AT 1.5M
standard_name: air_temperature
units: K
cell_methods: VT1HR: point
grid_mapping: rotated_latitude_longitude
um_version: 13.0
um_stash_source: m01s03i236
packing_method: quantization
precision_measure: binary
precision_value: -6
'''
##### 3. TIME PROFILE ISSUE #####
# PREVIOUSLY with iris, which concatenates the time profiles in a cycle with glob
cb = iris.load_cube(f'{datapath}/umnsaa_pvera*', 'air_temperature')
# NOW with netcdf, we need to know the stash name, and whether there is a clash in time profiles
ds1 = xr.open_dataset(f'{datapath}/nc/umnsaa_pvera000.nc')['STASH_m01s03i236_2']
ds2 = xr.open_dataset(f'{datapath}/nc/umnsaa_pvera012.nc')['STASH_m01s03i236_2']
ds = xr.concat([ds1, ds2], dim='VT1HR')
# and then to simplify, rename dimensions and drop the unimportant ones
ds = ds.rename({'VT1HR': 'time', 'grid_latitude_t': 'latitude', 'grid_longitude_t': 'longitude'})
ds = ds.drop_vars(['height_1_5m', 'longitude_t', 'latitude_t'])