Calculating pairwise distances between points

lidefi87 · 21 April 2023 02:19

Hello everyone,

I am posting here a question that I originally asked in slack.

In short, I need to calculate the distance from each grid cell to the sea ice edge (where sea ice concentration is < 10%). My map would look something like the figure below.

Does anyone have any suggestions on how to do this calculation efficiently? @navidcy suggested that I calculate the Haversine distance for every pixel. Any suggestions would be appreciated.

navidcy · 21 April 2023 04:09

I’d also suggest you use a different colormap

Aidan · 23 April 2023 23:35

I’d second @edoddridge’s suggestion to use the geopy.distance function.

How are you doing this calculation? Are you using some sort of edge detection for the ice-edge points? Is it time consuming? Do you have to do it repeatedly for multiple time steps?

navidcy · 23 April 2023 23:39

from @Josue Martinez Moreno

If you have a regular grid, you can probably use:
scipy.ndimage.distance_transform_edt — SciPy v1.11.4 Manual
Otherwise, you may need to multiply this mask by the grid scale factors, but in that case it may be more complicated, and I’m not sure you will get the same results

An example of that is here:
Image transforms — Introduction to Bioimage Analysis

navidcy · 23 April 2023 23:41

@lidefi87, should we change the title to “calculating pair-wise distances between points on the globe” or something that summarizes the heart of the matter and doesn’t sound sea-ice specific since the actual subject we discuss is more generic than that

navidcy · 24 April 2023 00:11

I did it

So geopy.distance should have worked (did it?). But were you able to parallelize this via xarray @lidefi87?

lidefi87 · 24 April 2023 07:34

Hello everyone and thanks for your input. I decided to go with @edoddridge’s suggestion for the distance calculation.

So far, I have been able to calculate the sea ice edge, and I can also calculate the distance to the closest sea ice edge pixel. The problem is that I am stuck with parallelising the calculation of the distance to the sea ice edge for every pixel in my data frame.

I always get stuck when using the xarray.apply_ufunc function. I can never seem to get this right. So, I decided to use the np.apply_along_axis function, which is also failing. I tried mapping the function along axis 0 and 1, and neither work.

I have a reproducible notebook here: Crabeaters_data_cleaning/IceEdge.ipynb at main · lidefi87/Crabeaters_data_cleaning · GitHub. If anyone has any ideas on how to apply the calculation of distance to nearest ice edge pixel, it would be amazing.

Ideally, I will only calculate the distance from pixels containing sea ice to the sea ice edge, but at this stage, I do not care if distances are calculated for the entire data array. I will apply a mask to keep only the values I need.

Also, since this exercise has been more complicated than I initially thought (it is way easier to do this in ArcGIS), I will contribute this as an example in the cookbook to save others time.

Thanks again for your help!

navidcy · 24 April 2023 07:47

Perhaps someone can help with xarray.apply_ufunc? @angus-g?

angus-g · 26 April 2023 02:39

I wonder if maybe some kind of nearest-neighbour tree would help you do this more efficiently? If you apply the distance calculation + sorting to every pixel in your input, your complexity is something like O(nx \times ny \times nx \times log(nx)). A tree would shave a factor of nx out of that, which is a few orders of magnitude.

Unfortunately, I don’t know of a way to efficiently use the GeoPy distance calculation in the construction of a tree, but you can use the haversine (great-circle) distance to construct a BallTree. Using ice_10 to define the sea ice edge as in your notebook:

from sklearn.neighbors import BallTree

ice_yt = ice_10.yt_ocean[ice_10.isel(time=0).argmax(dim="yt_ocean")]
ice_coords = np.vstack([ice_yt, ice_10.xt_ocean]).T

ball_tree = BallTree(np.radians(ice_coords), metric="haversine")

x, y = np.meshgrid(
    np.radians(var_ice.xt_ocean.values),
    np.radians(var_ice.yt_ocean.values)
)

distances, indices = ball_tree.query(np.vstack([y.flat, x.flat]).T, return_distance=True)

This took about 30s on ARE. It seems to follow the ice edge:

lidefi87 · 26 April 2023 03:03

Thanks for your reply @angus-g. You are amazing! I have been stuck with this for a couple of days and I was getting no where with xarray.apply_ufunc. I tried it and it gives me what I need. I just had to convert the results to kilometers and it is exactly what I was after. Do you mind if I share this in a notebook in the COSIMA repo? I will add your name as a contributor in the notebook.

Thanks again!

angus-g · 26 April 2023 03:15

Glad it worked for you!

Not at all, go for it

lidefi87 · 26 April 2023 04:33

Thanks again everyone! I just submitted a pull request with the new notebook

navidcy · 26 April 2023 20:50

for reference; Adding notebook calculating distance to nearest neighbour by lidefi87 · Pull Request #258 · COSIMA/cosima-recipes · GitHub

Topic		Replies	Views
Extracting data from ACCESS-OM2-01 using coordinate pairs Technical	9	294	6 July 2023
Improving performance of a Python-based function General python , help , outofscope , vectorisation	13	363	29 October 2023
Extracting isopycnals from multi-dimensional arrays Technical python	24	956	16 January 2023
Notebook that takes too long Technical python , cosima	12	478	22 January 2023
Defining lon-lat coords of 1000-metre isobath and extracting region around it Ocean python , cosima , knowledge-base	4	117	24 June 2024

Calculating pairwise distances between points

Related topics