Calculating the 2D polyfit in python

Hi everyone,

This is probably a simple question, but I am trying to calculate the 2D polyfit on a 2D numpy array. However, unlike the 1D array poly fit, increasing the degrees results in a much worse result. In the 1D case, the higher the degree, the lower the error between the fit and the original data. Below is a minimal example which demonstrates this:

import numpy as np
import matplotlib.pyplot as plt

# Input data
a = np.array([[31.93374405, 32.09811459, 32.60427797, 34.02710551, 34.69049702, 34.58565112,
  34.33578857, 33.9491244 ],
 [31.95476843, 32.04424433, 32.06624189, 33.72400109, 34.09484176, 34.60559209,
  34.3172927,  34.00842212]])

# Degree of the polynomial fit in x and y dimensions
degree_x = 2
degree_y = 2

# Generate grid points for evaluation
x = np.arange(a.shape[1])
y = np.arange(a.shape[0])
x_grid, y_grid = np.meshgrid(x, y)

# Flatten the grid points and the corresponding values
X_flat = x_grid.flatten()
Y_flat = y_grid.flatten()
Z_flat = a.flatten()

# Generate Vandermonde matrix
V = np.polynomial.polynomial.polyvander2d(X_flat, Y_flat, [degree_x, degree_y])

# Calculate polynomial coefficients using least squares
coeffs, _, _, _ = np.linalg.lstsq(V, Z_flat)

# Reshape the coefficients into a 2D matrix
coeffs_matrix = np.reshape(coeffs, (degree_y + 1, degree_x + 1))

# Evaluate the polynomial equation on the grid
poly_fit = np.polynomial.polynomial.polyval2d(x_grid, y_grid, coeffs_matrix)

# Plot the original data and the polynomial fit
plt.pcolor(x_grid, y_grid, a, cmap='viridis')
# plt.colorbar(label='Value')
plt.show()
plt.pcolor(x_grid, y_grid, poly_fit, linestyles='dashed')
plt.xlabel('x')
plt.ylabel('y')
plt.title('2D Polynomial Fit')
plt.show()

Increasing degree_x and degree_y leads to wildly wrong fits. I’m not sure why this is. Any help would be appreciated!

1 Like

What is array a here? I would start out with a synthetic array a that is actually a polynomial of known degree in x and y and see if 2D polyfit gets that at the appropriate order. Then I’d check that if you start asking to fit a polynomial of higher order it doesn’t mess up…

But you made a true MWE here. kudos for that!

1 Like

And because your code was so simple I was intrigued to run myself. Here’s what I learned (perhaps it’s helpful?)

I tried out a known polynomial of order 2 in x and 3 in y… If then I ask for a fit of degree_x = degree_y = 3 things work pretty good. However if I start increasing the degree at some point things break. But intuitively, I expect that if my data has size (nx, ny) then I cannot expect to be able to fit a polynomial more than degree nx-1, ny-1. This is more or less what I observe, in the sense, that when my degree reaches about the dimension of my data then things start going weird. I only played around a few minutes though – haven’t thoroughly tested this though…

E.g. running the code

import numpy as np
import matplotlib.pyplot as plt

# Input data
nx, ny = 10, 6

a = np.zeros((nx, ny))

for j in np.arange(ny):
    for i in np.arange(nx):
        a[i, j] = i**2 + j**3

# Degree of the polynomial fit in x and y dimensions
degree_x = 4
degree_y = 4

# Generate grid points for evaluation
x = np.arange(a.shape[1])
y = np.arange(a.shape[0])
x_grid, y_grid = np.meshgrid(x, y)

# Flatten the grid points and the corresponding values
X_flat = x_grid.flatten()
Y_flat = y_grid.flatten()
Z_flat = a.flatten()

# Generate Vandermonde matrix
V = np.polynomial.polynomial.polyvander2d(X_flat, Y_flat, [degree_x, degree_y])

# Calculate polynomial coefficients using least squares
coeffs, _, _, _ = np.linalg.lstsq(V, Z_flat)

# Reshape the coefficients into a 2D matrix
coeffs_matrix = np.reshape(coeffs, (degree_y + 1, degree_x + 1))

# Evaluate the polynomial equation on the grid
poly_fit = np.polynomial.polynomial.polyval2d(x_grid, y_grid, coeffs_matrix)

# Plot the original data and the polynomial fit and the difference
fig, (ax1, ax2, ax3) = plt.subplots(1, 3, figsize=(12, 3))

p1 = ax1.pcolor(x_grid, y_grid, a, cmap='viridis')
ax1.set_xlabel('x')
ax1.set_ylabel('y')
ax1.set_title('original array')
plt.colorbar(p1)

p2 = ax2.pcolor(x_grid, y_grid, poly_fit, cmap='viridis')
ax2.set_xlabel('x')
ax2.set_ylabel('y')
ax2.set_title('2D Polynomial Fit')
plt.colorbar(p2)

p3 = ax3.pcolor(x_grid, y_grid, np.abs(a-poly_fit), cmap='viridis')
ax3.set_xlabel('x')
ax3.set_ylabel('y')
ax3.set_title('|difference|')
plt.colorbar(p3)
plt.show()

with

degree_x = 4
degree_y = 4

we get

but if rerun but this time choose

degree_y = 5 # = ny-1

then things look horrible… (Note difference in colorbar values!)

Thanks for playing around with this! Actually, the original array I was working with was of shape (35, 800), but even there the fit is horrible for degree_x>3, degree_y>3. So while I agree with you that the degree should be smaller than the array size, it doesn’t seem to be the issue based on my testing. But I’m still not sure if my code is the issue or if there is a fundamental concept I am missing.

OK, so you are saying that what I did resolves the issue in your mwe but not in your original array?

Does the original array include a NaN? Perhaps you can post the original array with the script that does the fit and I can play around. A second set of eyes is often useful. Chances are that either there is something simple you are missing or there is something to learn here.

If you need to share more complicated code @taimoorsohail I can recommend this guide on how you might do this

(because I wrote it)

Is there a reason you don’t use numpy.polynomial.polynomial.polyfit seeing as you’re using np.polynomial.polynomial.polyval2d?

Thanks @navidcy and @Aidan for the responses. I’ve attached the larger [35x800] dataset I was originally using here.

The updated code (which incorporates the attached data array but is otherwise identical to the above) is below.

@Aidan I was under the impression that polyfit evaluates 1D arrays (or groups of 1D arrays) and produces polyfits for each y value. I am instead looking for a high order 2D polynomial that represents 2D salinity data well and can be evaluated at any x and y value. I don’t think polyfit does that, which is why I had to calculate a matrix of coefficients using np.linalg.lstsq.

import numpy as np
import matplotlib.pyplot as plt

data = xr.open_mfdataset('salinity_2D_array.nc')

# Input data
a = data.Saltbinned.values

# Degree of the polynomial fit in x and y dimensions
degree_x = 2
degree_y = 2

# Generate grid points for evaluation
x = data.accarea.values
y = data.press.values
x_grid, y_grid = np.meshgrid(x, y)

# Flatten the grid points and the corresponding values
X_flat = x_grid.flatten()
Y_flat = y_grid.flatten()
Z_flat = a.flatten()

# Generate Vandermonde matrix
V = np.polynomial.polynomial.polyvander2d(X_flat, Y_flat, [degree_x, degree_y])

# Calculate polynomial coefficients using least squares
coeffs, _, _, _ = np.linalg.lstsq(V, Z_flat)

# Reshape the coefficients into a 2D matrix
coeffs_matrix = np.reshape(coeffs, (degree_y + 1, degree_x + 1))

# Evaluate the polynomial equation on the grid
poly_fit = np.polynomial.polynomial.polyval2d(x_grid, y_grid, coeffs_matrix)

# Plot the original data and the polynomial fit
plt.pcolor(x_grid, y_grid, a, cmap='viridis')
# plt.colorbar(label='Value')
plt.show()
plt.pcolor(x_grid, y_grid, poly_fit, linestyles='dashed')
plt.xlabel('x')
plt.ylabel('y')
plt.title('2D Polynomial Fit')
plt.show()
1 Like

I actually had the same question e.g. why we need a vandermonde matrix etc… But you may have clarified already. I’ll play around laters.

1 Like

I’m sure you’re correct. I just saw there was a polyfit method and thought it might be a good idea to give it a try to make sure it wasn’t an incompatibility in assumptions/layout of the different packages. If it isn’t fit for purpose then that is a good reason not to use it! :smile:

Just giving this a bump to see if anyone has thoughts on how to resolve this technical issue?

Sorry, I didn’t find a chance to play around yet!