-
Notifications
You must be signed in to change notification settings - Fork 29
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
csaps.CubicSmoothingSpline does not handle periodic functions correctly #46
Comments
Thank you for your report. You are right, periodic extrapolation from scipy PPoly seems to work incorrectly with smoothing spline. Unfortunately, I have never tested this feature. If we look at scipy PPoly # With periodic extrapolation we map x to the segment
# [self.x[0], self.x[-1]].
if extrapolate == 'periodic':
x = self.x[0] + (x - self.x[0]) % (self.x[-1] - self.x[0])
extrapolate = False I think we might implement it in the right way for the smoothing spline case. |
Ok, I had a play with this, and think I have a more elegant solution, which could be more widely useful. The basic idea is to compress the representation of any CubicSmoothingSpline into a CubicSpline, and then periodic extrapolation "just works". For any particular level of smoothing, the set of breakpoints and coefficients for a CubicSmoothingSpline over the repeated-x-range described above, can be reduced to a subset of breakpoints, which are located at the x-values of the zero-crossings of the 3rd derivative evaluated at the original (non-repeated) x values, as well as the terminal original x-values of the series. Because the 3rd derivative is piecewise constant, the zero-crossing x-value is simply the x-value of the i+1th break where the zero crossing occurs between break[i] and break[i+1]. We evaluate the CubicSmoothingSpline at these points, to produce new smoothed y values. We need to average y[0] and y[-1] in order to ensure our periodic function works as expected. These new x,y points then define a new CubicSmoothingSpline with smoothing=1.0 (or indeed simply a scipy.interpolate.CubicSpline), for which periodic extrapolation now works as expected. This compressed representation of the CubicSmoothingSpline is more memory and computationally efficient than either the repeated-x-range hack I described above, or indeed a smoothed spline of any smoothing value < 1.0. This is true not just for the periodic case, where I needed to repeat data, but generally all CubicSmoothingSplines with smoothing value < 1.0. This becomes important for people wanting to smooth large quantities of data with heavy smoothing, and might be particularly important for the ND grid case (though the zero-crossings won't in general lie on a reduced grid). So you may wish to use this compressed representation for all internal representation (i.e. modifying csaps.SplinePPForm). Below is example code showing what I mean. There does appear to be some kind of discontinuity in the first derivative of the CubicSmoothingSpline with smoothing=1.0 case (orange line on graph). This is not present when using a scipy.interpolate.CubicSpline with periodic boundary, so this suggests we should use this as the underlying representation (or copy their method of periodic boundary setting). import numpy as np from csaps import csaps np.random.seed(1234) x = np.linspace(-5., 5., 25) smoother = csaps(x, y, smooth=0.85) xs = np.linspace(-7.5, 7.5, 100) ys = smoother(xs) better_smoother = csaps(np.r_[x[:-1]-10,x,x[1:]+10], np.r_[y[:-1],y,y[1:]], smooth=0.85) ys_better = better_smoother(xs) idx_before_zero_crossings = np.where(np.diff(np.sign(better_smoother(x, nu=3))))[0] x_compressed = np.unique(np.r_[x[0],x[idx_before_zero_crossings+1],x[-1]]) even_better_smoother = csaps(x_compressed, y_compressed, smooth=1.0) best_smoother = CubicSpline(x_compressed, y_compressed, bc_type='periodic', extrapolate='periodic') ys_even_better = even_better_smoother(xs,extrapolate='periodic') plt.plot(x, y, 'o', xs, ys, '-', xs, ys_periodic, '-', xs, ys_better, '-', xs, ys_even_better, '-', xs, ys_best, '-') print("non-periodic", smoother([-5,5])) fig, axs = plt.subplots(4, 2, sharex=True, figsize=(15,15)) plt.show() |
Thanks for the explanation and the code example!
See example: import numpy as np
import matplotlib.pyplot as plt
from csaps import CubicSmoothingSpline
from scipy.interpolate import CubicSpline
np.random.seed(1234)
x = np.linspace(-5., 5., 25)
xs = np.linspace(-7.5, 7.5, 100)
y = np.exp(-(x/2.5)**2) + (np.random.rand(25) - 0.2) * 0.3
ss = CubicSmoothingSpline(x, y, smooth=1.0)
cs = CubicSpline(x, y, bc_type='natural')
ys = ss(xs, extrapolate='periodic')
yc = cs(xs, extrapolate='periodic')
plt.plot(x, y, 'o')
plt.plot(xs, ys, '-', linewidth=2, label='ss')
plt.plot(xs, yc, '-', linewidth=1, label='cs')
plt.legend()
plt.show()
|
Good investigative powers ;) So I think what needs to be done is the pattern for CubicSpline should be followed, where there is a bc_type parameter on creating the spline. This can default to current behavior, but when you want periodic behavior, it is set. My understanding of the csaps code is insufficiently deep to actually implement it, but I think in _sspumv.py at def _make_spline(x, y, w, smooth, shape) you would introduce this parameter, and I think you would use slightly different formulation for the periodic boundary case here, thus avoiding my hack of repeating data points either side of the window of interest. This would then allow proper periodic behaviour without needing my break-compression method described above, so that only becomes an optional extra that you could tack on the end of _make_spline regardless of boundary condition type. The break-compression scheme really is only necessary if people are dealing with huge amounts of data and consequently huge numbers of breaks though - if not necessary for getting periodic splines working, it may not be worth the inevitable future code maintenance overhead. |
By the way, the big jump at the boundary in your example code above is due to failing to match the endpoint conditions in the input data. Insert this line: y[0] = y[-1] = np.mean(y[[0,-1]]) and it goes away. See this in the notes for bc_type of CubicSpline: ‘periodic’: The interpolated functions is assumed to be periodic of period x[-1] - x[0]. The first and last value of y must be identical: y[0] == y[-1]. This boundary condition will result in y'[0] == y'[-1] and y''[0] == y''[-1]. My code above is also faulty in this regard - but once corrected, there is still a (smaller) evident boundary condition jump in the second derivative for CubicSmoothingSpline vs CubicSpline when CubicSpline is set up with bc_type='periodic' |
I will think about it. It would be nice to add various boundary conditions (the same as in CubicSpline). Also it relates to #34 |
Whilst you're modifying _make_spline() you might want to look at this error path: There's no good reason for this to break the code - it just makes things brittle when in production, or forces the user to wrap your code in something that does this assuming they notice the requirement. Instead, use numpy.unique to sort and group all the unique x values, and average them (this would be the same least-squares effect as repeated points). |
I think explicit is better than implicit. Also if we try to set the same x values to CubicSpline we get a similar error:
The code from scipy: dx = np.diff(x)
if np.any(dx <= 0):
raise ValueError("`x` must be strictly increasing sequence.") I think csaps should not perform any implicit pre-processing and correcting input data. A user may not get what they expect. |
I assume it will take you some time to look into the correct boundary conditions for periodic functions. In the meantime, here is a function that applies my earlier "compression" technique to produce a scipy CubicSpline that is very close to the CubicSmoothingSpline - this can be used for natural splines, or with periodic splines if you replicate your data at - / + 1 period and then set the xrange parameter.
Example usage is also shown
|
csaps.CubicSmoothingSpline purports to support periodic functions, but does not implement this correctly.
I believe this is because it is a smoothing spline, which means that there are far more knots stored than "effective knots" due to smoothing. This means that when the periodic function is used by the underlying scipy implementation, which likely only looks at the first and last knots, you get a sharp discontinuity.
This is all purely speculation!
A not-brilliant-fix is to repeat your data -1 and +1 period and fit just the usual (non-periodic) smoothed spline to this. There will not be any huge discontinuity, but this does not guarantee that your endpoints will match exactly, so it's better but not perfect. Note that you can't have identical x values in your input, so you need to offset very slightly, or average your endpoints.
The correct solution is probably to have the smoother correctly consider periodic functions, either when you generate the smoother with CubicSmoothingSpline() which would require an additional parameter for periodic functions, or re-implementing the underlying scipy for interpolating/extrapolating (which is probably more work).
You can test this with the following code:
import numpy as np
import matplotlib.pyplot as plt
from csaps import csaps
np.random.seed(1234)
x = np.linspace(-5., 5., 25)
y = np.exp(-(x/2.5)**2) + (np.random.rand(25) - 0.2) * 0.3
smoother = csaps(x, y, smooth=0.85)
xs = np.linspace(-6.5, 6.5, 100)
ys = smoother(xs)
ys_periodic = smoother(xs, extrapolate='periodic')
better_smoother = csaps(np.r_[x-10.001,x,x+10.001], np.r_[y,y,y], smooth=0.85)
ys_better = better_smoother(xs)
plt.plot(x, y, 'o', xs, ys, '-', xs, ys_periodic, '-', xs, ys_better, '-')
plt.show()
print("non-periodic", smoother([-5,5]))
print("periodic",smoother([-5,5], extrapolate='periodic'))
print("better, but not perfect",better_smoother([-5,5]))
The text was updated successfully, but these errors were encountered: