pwlf package contents

pwlf.PiecewiseLinFit(x, y[, disp_res, ...])

class pwlf.PiecewiseLinFit(x, y, disp_res=False, lapack_driver='gelsd', degree=1, weights=None, seed=None)

Bases: object

Methods

assemble_regression_matrix(breaks, x)

Assemble the linear regression matrix A

calc_slopes()

Calculate the slopes of the lines after a piecewise linear function has been fitted.

conlstsq(A[, calc_slopes])

Perform a constrained least squares fit for A matrix.

fit(n_segments[, x_c, y_c, bounds])

Fit a continuous piecewise linear function for a specified number of line segments.

fit_force_points_opt(var)

The objective function to perform a continuous piecewise linear fit for a specified number of breakpoints.

fit_guess(guess_breakpoints[, bounds])

Uses L-BFGS-B optimization to find the location of breakpoints from a guess of where breakpoint locations should be.

fit_with_breaks(breaks)

A function which fits a continuous piecewise linear function for specified breakpoint locations.

fit_with_breaks_force_points(breaks, x_c, y_c)

A function which fits a continuous piecewise linear function for specified breakpoint locations, where you force the fit to go through the data points at x_c and y_c.

fit_with_breaks_opt(var)

The objective function to perform a continuous piecewise linear fit for a specified number of breakpoints.

fitfast(n_segments[, pop, bounds])

Uses multi start LBFGSB optimization to find the location of breakpoints for a given number of line segments by minimizing the sum of the square of the errors.

lstsq(A[, calc_slopes])

Perform the least squares fit for A matrix.

p_values([method, step_size])

Calculate the p-values for each beta parameter.

predict(x[, beta, breaks])

Evaluate the fitted continuous piecewise linear function at untested points.

prediction_variance(x)

Calculate the prediction variance for each specified x location.

r_squared()

Calculate the coefficient of determination ("R squared", R^2) value after a fit has been performed.

standard_errors([method, step_size])

Calculate the standard errors for each beta parameter determined from the piecewise linear fit.

use_custom_opt(n_segments[, x_c, y_c])

Provide the number of line segments you want to use with your custom optimization routine.

assemble_regression_matrix(breaks, x)

Assemble the linear regression matrix A

Parameters:
breaksarray_like

The x locations where each line segment terminates. These are referred to as breakpoints for each line segment. This should be structured as a 1-D numpy array.

xndarray (1-D)

The x locations which the linear regression matrix is assembled on. This must be a numpy array!

Returns:
Andarray (2-D)

The assembled linear regression matrix.

Examples

Assemble the linear regression matrix on the x data for some set of breakpoints.

>>> import pwlf
>>> my_pwlf = pwlf.PiecewiseLinFit(x, y)
>>> breaks = [0.0, 0.5, 1.0]
>>> A = assemble_regression_matrix(breaks, self.x_data)
calc_slopes()

Calculate the slopes of the lines after a piecewise linear function has been fitted.

This will also calculate the y-intercept from each line in the form y = mx + b. The intercepts are stored at self.intercepts.

Returns:
slopesndarray(1-D)

The slope of each ling segment as a 1-D numpy array. This assumes that x[0] <= x[1] <= … <= x[n]. Thus, slopes[0] is the slope of the first line segment.

Examples

Calculate the slopes after performing a simple fit

>>> import pwlf
>>> x = np.linspace(0.0, 1.0, 10)
>>> y = np.random.random(10)
>>> my_pwlf = pwlf.PiecewiseLinFit(x, y)
>>> breaks = my_pwlf.fit(3)
>>> slopes = my_pwlf.calc_slopes()
conlstsq(A, calc_slopes=True)

Perform a constrained least squares fit for A matrix.

Parameters:
Andarray (2-D)

The regression matrix you want to fit in the linear system of equations Ab=y.

calc_slopesboolean, optional

Whether to calculate slopes after performing a fit. Default is calc_slopes=True.

fit(n_segments, x_c=None, y_c=None, bounds=None, **kwargs)

Fit a continuous piecewise linear function for a specified number of line segments. Uses differential evolution to finds the optimum location of breakpoints for a given number of line segments by minimizing the sum of the square error.

Parameters:
n_segmentsint

The desired number of line segments.

x_carray_like, optional

The x locations of the data points that the piecewise linear function will be forced to go through.

y_carray_like, optional

The x locations of the data points that the piecewise linear function will be forced to go through.

boundsarray_like, optional

Bounds for each breakpoint location within the optimization. This should have the shape of (n_segments, 2).

**kwargsoptional

Directly passed into scipy.optimize.differential_evolution(). This will override any pwlf defaults when provided. See Note for more information.

Returns:
fit_breaksfloat

breakpoint locations stored as a 1-D numpy array.

Raises:
ValueError

You probably provided x_c without y_c (or vice versa). You must provide both x_c and y_c if you plan to force the model through data point(s).

ValueError

You can’t specify weights with x_c and y_c.

Notes

All **kwargs are passed into sicpy.optimize.differential_evolution. If any **kwargs is used, it will override my differential_evolution, defaults. This allows advanced users to tweak their own optimization. For me information see: https://github.com/cjekel/piecewise_linear_fit_py/issues/15#issuecomment-434717232

Examples

This example shows you how to fit three continuous piecewise lines to a dataset. This assumes that x is linearly spaced from [0, 1), and y is random.

>>> import pwlf
>>> x = np.linspace(0.0, 1.0, 10)
>>> y = np.random.random(10)
>>> my_pwlf = pwlf.PiecewiseLinFit(x, y)
>>> breaks = my_pwlf.fit(3)

Additionally you desired that the piecewise linear function go through the point (0.0, 0.0).

>>> x_c = [0.0]
>>> y_c = [0.0]
>>> breaks = my_pwlf.fit(3, x_c=x_c, y_c=y_c)

Additionally you desired that the piecewise linear function go through the points (0.0, 0.0) and (1.0, 1.0).

>>> x_c = [0.0, 1.0]
>>> y_c = [0.0, 1.0]
>>> breaks = my_pwlf.fit(3, x_c=x_c, y_c=y_c)
fit_force_points_opt(var)

The objective function to perform a continuous piecewise linear fit for a specified number of breakpoints. This is to be used with a custom optimization routine, and after use_custom_opt has been called.

Use this function if you intend to be force the model through x_c and y_c, while performing a custom optimization.

This was intended for advanced users only. See the following example https://github.com/cjekel/piecewise_linear_fit_py/blob/master/examples/useCustomOptimizationRoutine.py

Parameters:
vararray_like

The breakpoint locations, or variable, in a custom optimization routine.

Returns:
ssrfloat

The sum of square of the residuals.

Raises:
LinAlgError

This typically means your regression problem is ill-conditioned.

Notes

You should run use_custom_opt to initialize necessary object attributes first.

Unlike fit_with_breaks_force_points, fit_force_points_opt automatically assumes that the first and last breakpoints occur at the min and max values of x.

fit_guess(guess_breakpoints, bounds=None, **kwargs)

Uses L-BFGS-B optimization to find the location of breakpoints from a guess of where breakpoint locations should be.

In some cases you may have a good idea where the breakpoint locations occur. It generally won’t be necessary to run a full global optimization to search the entire domain for the breakpoints when you have a good idea where the breakpoints occur. Here a local optimization is run from a guess of the breakpoint locations.

Parameters:
guess_breakpointsarray_like

Guess where the breakpoints occur. This should be a list or numpy array containing the locations where it appears breakpoints occur.

boundsarray_like, optional

Bounds for each breakpoint location within the optimization. This should have the shape of (n_segments, 2).

**kwargsoptional

Directly passed into scipy.optimize.fmin_l_bfgs_b(). This will override any pwlf defaults when provided. See Note for more information.

Returns:
fit_breaksfloat

breakpoint locations stored as a 1-D numpy array.

Notes

All **kwargs are passed into sicpy.optimize.fmin_l_bfgs_b. If any **kwargs is used, it will override my defaults. This allows advanced users to tweak their own optimization. For me information see: https://github.com/cjekel/piecewise_linear_fit_py/issues/15#issuecomment-434717232

You do not need to specify the x.min() or x.max() in geuss_breakpoints!

Examples

In this example we see two distinct linear regions, and we believe a breakpoint occurs at 6.0. We’ll use the fit_guess() function to find the best breakpoint location starting with this guess.

>>> import pwlf
>>> x = np.array([4., 5., 6., 7., 8.])
>>> y = np.array([11., 13., 16., 28.92, 42.81])
>>> my_pwlf = pwlf.PiecewiseLinFit(x, y)
>>> breaks = my_pwlf.fit_guess([6.0])

Note specifying one breakpoint will result in two line segments. If we wanted three line segments, we’ll have to specify two breakpoints.

>>> breaks = my_pwlf.fit_guess([5.5, 6.0])
fit_with_breaks(breaks)

A function which fits a continuous piecewise linear function for specified breakpoint locations.

The function minimizes the sum of the square of the residuals for the x y data.

If you want to understand the math behind this read https://jekel.me/2018/Continous-piecewise-linear-regression/

Other useful resources: http://golovchenko.org/docs/ContinuousPiecewiseLinearFit.pdf https://www.mathworks.com/matlabcentral/fileexchange/40913-piecewise-linear-least-square-fittic http://www.regressionist.com/2018/02/07/continuous-piecewise-linear-fitting/

Parameters:
breaksarray_like

The x locations where each line segment terminates. These are referred to as breakpoints for each line segment. This should be structured as a 1-D numpy array.

Returns:
ssrfloat

Returns the sum of squares of the residuals.

Raises:
LinAlgError

This typically means your regression problem is ill-conditioned.

Examples

If your x data exists from 0 <= x <= 1 and you want three piecewise linear lines where the lines terminate at x = 0.0, 0.3, 0.6, and 1.0. This assumes that x is linearly spaced from [0, 1), and y is random.

>>> import pwlf
>>> x = np.linspace(0.0, 1.0, 10)
>>> y = np.random.random(10)
>>> my_pwlf = pwlf.PiecewiseLinFit(x, y)
>>> breaks = [0.0, 0.3, 0.6, 1.0]
>>> ssr = my_pwlf.fit_with_breaks(breaks)
fit_with_breaks_force_points(breaks, x_c, y_c)

A function which fits a continuous piecewise linear function for specified breakpoint locations, where you force the fit to go through the data points at x_c and y_c.

The function minimizes the sum of the square of the residuals for the pair of x, y data points. If you want to understand the math behind this read https://jekel.me/2018/Force-piecwise-linear-fit-through-data/

Parameters:
breaksarray_like

The x locations where each line segment terminates. These are referred to as breakpoints for each line segment. This should be structured as a 1-D numpy array.

x_carray_like

The x locations of the data points that the piecewise linear function will be forced to go through.

y_carray_like

The x locations of the data points that the piecewise linear function will be forced to go through.

Returns:
Lfloat

Returns the Lagrangian function value. This is the sum of squares of the residuals plus the constraint penalty.

Raises:
LinAlgError

This typically means your regression problem is ill-conditioned.

ValueError

You can’t specify weights with x_c and y_c.

Examples

If your x data exists from 0 <= x <= 1 and you want three piecewise linear lines where the lines terminate at x = 0.0, 0.3, 0.6, and 1.0. This assumes that x is linearly spaced from [0, 1), and y is random. Additionally you desired that the piecewise linear function go through the point (0.0, 0.0)

>>> import pwlf
>>> x = np.linspace(0.0, 1.0, 10)
>>> y = np.random.random(10)
>>> x_c = [0.0]
>>> y_c = [0.0]
>>> my_pwlf = pwlf.PiecewiseLinFit(x, y)
>>> breaks = [0.0, 0.3, 0.6, 1.0]
>>> L = my_pwlf.fit_with_breaks_force_points(breaks, x_c, y_c)
fit_with_breaks_opt(var)

The objective function to perform a continuous piecewise linear fit for a specified number of breakpoints. This is to be used with a custom optimization routine, and after use_custom_opt has been called.

This was intended for advanced users only.

See the following example https://github.com/cjekel/piecewise_linear_fit_py/blob/master/examples/useCustomOptimizationRoutine.py

Parameters:
vararray_like

The breakpoint locations, or variable, in a custom optimization routine.

Returns:
ssrfloat

The sum of square of the residuals.

Raises:
LinAlgError

This typically means your regression problem is ill-conditioned.

Notes

You should run use_custom_opt to initialize necessary object attributes first.

Unlike fit_with_breaks, fit_with_breaks_opt automatically assumes that the first and last breakpoints occur at the min and max values of x.

fitfast(n_segments, pop=2, bounds=None, **kwargs)

Uses multi start LBFGSB optimization to find the location of breakpoints for a given number of line segments by minimizing the sum of the square of the errors.

The idea is that we generate n random latin hypercube samples and run LBFGSB optimization on each one. This isn’t guaranteed to find the global optimum. It’s suppose to be a reasonable compromise between speed and quality of fit. Let me know how it works.

Since this is based on random sampling, you might want to run it multiple times and save the best version… The best version will have the lowest self.ssr (sum of square of residuals).

There is no guarantee that this will be faster than fit(), however you may find it much faster sometimes.

Parameters:
n_segmentsint

The desired number of line segments.

popint, optional

The number of latin hypercube samples to generate. Default pop=2.

boundsarray_like, optional

Bounds for each breakpoint location within the optimization. This should have the shape of (n_segments, 2).

**kwargsoptional

Directly passed into scipy.optimize.fmin_l_bfgs_b(). This will override any pwlf defaults when provided. See Note for more information.

Returns:
fit_breaksfloat

breakpoint locations stored as a 1-D numpy array.

Notes

The default number of multi start optimizations is 2.
  • Decreasing this number will result in a faster run time.

  • Increasing this number will improve the likelihood of finding

    good results

  • You can specify the number of starts using the following call

  • Minimum value of pop is 2

All **kwargs are passed into sicpy.optimize.fmin_l_bfgs_b. If any **kwargs is used, it will override my defaults. This allows advanced users to tweak their own optimization. For me information see: https://github.com/cjekel/piecewise_linear_fit_py/issues/15#issuecomment-434717232

Examples

This example shows you how to fit three continuous piecewise lines to a dataset. This assumes that x is linearly spaced from [0, 1), and y is random.

>>> import pwlf
>>> x = np.linspace(0.0, 1.0, 10)
>>> y = np.random.random(10)
>>> my_pwlf = pwlf.PiecewiseLinFit(x, y)
>>> breaks = my_pwlf.fitfast(3)

You can change the number of latin hypercube samples (or starting point, locations) to use with pop. The following example will use 50 samples.

>>> breaks = my_pwlf.fitfast(3, pop=50)
lstsq(A, calc_slopes=True)

Perform the least squares fit for A matrix.

Parameters:
Andarray (2-D)

The regression matrix you want to fit in the linear system of equations Ab=y.

calc_slopesboolean, optional

Whether to calculate slopes after performing a fit. Default is calc_slopes=True.

p_values(method='linear', step_size=0.0001)

Calculate the p-values for each beta parameter.

This calculates the p-values for the beta parameters under the assumption that your breakpoint locations are known. Section 2.4.2 of [2] defines how to calculate the p-value of individual parameters. This is really a marginal test since each parameter is dependent upon the other parameters.

These values are typically compared to some confidence level alpha for significance. A 95% confidence level would have alpha = 0.05.

Parameters:
methodstring, optional

Calculate the standard errors for a linear or non-linear regression problem. The default is method=’linear’. A taylor- series expansion is performed when method=’non-linear’ (which is commonly referred to as the Delta method).

step_sizefloat, optional

The step size to perform forward differences for the taylor- series expansion when method=’non-linear’. Default is step_size=1e-4.

Returns:
pndarray (1-D)

p-values for each beta parameter where p-value[0] corresponds to beta[0] and so forth

Raises:
AttributeError

You have probably not performed a fit yet.

ValueError

You supplied an unsupported method.

Notes

The linear regression problem is when you know the breakpoint locations (e.g. when using the fit_with_breaks function).

The non-linear regression problem is when you don’t know the breakpoint locations (e.g. when using the fit, fitfast, and fit_guess functions).

See https://github.com/cjekel/piecewise_linear_fit_py/issues/14

References

[2]

Myers RH, Montgomery DC, Anderson-Cook CM. Response surface methodology . Hoboken. New Jersey: John Wiley & Sons, Inc. 2009;20:38-44.

Examples

After performing a fit, one can calculate the p-value for each beta parameter

>>> import pwlf
>>> x = np.linspace(0.0, 1.0, 10)
>>> y = np.random.random(10)
>>> my_pwlf = pwlf.PiecewiseLinFit(x, y)
>>> breaks = my_pwlf.fitfast(3)
>>> x_new = np.linspace(0.0, 1.0, 100)
>>> p = my_pwlf.p_values(x_new)

see also examples/standard_errrors_and_p-values.py

predict(x, beta=None, breaks=None)

Evaluate the fitted continuous piecewise linear function at untested points.

You can manfully specify the breakpoints and calculated values for beta if you want to quickly predict from different models and the same data set.

Parameters:
xarray_like

The x locations where you want to predict the output of the fitted continuous piecewise linear function.

betanone or ndarray (1-D), optional

The model parameters for the continuous piecewise linear fit. Default is None.

breaksnone or array_like, optional

The x locations where each line segment terminates. These are referred to as breakpoints for each line segment. This should be structured as a 1-D numpy array. Default is None.

Returns:
y_hatndarray (1-D)

The predicted values at x.

Examples

Fits a simple model, then predict at x_new locations which are linearly spaced.

>>> import pwlf
>>> x = np.linspace(0.0, 1.0, 10)
>>> y = np.random.random(10)
>>> my_pwlf = pwlf.PiecewiseLinFit(x, y)
>>> breaks = [0.0, 0.3, 0.6, 1.0]
>>> ssr = my_pwlf.fit_with_breaks(breaks)
>>> x_new = np.linspace(0.0, 1.0, 100)
>>> yhat = my_pwlf.predict(x_new)
prediction_variance(x)

Calculate the prediction variance for each specified x location. The prediction variance is the uncertainty of the model due to the lack of data. This can be used to find a 95% confidence interval of possible piecewise linear models based on the current data. This would be done typically as y_hat +- 1.96*np.sqrt(pre_var). The prediction_variance needs to be calculated at various x locations. For more information see: www2.mae.ufl.edu/haftka/vvuq/lectures/Regression-accuracy.pptx

Parameters:
xarray_like

The x locations where you want the prediction variance from the fitted continuous piecewise linear function.

Returns:
pre_varndarray (1-D)

Numpy array (floats) of prediction variance at each x location.

Raises:
AttributeError

You have probably not performed a fit yet.

LinAlgError

This typically means your regression problem is ill-conditioned.

Notes

This assumes that your breakpoint locations are exact! and does not consider the uncertainty with your breakpoint locations.

Examples

Calculate the prediction variance at x_new after performing a simple fit.

>>> import pwlf
>>> x = np.linspace(0.0, 1.0, 10)
>>> y = np.random.random(10)
>>> my_pwlf = pwlf.PiecewiseLinFit(x, y)
>>> breaks = my_pwlf.fitfast(3)
>>> x_new = np.linspace(0.0, 1.0, 100)
>>> pre_var = my_pwlf.prediction_variance(x_new)

see also examples/prediction_variance.py

r_squared()

Calculate the coefficient of determination (“R squared”, R^2) value after a fit has been performed. For more information see: https://en.wikipedia.org/wiki/Coefficient_of_determination

Returns:
rsqfloat

Coefficient of determination, or ‘R squared’ value.

Raises:
AttributeError

You have probably not performed a fit yet.

LinAlgError

This typically means your regression problem is ill-conditioned.

Examples

Calculate the R squared value after performing a simple fit.

>>> import pwlf
>>> x = np.linspace(0.0, 1.0, 10)
>>> y = np.random.random(10)
>>> my_pwlf = pwlf.PiecewiseLinFit(x, y)
>>> breaks = my_pwlf.fitfast(3)
>>> rsq = my_pwlf.r_squared()
standard_errors(method='linear', step_size=0.0001)

Calculate the standard errors for each beta parameter determined from the piecewise linear fit. Typically +- 1.96*se will yield the center of a 95% confidence region around your parameters. This assumes the parmaters follow a normal distribution. For more information see: https://en.wikipedia.org/wiki/Standard_error

This calculation follows the derivation provided in [1].

Parameters:
methodstring, optional

Calculate the standard errors for a linear or non-linear regression problem. The default is method=’linear’. A taylor- series expansion is performed when method=’non-linear’ (which is commonly referred to as the Delta method).

step_sizefloat, optional

The step size to perform forward differences for the taylor- series expansion when method=’non-linear’. Default is step_size=1e-4.

Returns:
sendarray (1-D)

Standard errors associated with each beta parameter. Specifically se[0] correspounds to the standard error for beta[0], and so forth.

Raises:
AttributeError

You have probably not performed a fit yet.

ValueError

You supplied an unsupported method.

LinAlgError

This typically means your regression problem is ill-conditioned.

Notes

The linear regression problem is when you know the breakpoint locations (e.g. when using the fit_with_breaks function).

The non-linear regression problem is when you don’t know the breakpoint locations (e.g. when using the fit, fitfast, and fit_guess functions).

References

[1]

Coppe, A., Haftka, R. T., and Kim, N. H., “Uncertainty Identification of Damage Growth Parameters Using Nonlinear Regression,” AIAA Journal, Vol. 49, No. 12, dec 2011, pp. 2818–2821.

Examples

Calculate the standard errors after performing a simple fit.

>>> import pwlf
>>> x = np.linspace(0.0, 1.0, 10)
>>> y = np.random.random(10)
>>> my_pwlf = pwlf.PiecewiseLinFit(x, y)
>>> breaks = my_pwlf.fitfast(3)
>>> se = my_pwlf.standard_errors()
use_custom_opt(n_segments, x_c=None, y_c=None)

Provide the number of line segments you want to use with your custom optimization routine.

Run this function first to initialize necessary attributes!!!

This was intended for advanced users only.

See the following example https://github.com/cjekel/piecewise_linear_fit_py/blob/master/examples/useCustomOptimizationRoutine.py

Parameters:
n_segmentsint

The x locations where each line segment terminates. These are referred to as breakpoints for each line segment. This should be structured as a 1-D numpy array.

x_cnone or array_like, optional

The x locations of the data points that the piecewise linear function will be forced to go through.

y_cnone or array_like, optional

The x locations of the data points that the piecewise linear function will be forced to go through.

Raises:
ValueError

You can’t specify weights with x_c and y_c.

Notes

Optimize fit_with_breaks_opt(var) where var is a 1D array containing the x locations of your variables var has length n_segments - 1, because the two breakpoints are always defined (1. the min of x, 2. the max of x).

fit_with_breaks_opt(var) will return the sum of the square of the residuals which you’ll want to minimize with your optimization routine.