pwlf package contents¶

Methods 
 class pwlf.PiecewiseLinFit(x, y, disp_res=False, lapack_driver='gelsd', degree=1, weights=None, seed=None)¶
Bases:
object
Methods
assemble_regression_matrix
(breaks, x)Assemble the linear regression matrix A
Calculate the slopes of the lines after a piecewise linear function has been fitted.
conlstsq
(A[, calc_slopes])Perform a constrained least squares fit for A matrix.
fit
(n_segments[, x_c, y_c, bounds])Fit a continuous piecewise linear function for a specified number of line segments.
fit_force_points_opt
(var)The objective function to perform a continuous piecewise linear fit for a specified number of breakpoints.
fit_guess
(guess_breakpoints[, bounds])Uses LBFGSB optimization to find the location of breakpoints from a guess of where breakpoint locations should be.
fit_with_breaks
(breaks)A function which fits a continuous piecewise linear function for specified breakpoint locations.
fit_with_breaks_force_points
(breaks, x_c, y_c)A function which fits a continuous piecewise linear function for specified breakpoint locations, where you force the fit to go through the data points at x_c and y_c.
fit_with_breaks_opt
(var)The objective function to perform a continuous piecewise linear fit for a specified number of breakpoints.
fitfast
(n_segments[, pop, bounds])Uses multi start LBFGSB optimization to find the location of breakpoints for a given number of line segments by minimizing the sum of the square of the errors.
lstsq
(A[, calc_slopes])Perform the least squares fit for A matrix.
p_values
([method, step_size])Calculate the pvalues for each beta parameter.
predict
(x[, beta, breaks])Evaluate the fitted continuous piecewise linear function at untested points.
Calculate the prediction variance for each specified x location.
Calculate the coefficient of determination ("R squared", R^2) value after a fit has been performed.
standard_errors
([method, step_size])Calculate the standard errors for each beta parameter determined from the piecewise linear fit.
use_custom_opt
(n_segments[, x_c, y_c])Provide the number of line segments you want to use with your custom optimization routine.
 assemble_regression_matrix(breaks, x)¶
Assemble the linear regression matrix A
 Parameters
 breaksarray_like
The x locations where each line segment terminates. These are referred to as breakpoints for each line segment. This should be structured as a 1D numpy array.
 xndarray (1D)
The x locations which the linear regression matrix is assembled on. This must be a numpy array!
 Returns
 Andarray (2D)
The assembled linear regression matrix.
Examples
Assemble the linear regression matrix on the x data for some set of breakpoints.
>>> import pwlf >>> my_pwlf = pwlf.PiecewiseLinFit(x, y) >>> breaks = [0.0, 0.5, 1.0] >>> A = assemble_regression_matrix(breaks, self.x_data)
 calc_slopes()¶
Calculate the slopes of the lines after a piecewise linear function has been fitted.
This will also calculate the yintercept from each line in the form y = mx + b. The intercepts are stored at self.intercepts.
 Returns
 slopesndarray(1D)
The slope of each ling segment as a 1D numpy array. This assumes that x[0] <= x[1] <= … <= x[n]. Thus, slopes[0] is the slope of the first line segment.
Examples
Calculate the slopes after performing a simple fit
>>> import pwlf >>> x = np.linspace(0.0, 1.0, 10) >>> y = np.random.random(10) >>> my_pwlf = pwlf.PiecewiseLinFit(x, y) >>> breaks = my_pwlf.fit(3) >>> slopes = my_pwlf.calc_slopes()
 conlstsq(A, calc_slopes=True)¶
Perform a constrained least squares fit for A matrix.
 Parameters
 Andarray (2D)
The regression matrix you want to fit in the linear system of equations Ab=y.
 calc_slopesboolean, optional
Whether to calculate slopes after performing a fit. Default is calc_slopes=True.
 fit(n_segments, x_c=None, y_c=None, bounds=None, **kwargs)¶
Fit a continuous piecewise linear function for a specified number of line segments. Uses differential evolution to finds the optimum location of breakpoints for a given number of line segments by minimizing the sum of the square error.
 Parameters
 n_segmentsint
The desired number of line segments.
 x_carray_like, optional
The x locations of the data points that the piecewise linear function will be forced to go through.
 y_carray_like, optional
The x locations of the data points that the piecewise linear function will be forced to go through.
 boundsarray_like, optional
Bounds for each breakpoint location within the optimization. This should have the shape of (n_segments, 2).
 **kwargsoptional
Directly passed into scipy.optimize.differential_evolution(). This will override any pwlf defaults when provided. See Note for more information.
 Returns
 fit_breaksfloat
breakpoint locations stored as a 1D numpy array.
 Raises
 ValueError
You probably provided x_c without y_c (or vice versa). You must provide both x_c and y_c if you plan to force the model through data point(s).
 ValueError
You can’t specify weights with x_c and y_c.
Notes
All **kwargs are passed into sicpy.optimize.differential_evolution. If any **kwargs is used, it will override my differential_evolution, defaults. This allows advanced users to tweak their own optimization. For me information see: https://github.com/cjekel/piecewise_linear_fit_py/issues/15#issuecomment434717232
Examples
This example shows you how to fit three continuous piecewise lines to a dataset. This assumes that x is linearly spaced from [0, 1), and y is random.
>>> import pwlf >>> x = np.linspace(0.0, 1.0, 10) >>> y = np.random.random(10) >>> my_pwlf = pwlf.PiecewiseLinFit(x, y) >>> breaks = my_pwlf.fit(3)
Additionally you desired that the piecewise linear function go through the point (0.0, 0.0).
>>> x_c = [0.0] >>> y_c = [0.0] >>> breaks = my_pwlf.fit(3, x_c=x_c, y_c=y_c)
Additionally you desired that the piecewise linear function go through the points (0.0, 0.0) and (1.0, 1.0).
>>> x_c = [0.0, 1.0] >>> y_c = [0.0, 1.0] >>> breaks = my_pwlf.fit(3, x_c=x_c, y_c=y_c)
 fit_force_points_opt(var)¶
The objective function to perform a continuous piecewise linear fit for a specified number of breakpoints. This is to be used with a custom optimization routine, and after use_custom_opt has been called.
Use this function if you intend to be force the model through x_c and y_c, while performing a custom optimization.
This was intended for advanced users only. See the following example https://github.com/cjekel/piecewise_linear_fit_py/blob/master/examples/useCustomOptimizationRoutine.py
 Parameters
 vararray_like
The breakpoint locations, or variable, in a custom optimization routine.
 Returns
 ssrfloat
The sum of square of the residuals.
 Raises
 LinAlgError
This typically means your regression problem is illconditioned.
Notes
You should run use_custom_opt to initialize necessary object attributes first.
Unlike fit_with_breaks_force_points, fit_force_points_opt automatically assumes that the first and last breakpoints occur at the min and max values of x.
 fit_guess(guess_breakpoints, bounds=None, **kwargs)¶
Uses LBFGSB optimization to find the location of breakpoints from a guess of where breakpoint locations should be.
In some cases you may have a good idea where the breakpoint locations occur. It generally won’t be necessary to run a full global optimization to search the entire domain for the breakpoints when you have a good idea where the breakpoints occur. Here a local optimization is run from a guess of the breakpoint locations.
 Parameters
 guess_breakpointsarray_like
Guess where the breakpoints occur. This should be a list or numpy array containing the locations where it appears breakpoints occur.
 boundsarray_like, optional
Bounds for each breakpoint location within the optimization. This should have the shape of (n_segments, 2).
 **kwargsoptional
Directly passed into scipy.optimize.fmin_l_bfgs_b(). This will override any pwlf defaults when provided. See Note for more information.
 Returns
 fit_breaksfloat
breakpoint locations stored as a 1D numpy array.
Notes
All **kwargs are passed into sicpy.optimize.fmin_l_bfgs_b. If any **kwargs is used, it will override my defaults. This allows advanced users to tweak their own optimization. For me information see: https://github.com/cjekel/piecewise_linear_fit_py/issues/15#issuecomment434717232
You do not need to specify the x.min() or x.max() in geuss_breakpoints!
Examples
In this example we see two distinct linear regions, and we believe a breakpoint occurs at 6.0. We’ll use the fit_guess() function to find the best breakpoint location starting with this guess.
>>> import pwlf >>> x = np.array([4., 5., 6., 7., 8.]) >>> y = np.array([11., 13., 16., 28.92, 42.81]) >>> my_pwlf = pwlf.PiecewiseLinFit(x, y) >>> breaks = my_pwlf.fit_guess([6.0])
Note specifying one breakpoint will result in two line segments. If we wanted three line segments, we’ll have to specify two breakpoints.
>>> breaks = my_pwlf.fit_guess([5.5, 6.0])
 fit_with_breaks(breaks)¶
A function which fits a continuous piecewise linear function for specified breakpoint locations.
The function minimizes the sum of the square of the residuals for the x y data.
If you want to understand the math behind this read https://jekel.me/2018/Continouspiecewiselinearregression/
Other useful resources: http://golovchenko.org/docs/ContinuousPiecewiseLinearFit.pdf https://www.mathworks.com/matlabcentral/fileexchange/40913piecewiselinearleastsquarefittic http://www.regressionist.com/2018/02/07/continuouspiecewiselinearfitting/
 Parameters
 breaksarray_like
The x locations where each line segment terminates. These are referred to as breakpoints for each line segment. This should be structured as a 1D numpy array.
 Returns
 ssrfloat
Returns the sum of squares of the residuals.
 Raises
 LinAlgError
This typically means your regression problem is illconditioned.
Examples
If your x data exists from 0 <= x <= 1 and you want three piecewise linear lines where the lines terminate at x = 0.0, 0.3, 0.6, and 1.0. This assumes that x is linearly spaced from [0, 1), and y is random.
>>> import pwlf >>> x = np.linspace(0.0, 1.0, 10) >>> y = np.random.random(10) >>> my_pwlf = pwlf.PiecewiseLinFit(x, y) >>> breaks = [0.0, 0.3, 0.6, 1.0] >>> ssr = my_pwlf.fit_with_breaks(breaks)
 fit_with_breaks_force_points(breaks, x_c, y_c)¶
A function which fits a continuous piecewise linear function for specified breakpoint locations, where you force the fit to go through the data points at x_c and y_c.
The function minimizes the sum of the square of the residuals for the pair of x, y data points. If you want to understand the math behind this read https://jekel.me/2018/Forcepiecwiselinearfitthroughdata/
 Parameters
 breaksarray_like
The x locations where each line segment terminates. These are referred to as breakpoints for each line segment. This should be structured as a 1D numpy array.
 x_carray_like
The x locations of the data points that the piecewise linear function will be forced to go through.
 y_carray_like
The x locations of the data points that the piecewise linear function will be forced to go through.
 Returns
 Lfloat
Returns the Lagrangian function value. This is the sum of squares of the residuals plus the constraint penalty.
 Raises
 LinAlgError
This typically means your regression problem is illconditioned.
 ValueError
You can’t specify weights with x_c and y_c.
Examples
If your x data exists from 0 <= x <= 1 and you want three piecewise linear lines where the lines terminate at x = 0.0, 0.3, 0.6, and 1.0. This assumes that x is linearly spaced from [0, 1), and y is random. Additionally you desired that the piecewise linear function go through the point (0.0, 0.0)
>>> import pwlf >>> x = np.linspace(0.0, 1.0, 10) >>> y = np.random.random(10) >>> x_c = [0.0] >>> y_c = [0.0] >>> my_pwlf = pwlf.PiecewiseLinFit(x, y) >>> breaks = [0.0, 0.3, 0.6, 1.0] >>> L = my_pwlf.fit_with_breaks_force_points(breaks, x_c, y_c)
 fit_with_breaks_opt(var)¶
The objective function to perform a continuous piecewise linear fit for a specified number of breakpoints. This is to be used with a custom optimization routine, and after use_custom_opt has been called.
This was intended for advanced users only.
See the following example https://github.com/cjekel/piecewise_linear_fit_py/blob/master/examples/useCustomOptimizationRoutine.py
 Parameters
 vararray_like
The breakpoint locations, or variable, in a custom optimization routine.
 Returns
 ssrfloat
The sum of square of the residuals.
 Raises
 LinAlgError
This typically means your regression problem is illconditioned.
Notes
You should run use_custom_opt to initialize necessary object attributes first.
Unlike fit_with_breaks, fit_with_breaks_opt automatically assumes that the first and last breakpoints occur at the min and max values of x.
 fitfast(n_segments, pop=2, bounds=None, **kwargs)¶
Uses multi start LBFGSB optimization to find the location of breakpoints for a given number of line segments by minimizing the sum of the square of the errors.
The idea is that we generate n random latin hypercube samples and run LBFGSB optimization on each one. This isn’t guaranteed to find the global optimum. It’s suppose to be a reasonable compromise between speed and quality of fit. Let me know how it works.
Since this is based on random sampling, you might want to run it multiple times and save the best version… The best version will have the lowest self.ssr (sum of square of residuals).
There is no guarantee that this will be faster than fit(), however you may find it much faster sometimes.
 Parameters
 n_segmentsint
The desired number of line segments.
 popint, optional
The number of latin hypercube samples to generate. Default pop=2.
 boundsarray_like, optional
Bounds for each breakpoint location within the optimization. This should have the shape of (n_segments, 2).
 **kwargsoptional
Directly passed into scipy.optimize.fmin_l_bfgs_b(). This will override any pwlf defaults when provided. See Note for more information.
 Returns
 fit_breaksfloat
breakpoint locations stored as a 1D numpy array.
Notes
 The default number of multi start optimizations is 2.
Decreasing this number will result in a faster run time.
 Increasing this number will improve the likelihood of finding
good results
You can specify the number of starts using the following call
Minimum value of pop is 2
All **kwargs are passed into sicpy.optimize.fmin_l_bfgs_b. If any **kwargs is used, it will override my defaults. This allows advanced users to tweak their own optimization. For me information see: https://github.com/cjekel/piecewise_linear_fit_py/issues/15#issuecomment434717232
Examples
This example shows you how to fit three continuous piecewise lines to a dataset. This assumes that x is linearly spaced from [0, 1), and y is random.
>>> import pwlf >>> x = np.linspace(0.0, 1.0, 10) >>> y = np.random.random(10) >>> my_pwlf = pwlf.PiecewiseLinFit(x, y) >>> breaks = my_pwlf.fitfast(3)
You can change the number of latin hypercube samples (or starting point, locations) to use with pop. The following example will use 50 samples.
>>> breaks = my_pwlf.fitfast(3, pop=50)
 lstsq(A, calc_slopes=True)¶
Perform the least squares fit for A matrix.
 Parameters
 Andarray (2D)
The regression matrix you want to fit in the linear system of equations Ab=y.
 calc_slopesboolean, optional
Whether to calculate slopes after performing a fit. Default is calc_slopes=True.
 p_values(method='linear', step_size=0.0001)¶
Calculate the pvalues for each beta parameter.
This calculates the pvalues for the beta parameters under the assumption that your breakpoint locations are known. Section 2.4.2 of [2] defines how to calculate the pvalue of individual parameters. This is really a marginal test since each parameter is dependent upon the other parameters.
These values are typically compared to some confidence level alpha for significance. A 95% confidence level would have alpha = 0.05.
 Parameters
 methodstring, optional
Calculate the standard errors for a linear or nonlinear regression problem. The default is method=’linear’. A taylor series expansion is performed when method=’nonlinear’ (which is commonly referred to as the Delta method).
 step_sizefloat, optional
The step size to perform forward differences for the taylor series expansion when method=’nonlinear’. Default is step_size=1e4.
 Returns
 pndarray (1D)
pvalues for each beta parameter where pvalue[0] corresponds to beta[0] and so forth
 Raises
 AttributeError
You have probably not performed a fit yet.
 ValueError
You supplied an unsupported method.
Notes
The linear regression problem is when you know the breakpoint locations (e.g. when using the fit_with_breaks function).
The nonlinear regression problem is when you don’t know the breakpoint locations (e.g. when using the fit, fitfast, and fit_guess functions).
See https://github.com/cjekel/piecewise_linear_fit_py/issues/14
References
 2
Myers RH, Montgomery DC, AndersonCook CM. Response surface methodology . Hoboken. New Jersey: John Wiley & Sons, Inc. 2009;20:3844.
Examples
After performing a fit, one can calculate the pvalue for each beta parameter
>>> import pwlf >>> x = np.linspace(0.0, 1.0, 10) >>> y = np.random.random(10) >>> my_pwlf = pwlf.PiecewiseLinFit(x, y) >>> breaks = my_pwlf.fitfast(3) >>> x_new = np.linspace(0.0, 1.0, 100) >>> p = my_pwlf.p_values(x_new)
see also examples/standard_errrors_and_pvalues.py
 predict(x, beta=None, breaks=None)¶
Evaluate the fitted continuous piecewise linear function at untested points.
You can manfully specify the breakpoints and calculated values for beta if you want to quickly predict from different models and the same data set.
 Parameters
 xarray_like
The x locations where you want to predict the output of the fitted continuous piecewise linear function.
 betanone or ndarray (1D), optional
The model parameters for the continuous piecewise linear fit. Default is None.
 breaksnone or array_like, optional
The x locations where each line segment terminates. These are referred to as breakpoints for each line segment. This should be structured as a 1D numpy array. Default is None.
 Returns
 y_hatndarray (1D)
The predicted values at x.
Examples
Fits a simple model, then predict at x_new locations which are linearly spaced.
>>> import pwlf >>> x = np.linspace(0.0, 1.0, 10) >>> y = np.random.random(10) >>> my_pwlf = pwlf.PiecewiseLinFit(x, y) >>> breaks = [0.0, 0.3, 0.6, 1.0] >>> ssr = my_pwlf.fit_with_breaks(breaks) >>> x_new = np.linspace(0.0, 1.0, 100) >>> yhat = my_pwlf.predict(x_new)
 prediction_variance(x)¶
Calculate the prediction variance for each specified x location. The prediction variance is the uncertainty of the model due to the lack of data. This can be used to find a 95% confidence interval of possible piecewise linear models based on the current data. This would be done typically as y_hat + 1.96*np.sqrt(pre_var). The prediction_variance needs to be calculated at various x locations. For more information see: www2.mae.ufl.edu/haftka/vvuq/lectures/Regressionaccuracy.pptx
 Parameters
 xarray_like
The x locations where you want the prediction variance from the fitted continuous piecewise linear function.
 Returns
 pre_varndarray (1D)
Numpy array (floats) of prediction variance at each x location.
 Raises
 AttributeError
You have probably not performed a fit yet.
 LinAlgError
This typically means your regression problem is illconditioned.
Notes
This assumes that your breakpoint locations are exact! and does not consider the uncertainty with your breakpoint locations.
Examples
Calculate the prediction variance at x_new after performing a simple fit.
>>> import pwlf >>> x = np.linspace(0.0, 1.0, 10) >>> y = np.random.random(10) >>> my_pwlf = pwlf.PiecewiseLinFit(x, y) >>> breaks = my_pwlf.fitfast(3) >>> x_new = np.linspace(0.0, 1.0, 100) >>> pre_var = my_pwlf.prediction_variance(x_new)
see also examples/prediction_variance.py
 r_squared()¶
Calculate the coefficient of determination (“R squared”, R^2) value after a fit has been performed. For more information see: https://en.wikipedia.org/wiki/Coefficient_of_determination
 Returns
 rsqfloat
Coefficient of determination, or ‘R squared’ value.
 Raises
 AttributeError
You have probably not performed a fit yet.
 LinAlgError
This typically means your regression problem is illconditioned.
Examples
Calculate the R squared value after performing a simple fit.
>>> import pwlf >>> x = np.linspace(0.0, 1.0, 10) >>> y = np.random.random(10) >>> my_pwlf = pwlf.PiecewiseLinFit(x, y) >>> breaks = my_pwlf.fitfast(3) >>> rsq = my_pwlf.r_squared()
 standard_errors(method='linear', step_size=0.0001)¶
Calculate the standard errors for each beta parameter determined from the piecewise linear fit. Typically + 1.96*se will yield the center of a 95% confidence region around your parameters. This assumes the parmaters follow a normal distribution. For more information see: https://en.wikipedia.org/wiki/Standard_error
This calculation follows the derivation provided in [1].
 Parameters
 methodstring, optional
Calculate the standard errors for a linear or nonlinear regression problem. The default is method=’linear’. A taylor series expansion is performed when method=’nonlinear’ (which is commonly referred to as the Delta method).
 step_sizefloat, optional
The step size to perform forward differences for the taylor series expansion when method=’nonlinear’. Default is step_size=1e4.
 Returns
 sendarray (1D)
Standard errors associated with each beta parameter. Specifically se[0] correspounds to the standard error for beta[0], and so forth.
 Raises
 AttributeError
You have probably not performed a fit yet.
 ValueError
You supplied an unsupported method.
 LinAlgError
This typically means your regression problem is illconditioned.
Notes
The linear regression problem is when you know the breakpoint locations (e.g. when using the fit_with_breaks function).
The nonlinear regression problem is when you don’t know the breakpoint locations (e.g. when using the fit, fitfast, and fit_guess functions).
References
 1
Coppe, A., Haftka, R. T., and Kim, N. H., “Uncertainty Identification of Damage Growth Parameters Using Nonlinear Regression,” AIAA Journal, Vol. 49, No. 12, dec 2011, pp. 2818–2821.
Examples
Calculate the standard errors after performing a simple fit.
>>> import pwlf >>> x = np.linspace(0.0, 1.0, 10) >>> y = np.random.random(10) >>> my_pwlf = pwlf.PiecewiseLinFit(x, y) >>> breaks = my_pwlf.fitfast(3) >>> se = my_pwlf.standard_errors()
 use_custom_opt(n_segments, x_c=None, y_c=None)¶
Provide the number of line segments you want to use with your custom optimization routine.
Run this function first to initialize necessary attributes!!!
This was intended for advanced users only.
See the following example https://github.com/cjekel/piecewise_linear_fit_py/blob/master/examples/useCustomOptimizationRoutine.py
 Parameters
 n_segmentsint
The x locations where each line segment terminates. These are referred to as breakpoints for each line segment. This should be structured as a 1D numpy array.
 x_cnone or array_like, optional
The x locations of the data points that the piecewise linear function will be forced to go through.
 y_cnone or array_like, optional
The x locations of the data points that the piecewise linear function will be forced to go through.
 Raises
 ValueError
You can’t specify weights with x_c and y_c.
Notes
Optimize fit_with_breaks_opt(var) where var is a 1D array containing the x locations of your variables var has length n_segments  1, because the two breakpoints are always defined (1. the min of x, 2. the max of x).
fit_with_breaks_opt(var) will return the sum of the square of the residuals which you’ll want to minimize with your optimization routine.