Examples ======== All of these examples will use the following data and imports. .. code:: python import numpy as np import matplotlib.pyplot as plt import pwlf # your data y = np.array([0.00000000e+00, 9.69801700e-03, 2.94350340e-02, 4.39052750e-02, 5.45343950e-02, 6.74104940e-02, 8.34831790e-02, 1.02580042e-01, 1.22767939e-01, 1.42172312e-01, 0.00000000e+00, 8.58600000e-06, 8.31543400e-03, 2.34184100e-02, 3.39709150e-02, 4.03581990e-02, 4.53545600e-02, 5.02345260e-02, 5.55253360e-02, 6.14750770e-02, 6.82125120e-02, 7.55892510e-02, 8.38356810e-02, 9.26413070e-02, 1.02039790e-01, 1.11688258e-01, 1.21390666e-01, 1.31196948e-01, 0.00000000e+00, 1.56706510e-02, 3.54628780e-02, 4.63739040e-02, 5.61442590e-02, 6.78542550e-02, 8.16388310e-02, 9.77756110e-02, 1.16531753e-01, 1.37038283e-01, 0.00000000e+00, 1.16951050e-02, 3.12089850e-02, 4.41776550e-02, 5.42877590e-02, 6.63321350e-02, 8.07655920e-02, 9.70363280e-02, 1.15706975e-01, 1.36687642e-01, 0.00000000e+00, 1.50144640e-02, 3.44519970e-02, 4.55907760e-02, 5.59556700e-02, 6.88450940e-02, 8.41374060e-02, 1.01254006e-01, 1.20605073e-01, 1.41881288e-01, 1.62618058e-01]) x = np.array([0.00000000e+00, 8.82678000e-03, 3.25615100e-02, 5.66106800e-02, 7.95549800e-02, 1.00936330e-01, 1.20351520e-01, 1.37442010e-01, 1.51858250e-01, 1.64433570e-01, 0.00000000e+00, -2.12600000e-05, 7.03872000e-03, 1.85494500e-02, 3.00926700e-02, 4.17617000e-02, 5.37279600e-02, 6.54941000e-02, 7.68092100e-02, 8.76596300e-02, 9.80525800e-02, 1.07961810e-01, 1.17305210e-01, 1.26063930e-01, 1.34180360e-01, 1.41725010e-01, 1.48629710e-01, 1.55374770e-01, 0.00000000e+00, 1.65610200e-02, 3.91016100e-02, 6.18679400e-02, 8.30997400e-02, 1.02132890e-01, 1.19011260e-01, 1.34620080e-01, 1.49429370e-01, 1.63539960e-01, -0.00000000e+00, 1.01980300e-02, 3.28642800e-02, 5.59461900e-02, 7.81388400e-02, 9.84458400e-02, 1.16270210e-01, 1.31279040e-01, 1.45437090e-01, 1.59627540e-01, 0.00000000e+00, 1.63404300e-02, 4.00086000e-02, 6.34390200e-02, 8.51085900e-02, 1.04787860e-01, 1.22120350e-01, 1.36931660e-01, 1.50958760e-01, 1.65299640e-01, 1.79942720e-01]) 1. `fit with known breakpoint locations <#fit-with-known-breakpoint-locations>`__ 2. `fit for specified number of line segments <#fit-for-specified-number-of-line-segments>`__ 3. `fitfast for specified number of line segments <#fitfast-for-specified-number-of-line-segments>`__ 4. `force a fit through data points <#force-a-fit-through-data-points>`__ 5. `use custom optimization routine <#use-custom-optimization-routine>`__ 6. `pass differential evolution keywords <#pass-differential-evolution-keywords>`__ 7. `find the best number of line segments <#find-the-best-number-of-line-segments>`__ 8. `model persistence <#model-persistence>`__ 9. `bad fits when you have more unknowns than data <#bad-fits-when-you-have-more-unknowns-than-data>`__ 10. `fit with a breakpoint guess <#fit-with-a-breakpoint-guess>`__ 11. `get the linear regression matrix <#get-the-linear-regression-matrix>`__ 12. `use of TensorFlow <#use-of-tensorflow>`__ 13. `fit constants or polynomials <#fit-constants-or-polynomials>`__ 14. `specify breakpoint bounds <#specify-breakpoint-bounds>`__ 15. `non-linear standard errors and p-values <#non-linear-standard-errors-and-p-values>`__ 16. `obtain the equations of fitted pwlf <#obtain-the-equations-of-fitted-pwlf>`__ 17. `weighted least squares fit <#weighted-least-squares-fit>`__ 18. `reproducible results <#reproducible results>`__ fit with known breakpoint locations ----------------------------------- You can perform a least squares fit if you know the breakpoint locations. .. code:: python # your desired line segment end locations x0 = np.array([min(x), 0.039, 0.10, max(x)]) # initialize piecewise linear fit with your x and y data my_pwlf = pwlf.PiecewiseLinFit(x, y) # fit the data with the specified break points # (ie the x locations of where the line segments # will terminate) my_pwlf.fit_with_breaks(x0) # predict for the determined points xHat = np.linspace(min(x), max(x), num=10000) yHat = my_pwlf.predict(xHat) # plot the results plt.figure() plt.plot(x, y, 'o') plt.plot(xHat, yHat, '-') plt.show() .. figure:: https://raw.githubusercontent.com/cjekel/piecewise_linear_fit_py/master/examples/figs/fit_breaks.png :alt: fit with known breakpoint locations fit with known breakpoint locations fit for specified number of line segments ----------------------------------------- Use a global optimization to find the breakpoint locations that minimize the sum of squares error. This uses `Differential Evolution `__ from scipy. .. code:: python # initialize piecewise linear fit with your x and y data my_pwlf = pwlf.PiecewiseLinFit(x, y) # fit the data for four line segments res = my_pwlf.fit(4) # predict for the determined points xHat = np.linspace(min(x), max(x), num=10000) yHat = my_pwlf.predict(xHat) # plot the results plt.figure() plt.plot(x, y, 'o') plt.plot(xHat, yHat, '-') plt.show() .. figure:: https://raw.githubusercontent.com/cjekel/piecewise_linear_fit_py/master/examples/figs/numberoflines.png :alt: fit for specified number of line segments fit for specified number of line segments fitfast for specified number of line segments --------------------------------------------- This performs a fit for a specified number of line segments with a multi-start gradient based optimization. This should be faster than `Differential Evolution `__ for a small number of starting points. .. code:: python # initialize piecewise linear fit with your x and y data my_pwlf = pwlf.PiecewiseLinFit(x, y) # fit the data for four line segments # this performs 3 multi-start optimizations res = my_pwlf.fitfast(4, pop=3) # predict for the determined points xHat = np.linspace(min(x), max(x), num=10000) yHat = my_pwlf.predict(xHat) # plot the results plt.figure() plt.plot(x, y, 'o') plt.plot(xHat, yHat, '-') plt.show() .. figure:: https://raw.githubusercontent.com/cjekel/piecewise_linear_fit_py/master/examples/figs/fitfast.png :alt: fitfast for specified number of line segments fitfast for specified number of line segments force a fit through data points ------------------------------- Sometimes it’s necessary to force the piecewise continuous model through a particular data point, or a set of data points. The following example finds the best 4 line segments that go through two data points. .. code:: python # initialize piecewise linear fit with your x and y data myPWLF = pwlf.PiecewiseLinFit(x, y) # fit the function with four line segments # force the function to go through the data points # (0.0, 0.0) and (0.19, 0.16) # where the data points are of the form (x, y) x_c = [0.0, 0.19] y_c = [0.0, 0.2] res = myPWLF.fit(4, x_c, y_c) # predict for the determined points xHat = np.linspace(min(x), 0.19, num=10000) yHat = myPWLF.predict(xHat) # plot the results plt.figure() plt.plot(x, y, 'o') plt.plot(xHat, yHat, '-') plt.show() .. figure:: https://raw.githubusercontent.com/cjekel/piecewise_linear_fit_py/master/examples/figs/force.png :alt: force a fit through data points force a fit through data points use custom optimization routine ------------------------------- You can use your favorite optimization routine to find the breakpoint locations. The following example uses scipy’s `minimize `__ function. .. code:: python from scipy.optimize import minimize # initialize piecewise linear fit with your x and y data my_pwlf = pwlf.PiecewiseLinFit(x, y) # initialize custom optimization number_of_line_segments = 3 my_pwlf.use_custom_opt(number_of_line_segments) # i have number_of_line_segments - 1 number of variables # let's guess the correct location of the two unknown variables # (the program defaults to have end segments at x0= min(x) # and xn=max(x) xGuess = np.zeros(number_of_line_segments - 1) xGuess[0] = 0.02 xGuess[1] = 0.10 res = minimize(my_pwlf.fit_with_breaks_opt, xGuess) # set up the break point locations x0 = np.zeros(number_of_line_segments + 1) x0[0] = np.min(x) x0[-1] = np.max(x) x0[1:-1] = res.x # calculate the parameters based on the optimal break point locations my_pwlf.fit_with_breaks(x0) # predict for the determined points xHat = np.linspace(min(x), max(x), num=10000) yHat = my_pwlf.predict(xHat) plt.figure() plt.plot(x, y, 'o') plt.plot(xHat, yHat, '-') plt.show() pass differential evolution keywords ------------------------------------ You can pass keyword arguments from the ``fit`` function into scipy’s `Differential Evolution `__. .. code:: python # initialize piecewise linear fit with your x and y data my_pwlf = pwlf.PiecewiseLinFit(x, y) # fit the data for four line segments # this sets DE to have an absolute tolerance of 0.1 res = my_pwlf.fit(4, atol=0.1) # predict for the determined points xHat = np.linspace(min(x), max(x), num=10000) yHat = my_pwlf.predict(xHat) # plot the results plt.figure() plt.plot(x, y, 'o') plt.plot(xHat, yHat, '-') plt.show() find the best number of line segments ------------------------------------- This example uses EGO (bayesian optimization) and a penalty function to find the best number of line segments. This will require careful use of the penalty parameter ``l``. Use this template to automatically find the best number of line segments. .. code:: python from GPyOpt.methods import BayesianOptimization # initialize piecewise linear fit with your x and y data my_pwlf = pwlf.PiecewiseLinFit(x, y) # define your objective function def my_obj(x): # define some penalty parameter l # you'll have to arbitrarily pick this # it depends upon the noise in your data, # and the value of your sum of square of residuals l = y.mean()*0.001 f = np.zeros(x.shape[0]) for i, j in enumerate(x): my_pwlf.fit(j[0]) f[i] = my_pwlf.ssr + (l*j[0]) return f # define the lower and upper bound for the number of line segments bounds = [{'name': 'var_1', 'type': 'discrete', 'domain': np.arange(2, 40)}] np.random.seed(12121) myBopt = BayesianOptimization(my_obj, domain=bounds, model_type='GP', initial_design_numdata=10, initial_design_type='latin', exact_feval=True, verbosity=True, verbosity_model=False) max_iter = 30 # perform the bayesian optimization to find the optimum number # of line segments myBopt.run_optimization(max_iter=max_iter, verbosity=True) print('\n \n Opt found \n') print('Optimum number of line segments:', myBopt.x_opt) print('Function value:', myBopt.fx_opt) myBopt.plot_acquisition() myBopt.plot_convergence() # perform the fit for the optimum my_pwlf.fit(myBopt.x_opt) # predict for the determined points xHat = np.linspace(min(x), max(x), num=10000) yHat = my_pwlf.predict(xHat) # plot the results plt.figure() plt.plot(x, y, 'o') plt.plot(xHat, yHat, '-') plt.show() model persistence ----------------- You can save fitted models with pickle. Alternatively see `joblib `__. .. code:: python # if you use Python 2.x you should import cPickle # import cPickle as pickle # if you use Python 3.x you can just use pickle import pickle # your desired line segment end locations x0 = np.array([min(x), 0.039, 0.10, max(x)]) # initialize piecewise linear fit with your x and y data my_pwlf = pwlf.PiecewiseLinFit(x, y) # fit the data with the specified break points my_pwlf.fit_with_breaks(x0) # save the fitted model with open('my_fit.pkl', 'wb') as f: pickle.dump(my_pwlf, f, pickle.HIGHEST_PROTOCOL) # load the fitted model with open('my_fit.pkl', 'rb') as f: my_pwlf = pickle.load(f) bad fits when you have more unknowns than data ---------------------------------------------- You can get very bad fits with pwlf when you have more unknowns than data points. The following example will fit 99 line segments to the 59 data points. While this will result in an error of zero, the model will have very weird predictions within the data. You should not fit more unknowns than you have data with pwlf! .. code:: python break_locations = np.linspace(min(x), max(x), num=100) # initialize piecewise linear fit with your x and y data my_pwlf = pwlf.PiecewiseLinFit(x, y) my_pwlf.fit_with_breaks(break_locations) # predict for the determined points xHat = np.linspace(min(x), max(x), num=10000) yHat = my_pwlf.predict(xHat) # plot the results plt.figure() plt.plot(x, y, 'o') plt.plot(xHat, yHat, '-') plt.show() .. figure:: https://raw.githubusercontent.com/cjekel/piecewise_linear_fit_py/master/examples/figs/badfit.png :alt: bad fits when you have more unknowns than data bad fits when you have more unknowns than data fit with a breakpoint guess --------------------------- In this example we see two distinct linear regions, and we believe a breakpoint occurs at 6.0. We’ll use the fit_guess() function to find the best breakpoint location starting with this guess. These fits should be much faster than the ``fit`` or ``fitfast`` function when you have a reasonable idea where the breakpoints occur. .. code:: python import numpy as np import pwlf x = np.array([4., 5., 6., 7., 8.]) y = np.array([11., 13., 16., 28.92, 42.81]) my_pwlf = pwlf.PiecewiseLinFit(x, y) breaks = my_pwlf.fit_guess([6.0]) Note specifying one breakpoint will result in two line segments. If we wanted three line segments, we’ll have to specify two breakpoints. .. code:: python breaks = my_pwlf.fit_guess([5.5, 6.0]) get the linear regression matrix -------------------------------- In some cases it may be desirable to work with the linear regression matrix directly. The following example grabs the linear regression matrix ``A`` for a specific set of breakpoints. In this case we assume that the breakpoints occur at each of the data points. Please see the `paper `__ for details about the regression matrix ``A``. .. code:: python import numpy as np import pwlf # select random seed for reproducibility np.random.seed(123) # generate sin wave data x = np.linspace(0, 10, num=100) y = np.sin(x * np.pi / 2) ytrue = y.copy() # add noise to the data y = np.random.normal(0, 0.05, 100) + ytrue my_pwlf_en = pwlf.PiecewiseLinFit(x, y) # copy the x data to use as break points breaks = my_pwlf_en.x_data.copy() # create the linear regression matrix A A = my_pwlf_en.assemble_regression_matrix(breaks, my_pwlf_en.x_data) We can perform fits that are more complicated than a least squares fit when we have the regression matrix. The following uses the Elastic Net regularizer to perform an interesting fit with the regression matrix. .. code:: python from sklearn.linear_model import ElasticNetCV # set up the elastic net en_model = ElasticNetCV(cv=5, l1_ratio=[.1, .5, .7, .9, .95, .99, 1], fit_intercept=False, max_iter=1000000, n_jobs=-1) # fit the model using the elastic net en_model.fit(A, my_pwlf_en.y_data) # predict from the elastic net parameters xhat = np.linspace(x.min(), x.max(), 1000) yhat_en = my_pwlf_en.predict(xhat, breaks=breaks, beta=en_model.coef_) .. figure:: https://raw.githubusercontent.com/cjekel/piecewise_linear_fit_py/master/examples/figs/sin_en_net_fit.png :alt: interesting elastic net fit interesting elastic net fit use of tensorflow ----------------- You need to install `pwlftf `__ which will have the ``PiecewiseLinFitTF`` class. For performance benchmarks (these benchmarks are outdated! and the regular pwlf may be faster in many applications) see this blog `post `__. The use of the TF class is nearly identical to the original class, however note the following exceptions. ``PiecewiseLinFitTF`` does: - not have a ``lapack_driver`` option - have an optional parameter ``dtype``, so you can choose between the float64 and float32 data types - have an optional parameter ``fast`` to switch between Cholesky decomposition (default ``fast=True``), and orthogonal decomposition (``fast=False``) .. code:: python import pwlftf as pwlf # your desired line segment end locations x0 = np.array([min(x), 0.039, 0.10, max(x)]) # initialize TF piecewise linear fit with your x and y data my_pwlf = pwlf.PiecewiseLinFitTF(x, y, dtype='float32) # fit the data with the specified break points # (ie the x locations of where the line segments # will terminate) my_pwlf.fit_with_breaks(x0) # predict for the determined points xHat = np.linspace(min(x), max(x), num=10000) yHat = my_pwlf.predict(xHat) fit constants or polynomials ---------------------------- You can use pwlf to fit segmented constant models, or piecewise polynomials. The following example fits a segmented constant model, piecewise linear, and a piecewise quadratic model to a sine wave. .. code:: python # generate sin wave data x = np.linspace(0, 10, num=100) y = np.sin(x * np.pi / 2) # add noise to the data y = np.random.normal(0, 0.05, 100) + y # initialize piecewise linear fit with your x and y data # pwlf lets you fit continuous model for many degree polynomials # degree=0 constant # degree=1 linear (default) # degree=2 quadratic my_pwlf_0 = pwlf.PiecewiseLinFit(x, y, degree=0) my_pwlf_1 = pwlf.PiecewiseLinFit(x, y, degree=1) # default my_pwlf_2 = pwlf.PiecewiseLinFit(x, y, degree=2) # fit the data for four line segments res0 = my_pwlf_0.fitfast(5, pop=50) res1 = my_pwlf_1.fitfast(5, pop=50) res2 = my_pwlf_2.fitfast(5, pop=50) # predict for the determined points xHat = np.linspace(min(x), max(x), num=10000) yHat0 = my_pwlf_0.predict(xHat) yHat1 = my_pwlf_1.predict(xHat) yHat2 = my_pwlf_2.predict(xHat) # plot the results plt.figure() plt.plot(x, y, 'o', label='Data') plt.plot(xHat, yHat0, '-', label='degree=0') plt.plot(xHat, yHat1, '--', label='degree=1') plt.plot(xHat, yHat2, ':', label='degree=2') plt.legend() plt.show() .. figure:: https://raw.githubusercontent.com/cjekel/piecewise_linear_fit_py/master/examples/figs/multi_degree.png :alt: Example of multiple degree fits to a sine wave. Example of multiple degree fits to a sine wave. specify breakpoint bounds ------------------------- You may want extra control over the search space for feasible breakpoints. One way to do this is to specify the bounds for each breakpoint location. .. code:: python # generate sin wave data x = np.linspace(0, 10, num=100) y = np.sin(x * np.pi / 2) # add noise to the data y = np.random.normal(0, 0.05, 100) + y # initialize piecewise linear fit with your x and y data my_pwlf = pwlf.PiecewiseLinFit(x, y) # define custom bounds for the interior break points n_segments = 4 bounds = np.zeros((n_segments-1, 2)) # first breakpoint bounds[0, 0] = 0.0 # lower bound bounds[0, 1] = 3.5 # upper bound # second breakpoint bounds[1, 0] = 3.0 # lower bound bounds[1, 1] = 7.0 # upper bound # third breakpoint bounds[2, 0] = 6.0 # lower bound bounds[2, 1] = 10.0 # upper bound res = my_pwlf.fit(n_segments, bounds=bounds) non-linear standard errors and p-values --------------------------------------- You can calculate non-linear standard errors using the Delta method. This will calculate the standard errors of the piecewise linear parameters (intercept + slopes) and the breakpoint locations! First let us generate true piecewise linear data. .. code:: python from __future__ import print_function # generate a true piecewise linear data np.random.seed(5) n_data = 100 x = np.linspace(0, 1, num=n_data) y = np.random.random(n_data) my_pwlf_gen = pwlf.PiecewiseLinFit(x, y) true_beta = np.random.normal(size=5) true_breaks = np.array([0.0, 0.2, 0.5, 0.75, 1.0]) y = my_pwlf_gen.predict(x, beta=true_beta, breaks=true_breaks) plt.figure() plt.title('True piecewise linear data') plt.plot(x, y) plt.show() .. figure:: https://raw.githubusercontent.com/cjekel/piecewise_linear_fit_py/master/examples/figs/true_pwlf.png :alt: True piecewise linear data. True piecewise linear data. Now we can perform a fit, calculate the standard errors, and p-values. The non-linear method uses a first order taylor series expansion to linearize the non-linear regression problem. A positive step_size performs a forward difference, and a negative step_size would perform a backwards difference. .. code:: python my_pwlf = pwlf.PiecewiseLinFit(x, y) res = my_pwlf.fitfast(4, pop=100) p = my_pwlf.p_values(method='non-linear', step_size=1e-4) se = my_pwlf.se # standard errors The standard errors and p-values correspond to each model parameter. First the beta parameters (intercept + slopes) and then the breakpoints. We can assemble the parameters, and print a table of the result with the following code. .. code:: python parameters = np.concatenate((my_pwlf.beta, my_pwlf.fit_breaks[1:-1])) header = ['Parameter type', 'Parameter value', 'Standard error', 't', 'P > np.abs(t) (p-value)'] print(*header, sep=' | ') values = np.zeros((parameters.size, 5), dtype=np.object_) values[:, 1] = np.around(parameters, decimals=3) values[:, 2] = np.around(se, decimals=3) values[:, 3] = np.around(parameters / se, decimals=3) values[:, 4] = np.around(p, decimals=3) for i, row in enumerate(values): if i < my_pwlf.beta.size: row[0] = 'Beta' print(*row, sep=' | ') else: row[0] = 'Breakpoint' print(*row, sep=' | ') ============== =============== ============== ============== ======================= Parameter type Parameter value Standard error t P > np.abs(t) (p-value) ============== =============== ============== ============== ======================= Beta 1.821 0.0 1763191476.046 0.0 Beta -0.427 0.0 -46404554.493 0.0 Beta -1.165 0.0 -111181494.162 0.0 Beta -1.397 0.0 -168954500.421 0.0 Beta 0.873 0.0 93753841.242 0.0 Breakpoint 0.2 0.0 166901856.885 0.0 Breakpoint 0.5 0.0 537785803.646 0.0 Breakpoint 0.75 0.0 482311769.159 0.0 ============== =============== ============== ============== ======================= obtain the equations of fitted pwlf ----------------------------------- Sometimes you may want the mathematical equations that represent your fitted model. This is easy to perform if you don’t mind using sympy. The following code will fit 5 line segments of degree=2 to a sin wave. .. code:: python import numpy as np import pwlf # generate sin wave data x = np.linspace(0, 10, num=100) y = np.sin(x * np.pi / 2) # add noise to the data y = np.random.normal(0, 0.05, 100) + y my_pwlf_2 = pwlf.PiecewiseLinFit(x, y, degree=2) res2 = my_pwlf_2.fitfast(5, pop=50) Given this fit, the following code will print the mathematical equation for each line segment. .. code:: python from sympy import Symbol from sympy.utilities import lambdify x = Symbol('x') def get_symbolic_eqn(pwlf_, segment_number): if pwlf_.degree < 1: raise ValueError('Degree must be at least 1') if segment_number < 1 or segment_number > pwlf_.n_segments: raise ValueError('segment_number not possible') # assemble degree = 1 first for line in range(segment_number): if line == 0: my_eqn = pwlf_.beta[0] + (pwlf_.beta[1])*(x-pwlf_.fit_breaks[0]) else: my_eqn += (pwlf_.beta[line+1])*(x-pwlf_.fit_breaks[line]) # assemble all other degrees if pwlf_.degree > 1: for k in range(2, pwlf_.degree + 1): for line in range(segment_number): beta_index = pwlf_.n_segments*(k-1) + line + 1 my_eqn += (pwlf_.beta[beta_index])*(x-pwlf_.fit_breaks[line])**k return my_eqn.simplify() eqn_list = [] f_list = [] for i in range(my_pwlf_2.n_segments): eqn_list.append(get_symbolic_eqn(my_pwlf_2, i + 1)) print('Equation number: ', i + 1) print(eqn_list[-1]) f_list.append(lambdify(x, eqn_list[-1])) which should print out something like the following: .. code:: python Equation number: 1 -0.953964059782599*x**2 + 1.89945177490653*x + 0.00538634182565454 Equation number: 2 0.951561315686298*x**2 - 5.69747505830914*x + 7.5772216545711 Equation number: 3 -0.949735350431857*x**2 + 9.48218236957122*x - 22.720785454735 Equation number: 4 0.926850298824217*x**2 - 12.9824424358344*x + 44.5102742956827 Equation number: 5 -1.03016230425747*x**2 + 18.5306546317065*x - 82.3508513333073 For more information on how this works, see `this `__ jupyter notebook. weighted least squares fit -------------------------- Sometimes your data will not have a constant variance (heteroscedasticity), and you need to perform a weighted least squares fit. The following example will perform a standard and weighted fit so you can compare the differences. First we need to generate a data set which will be a good candidate to use for weighted least squares fits. .. code:: python # generate data with heteroscedasticity n = 100 n_data_sets = 100 n_segments = 6 # generate sine data x = np.linspace(0, 10, n) y = np.zeros((n_data_sets, n)) sigma_change = np.linspace(0.001, 0.05, 100) for i in range(n_data_sets): y[i] = np.sin(x * np.pi / 2) # add noise to the data y[i] = np.random.normal(0, sigma_change, 100) + y[i] X = np.tile(x, n_data_sets) The individual weights in pwlf are the reciprocal of the standard deviation for each data point. Here weights[i] corresponds to one over the standard deviation of the ith data point. The result of this is that data points with higher variance are less important to the overall pwlf than data point with small variance. Let’s perform a standard pwlf fit and a weighted fit. .. code:: python # perform an ordinary pwlf fit to the entire data my_pwlf = pwlf.PiecewiseLinFit(X.flatten(), y.flatten()) my_pwlf.fit(n_segments) # compute the standard deviation in y y_std = np.std(y, axis=0) # set the weights to be one over the standard deviation weights = 1.0 / y_std # perform a weighted least squares to the data my_pwlf_w = pwlf.PiecewiseLinFit(x, y.mean(axis=0), weights=weights) my_pwlf_w.fit(n_segments) # compare the fits xhat = np.linspace(0, 10, 1000) yhat = my_pwlf.predict(xhat) yhat_w = my_pwlf_w.predict(xhat) plt.figure() plt.plot(X.flatten(), y.flatten(), '.') plt.plot(xhat, yhat, '-', label='Ordinary LS') plt.plot(xhat, yhat_w, '-', label='Weighted LS') plt.legend() plt.show() .. figure:: https://raw.githubusercontent.com/cjekel/piecewise_linear_fit_py/master/examples/weighted_least_squares_example.png :alt: Weighted pwlf fit. Weighted pwlf fit. We can see that the weighted pwlf fit tries fit data with low variance better than data with high variance, however the ordinary pwlf fits the data assuming a uniform variance. reproducible results -------------------- The `fit` and `fitfast` methods are stochastic and may not give the same result every time the program is run. To have reproducible results you can manually specify a numpy.random.seed on init. Now everytime the following program is run, the results of the fit(2) should be the same. .. code:: python # initialize piecewise linear fit with a random seed my_pwlf = pwlf.PiecewiseLinFit(x, y, seed=123) # Now the fit() method will be reproducible my_pwlf.fit(2)