Module toleranceinterval.twoside.twoside

Expand source code
# -- coding: utf-8 --
# MIT License
#
# Copyright (c) 2019 Charles Jekel
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in
# all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.

import numpy as np
from ..checks import numpy_array, assert_2d_sort
from . import _normal_exact as exact
from . import _normal_approx as approx


def normal_factor(n, p, g, method=None, m=None, nu=None, d2=None,
                  simultaneous=False, tailprob=False):
    r"""
    Compute two-sided central tolerance factor using the normal distribution.

    Computes the tolerance factor k for the two-sided central tolerance
    interval (TI) to cover a proportion p of the population with confidence g:

        TI = [Xmean - k * S, Xmean + k * S]

    where Xmean = mean(X), S = std(X), X = [X_1,...,X_n] is a random sample
    of size n from the distribution N(mu,sig2) with unknown mean mu and
    variance sig2.

    The tolerance factor k is determined such that the tolerance intervals
    with confidence g cover at least the coverage fraction
    of the distribution N(mu,sigma^2), i.e.
    Prob[ Prob( Xmean - k * S < X < Xmean + k * S ) >= p ] = g,
    for X ~ N(mu,sig2) which is independent of Xmean and S.

    By default, this function uses an 'exact' method for computing the factor
    by Gauss-Kronod quadrature as described in the references [1,2,4]. There
    are also two approximate methods implemented: the 'howe' method as
    described in [5], and the 'guenther' method as described in [6]. A brief
    overview of both approximate methods can be found at NIST's website:
    https://www.itl.nist.gov/div898/handbook/prc/section2/prc263.htm

    Additional optional parameters are available to consider pooled variance
    studies when m random samples of size n are available. Furthermore,
    for the 'exact' method, optional parameters are available to
    consider simultaneous tolerance intervals as described in [7,8].
    If S is a pooled estimator of sig, based on m random samples of size n,
    normal_factor computes the tolerance factor k for the two-sided p-content
    and g-confidence tolerance intervals

        TI = [Xmean_i - k * S, Xmean_i + k * S], for i = 1,...,m

    where Xmean_i = mean(X_i), X_i = [X_i1,...,X_in] is a random sample of
    size n from the distribution N(mu_i,sig2) with unknown mean mu_i and
    variance sig2, and S = sqrt(S2), S2 is the pooled estimator of sig2,

        S2 = (1/nu) * sum_i=1:m ( sum_j=1:n (X_ij - Xmean_i)^2 )

    with nu degrees of freedom, nu = m * (n-1). For the 'exact' method, both
    the simultaneous and non-simultaneous cases can be considered.

    Parameters
    ----------
    n : scalar
        Sample size
    p : scalar in the interval [0.0, 1.0]
        Coverage (or content) probability,
        Prob( Xmean - k * S < X < Xmean + k * S ) >= p
    g : scalar in the interval [0.0, 1.0]
        Confidence probability,
        Prob[ Prob( Xmean-k*S < X < Xmean+k*S ) >= p ] = g.
    method : str
        Method to use for computing the factor. Available methods are 'exact',
        'howe', and 'guenther'. If None, the default method is 'exact'.
    m : scalar
        Number of independent random samples (of size n). If None,
        default value is m = 1.
    nu : scalar
        Degrees of freedom for distribution of the (pooled) sample
        variance S2. If None, default value is nu = m*(n-1).
    d2 : scalar
        Normalizing constant. For computing the factors of the
        non-simultaneous tolerance limits (xx'*betaHat +/- k * S)
        for the linear regression y = XX*beta +epsilon, set d2 =
        xx'*inv(XX'*XX)*xx.
        Typically, in simple linear regression the estimator S2 has
        nu = n-2 degrees of freedom. If None, default value is d2 = 1/n.
    simultaneous : boolean
        Logical flag for calculating the factor for
        simultaneous tolerance intervals. If False, normal_factor will
        calculate the factor for the non-simultaneous tolerance interval.
        Default value is False.
    tailprob : boolean
        Logical flag for representing the input probabilities
        'p' and 'g'. If True, the input parameters are
        represented as the tail coverage (i.e. 1 - p) and tail confidence
        (i.e. 1 - g). This option is useful if the interest is to
        calculate the tolerance factor for extremely large values
        of coverage and/or confidence, close to 1, as
        e.g. coverage = 1 - 1e-18. Default value is False.

    Returns
    -------
    float
        The calculated tolerance factor for the tolerance interval.

    References
    ----------
    [1] Krishnamoorthy K, Mathew T. (2009). Statistical Tolerance Regions:
        Theory, Applications, and Computation. John Wiley & Sons, Inc.,
        Hoboken, New Jersey. ISBN: 978-0-470-38026-0, 512 pages.

    [2] Witkovsky V. On the exact two-sided tolerance intervals for
        univariate normal distribution and linear regression. Austrian
        Journal of Statistics 43(4), 2014, 279-92.
        http://ajs.data-analysis.at/index.php/ajs/article/viewFile/vol43-4-6/35

    [3] ISO 16269-6:2014: Statistical interpretation of data - Part 6:
        Determination of statistical tolerance intervals.

    [4] Janiga I., Garaj I.: Two-sided tolerance limits of normal
        distributions with unknown means and unknown common variability.
        MEASUREMENT SCIENCE REVIEW, Volume 3, Section 1, 2003, 75-78.

    [5] Howe, W. G. “Two-Sided Tolerance Limits for Normal Populations,
        Some Improvements.” Journal of the American Statistical Association,
        vol. 64, no. 326, [American Statistical Association, Taylor & Francis,
        Ltd.], 1969, pp. 610–20, https://doi.org/10.2307/2283644.

    [6] Guenther, W. C. (1977). "Sampling Inspection in Statistical Quality
        Control",, Griffin's Statistical Monographs, Number 37, London.

    [7] Robert W. Mee (1990) Simultaneous Tolerance Intervals for Normal
        Populations With Common Variance, Technometrics, 32:1, 83-92,
        DOI: 10.1080/00401706.1990.10484595

    [8] K. Krishnamoorthy & Saptarshi Chakraberty (2022). Construction of
        simultaneous tolerance intervals for several normal distributions,
        Journal of Statistical Computation and Simulation, 92:1, 101-114,
        DOI: 10.1080/00949655.2021.1932885

    """
    # Handle default method:
    if method is None:
        method = 'exact'

    if method == 'exact':
        k = exact.tolerance_factor(n, p, g, m, nu, d2, simultaneous, tailprob)
    elif method == 'howe':
        k = approx.tolerance_factor_howe(n, p, g, m, nu)
    elif method == 'guenther':
        k = approx.tolerance_factor_guenther(n, p, g, m, nu)
    else:
        raise ValueError(
            "Invalid method requested. Valid methods are 'exact', 'howe', or "
            "'guenther'."
        )
    return k


def normal(x, p, g, method=None, pool_variance=False):
    r"""
    Compute two-sided central tolerance interval using the normal distribution.

    Computes the two-sided tolerance interval (TI) to cover a proportion p of
    the population with confidence g using the normal distribution. This
    follows the standard approach to calculate the interval as a factor of
    sample standard deviations away from the sample mean.

        TI = [Xmean - k * S, Xmean + k * S]

    where Xmean = mean(X), S = std(X), X = [X_1,...,X_n] is a random sample
    of size n from the distribution N(mu,sig2) with unknown mean mu and
    variance sig2.

    By default, this function uses an 'exact' method for computing the TI
    by Gauss-Kronod quadrature. There are also two approximate methods
    implemented: the 'howe' method, and the 'guenther' method. See the
    documentation for normal_factor for more details on the methods.

    Parameters
    ----------
    x : ndarray (1-D, or 2-D)
        Numpy array of samples to compute the tolerance interval. Assumed data
        type is np.float. Shape of (m, n) is assumed for 2-D arrays with m
        number of sets of sample size n.
    p : float
        Percentile for central TI to cover.
    g : float
        Confidence level where g > 0. and g < 1.
    method : str
        Method to use for computing the TI. Available methods are 'exact',
        'howe', and 'guenther'. If None, the default method is 'exact'.
    pool_variance : boolean
        Consider the m random samples to share the same variance such that
        the degrees of freedom are nu = m*(n-1). Default is False.

    Returns
    -------
    ndarray (2-D)
        The normal distribution toleranace interval bound. Shape (m, 2) from m
        sets of samples, where [:, 0] is the lower bound and [:, 1] is the
        upper bound.

    References
    ----------
    See the documentation for normal_factor for a complete list of references.

    Examples
    --------
    Estimate the 90th percentile central TI with 95% confidence of the
    following 100 random samples from a normal distribution.

    >>> import numpy as np
    >>> import toleranceinterval as ti
    >>> x = np.random.nomral(100)
    >>> bound = ti.twoside.normal(x, 0.9, 0.95)
    >>> print('Lower bound:', bound[:, 0])
    >>> print('Upper bound:', bound[:, 1])

    Estimate the 95th percentile central TI with 95% confidence of the
    following 100 random samples from a normal distribution.

    >>> bound = ti.twoside.normal(x, 0.95, 0.95)

    """
    x = numpy_array(x)  # check if numpy array, if not make numpy array
    x = assert_2d_sort(x)
    m, n = x.shape

    # Handle pooled variance case
    if pool_variance:
        _m = m
    else:
        _m = 1

    k = normal_factor(n, p, g, method, _m)
    bound = np.zeros((m, 2))
    xmu = x.mean(axis=1)
    kstd = k * x.std(axis=1, ddof=1)
    bound[:, 0] = xmu - kstd
    bound[:, 1] = xmu + kstd
    return bound


def lognormal(x, p, g, method=None, pool_variance=False):
    r"""
    Two-sided central tolerance interval using the lognormal distribution.

    Computes the two-sided tolerance interval using the lognormal distribution.
    This just performs a ln and exp transformations of the normal distribution.

    Parameters
    ----------
    x : ndarray (1-D, or 2-D)
        Numpy array of samples to compute the tolerance interval. Assumed data
        type is np.float. Shape of (m, n) is assumed for 2-D arrays with m
        number of sets of sample size n.
    p : float
        Percentile for central TI to estimate.
    g : float
        Confidence level where g > 0. and g < 1.
    method : str
        Method to use for computing the TI. Available methods are 'exact',
        'howe', and 'guenther'. If None, the default method is 'exact'.
    pool_variance : boolean
        Consider the m random samples to share the same variance such that
        the degrees of freedom are nu = m*(n-1). Default is False.

    Returns
    -------
    ndarray (2-D)
        The lognormal distribution toleranace interval bound. Shape (m, 2)
        from m sets of samples, where [:, 0] is the lower bound and [:, 1] is
        the upper bound.

    Examples
    --------
    Estimate the 90th percentile central TI with 95% confidence of the
    following 100 random samples from a lognormal distribution.

    >>> import numpy as np
    >>> import toleranceinterval as ti
    >>> x = np.random.random(100)
    >>> bound = ti.twoside.lognormal(x, 0.9, 0.95)
    >>> print('Lower bound:', bound[:, 0])
    >>> print('Upper bound:', bound[:, 1])

    Estimate the 95th percentile central TI with 95% confidence of the
    following 100 random samples from a normal distribution.

    >>> bound = ti.twoside.lognormal(x, 0.95, 0.95)

    """
    x = numpy_array(x)  # check if numpy array, if not make numpy array
    x = assert_2d_sort(x)
    return np.exp(normal(np.log(x), p, g, method, pool_variance))

Functions

def lognormal(x, p, g, method=None, pool_variance=False)

Two-sided central tolerance interval using the lognormal distribution.

Computes the two-sided tolerance interval using the lognormal distribution. This just performs a ln and exp transformations of the normal distribution.

Parameters

x : ndarray (1-D, or 2-D)
Numpy array of samples to compute the tolerance interval. Assumed data type is np.float. Shape of (m, n) is assumed for 2-D arrays with m number of sets of sample size n.
p : float
Percentile for central TI to estimate.
g : float
Confidence level where g > 0. and g < 1.
method : str
Method to use for computing the TI. Available methods are 'exact', 'howe', and 'guenther'. If None, the default method is 'exact'.
pool_variance : boolean
Consider the m random samples to share the same variance such that the degrees of freedom are nu = m*(n-1). Default is False.

Returns

ndarray (2-D)
The lognormal distribution toleranace interval bound. Shape (m, 2) from m sets of samples, where [:, 0] is the lower bound and [:, 1] is the upper bound.

Examples

Estimate the 90th percentile central TI with 95% confidence of the following 100 random samples from a lognormal distribution.

>>> import numpy as np
>>> import toleranceinterval as ti
>>> x = np.random.random(100)
>>> bound = ti.twoside.lognormal(x, 0.9, 0.95)
>>> print('Lower bound:', bound[:, 0])
>>> print('Upper bound:', bound[:, 1])

Estimate the 95th percentile central TI with 95% confidence of the following 100 random samples from a normal distribution.

>>> bound = ti.twoside.lognormal(x, 0.95, 0.95)
Expand source code
def lognormal(x, p, g, method=None, pool_variance=False):
    r"""
    Two-sided central tolerance interval using the lognormal distribution.

    Computes the two-sided tolerance interval using the lognormal distribution.
    This just performs a ln and exp transformations of the normal distribution.

    Parameters
    ----------
    x : ndarray (1-D, or 2-D)
        Numpy array of samples to compute the tolerance interval. Assumed data
        type is np.float. Shape of (m, n) is assumed for 2-D arrays with m
        number of sets of sample size n.
    p : float
        Percentile for central TI to estimate.
    g : float
        Confidence level where g > 0. and g < 1.
    method : str
        Method to use for computing the TI. Available methods are 'exact',
        'howe', and 'guenther'. If None, the default method is 'exact'.
    pool_variance : boolean
        Consider the m random samples to share the same variance such that
        the degrees of freedom are nu = m*(n-1). Default is False.

    Returns
    -------
    ndarray (2-D)
        The lognormal distribution toleranace interval bound. Shape (m, 2)
        from m sets of samples, where [:, 0] is the lower bound and [:, 1] is
        the upper bound.

    Examples
    --------
    Estimate the 90th percentile central TI with 95% confidence of the
    following 100 random samples from a lognormal distribution.

    >>> import numpy as np
    >>> import toleranceinterval as ti
    >>> x = np.random.random(100)
    >>> bound = ti.twoside.lognormal(x, 0.9, 0.95)
    >>> print('Lower bound:', bound[:, 0])
    >>> print('Upper bound:', bound[:, 1])

    Estimate the 95th percentile central TI with 95% confidence of the
    following 100 random samples from a normal distribution.

    >>> bound = ti.twoside.lognormal(x, 0.95, 0.95)

    """
    x = numpy_array(x)  # check if numpy array, if not make numpy array
    x = assert_2d_sort(x)
    return np.exp(normal(np.log(x), p, g, method, pool_variance))
def normal(x, p, g, method=None, pool_variance=False)

Compute two-sided central tolerance interval using the normal distribution.

Computes the two-sided tolerance interval (TI) to cover a proportion p of the population with confidence g using the normal distribution. This follows the standard approach to calculate the interval as a factor of sample standard deviations away from the sample mean.

TI = [Xmean - k * S, Xmean + k * S]

where Xmean = mean(X), S = std(X), X = [X_1,…,X_n] is a random sample of size n from the distribution N(mu,sig2) with unknown mean mu and variance sig2.

By default, this function uses an 'exact' method for computing the TI by Gauss-Kronod quadrature. There are also two approximate methods implemented: the 'howe' method, and the 'guenther' method. See the documentation for normal_factor for more details on the methods.

Parameters

x : ndarray (1-D, or 2-D)
Numpy array of samples to compute the tolerance interval. Assumed data type is np.float. Shape of (m, n) is assumed for 2-D arrays with m number of sets of sample size n.
p : float
Percentile for central TI to cover.
g : float
Confidence level where g > 0. and g < 1.
method : str
Method to use for computing the TI. Available methods are 'exact', 'howe', and 'guenther'. If None, the default method is 'exact'.
pool_variance : boolean
Consider the m random samples to share the same variance such that the degrees of freedom are nu = m*(n-1). Default is False.

Returns

ndarray (2-D)
The normal distribution toleranace interval bound. Shape (m, 2) from m sets of samples, where [:, 0] is the lower bound and [:, 1] is the upper bound.

References

See the documentation for normal_factor for a complete list of references.

Examples

Estimate the 90th percentile central TI with 95% confidence of the following 100 random samples from a normal distribution.

>>> import numpy as np
>>> import toleranceinterval as ti
>>> x = np.random.nomral(100)
>>> bound = ti.twoside.normal(x, 0.9, 0.95)
>>> print('Lower bound:', bound[:, 0])
>>> print('Upper bound:', bound[:, 1])

Estimate the 95th percentile central TI with 95% confidence of the following 100 random samples from a normal distribution.

>>> bound = ti.twoside.normal(x, 0.95, 0.95)
Expand source code
def normal(x, p, g, method=None, pool_variance=False):
    r"""
    Compute two-sided central tolerance interval using the normal distribution.

    Computes the two-sided tolerance interval (TI) to cover a proportion p of
    the population with confidence g using the normal distribution. This
    follows the standard approach to calculate the interval as a factor of
    sample standard deviations away from the sample mean.

        TI = [Xmean - k * S, Xmean + k * S]

    where Xmean = mean(X), S = std(X), X = [X_1,...,X_n] is a random sample
    of size n from the distribution N(mu,sig2) with unknown mean mu and
    variance sig2.

    By default, this function uses an 'exact' method for computing the TI
    by Gauss-Kronod quadrature. There are also two approximate methods
    implemented: the 'howe' method, and the 'guenther' method. See the
    documentation for normal_factor for more details on the methods.

    Parameters
    ----------
    x : ndarray (1-D, or 2-D)
        Numpy array of samples to compute the tolerance interval. Assumed data
        type is np.float. Shape of (m, n) is assumed for 2-D arrays with m
        number of sets of sample size n.
    p : float
        Percentile for central TI to cover.
    g : float
        Confidence level where g > 0. and g < 1.
    method : str
        Method to use for computing the TI. Available methods are 'exact',
        'howe', and 'guenther'. If None, the default method is 'exact'.
    pool_variance : boolean
        Consider the m random samples to share the same variance such that
        the degrees of freedom are nu = m*(n-1). Default is False.

    Returns
    -------
    ndarray (2-D)
        The normal distribution toleranace interval bound. Shape (m, 2) from m
        sets of samples, where [:, 0] is the lower bound and [:, 1] is the
        upper bound.

    References
    ----------
    See the documentation for normal_factor for a complete list of references.

    Examples
    --------
    Estimate the 90th percentile central TI with 95% confidence of the
    following 100 random samples from a normal distribution.

    >>> import numpy as np
    >>> import toleranceinterval as ti
    >>> x = np.random.nomral(100)
    >>> bound = ti.twoside.normal(x, 0.9, 0.95)
    >>> print('Lower bound:', bound[:, 0])
    >>> print('Upper bound:', bound[:, 1])

    Estimate the 95th percentile central TI with 95% confidence of the
    following 100 random samples from a normal distribution.

    >>> bound = ti.twoside.normal(x, 0.95, 0.95)

    """
    x = numpy_array(x)  # check if numpy array, if not make numpy array
    x = assert_2d_sort(x)
    m, n = x.shape

    # Handle pooled variance case
    if pool_variance:
        _m = m
    else:
        _m = 1

    k = normal_factor(n, p, g, method, _m)
    bound = np.zeros((m, 2))
    xmu = x.mean(axis=1)
    kstd = k * x.std(axis=1, ddof=1)
    bound[:, 0] = xmu - kstd
    bound[:, 1] = xmu + kstd
    return bound
def normal_factor(n, p, g, method=None, m=None, nu=None, d2=None, simultaneous=False, tailprob=False)

Compute two-sided central tolerance factor using the normal distribution.

Computes the tolerance factor k for the two-sided central tolerance interval (TI) to cover a proportion p of the population with confidence g:

TI = [Xmean - k * S, Xmean + k * S]

where Xmean = mean(X), S = std(X), X = [X_1,…,X_n] is a random sample of size n from the distribution N(mu,sig2) with unknown mean mu and variance sig2.

The tolerance factor k is determined such that the tolerance intervals with confidence g cover at least the coverage fraction of the distribution N(mu,sigma^2), i.e. Prob[ Prob( Xmean - k * S < X < Xmean + k * S ) >= p ] = g, for X ~ N(mu,sig2) which is independent of Xmean and S.

By default, this function uses an 'exact' method for computing the factor by Gauss-Kronod quadrature as described in the references [1,2,4]. There are also two approximate methods implemented: the 'howe' method as described in [5], and the 'guenther' method as described in [6]. A brief overview of both approximate methods can be found at NIST's website: https://www.itl.nist.gov/div898/handbook/prc/section2/prc263.htm

Additional optional parameters are available to consider pooled variance studies when m random samples of size n are available. Furthermore, for the 'exact' method, optional parameters are available to consider simultaneous tolerance intervals as described in [7,8]. If S is a pooled estimator of sig, based on m random samples of size n, normal_factor computes the tolerance factor k for the two-sided p-content and g-confidence tolerance intervals

TI = [Xmean_i - k * S, Xmean_i + k * S], for i = 1,...,m

where Xmean_i = mean(X_i), X_i = [X_i1,…,X_in] is a random sample of size n from the distribution N(mu_i,sig2) with unknown mean mu_i and variance sig2, and S = sqrt(S2), S2 is the pooled estimator of sig2,

S2 = (1/nu) * sum_i=1:m ( sum_j=1:n (X_ij - Xmean_i)^2 )

with nu degrees of freedom, nu = m * (n-1). For the 'exact' method, both the simultaneous and non-simultaneous cases can be considered.

Parameters

n : scalar
Sample size
p : scalar in the interval [0.0, 1.0]
Coverage (or content) probability, Prob( Xmean - k * S < X < Xmean + k * S ) >= p
g : scalar in the interval [0.0, 1.0]
Confidence probability, Prob[ Prob( Xmean-kS < X < Xmean+kS ) >= p ] = g.
method : str
Method to use for computing the factor. Available methods are 'exact', 'howe', and 'guenther'. If None, the default method is 'exact'.
m : scalar
Number of independent random samples (of size n). If None, default value is m = 1.
nu : scalar
Degrees of freedom for distribution of the (pooled) sample variance S2. If None, default value is nu = m*(n-1).
d2 : scalar
Normalizing constant. For computing the factors of the non-simultaneous tolerance limits (xx'betaHat +/- k * S) for the linear regression y = XXbeta +epsilon, set d2 = xx'inv(XX'XX)*xx. Typically, in simple linear regression the estimator S2 has nu = n-2 degrees of freedom. If None, default value is d2 = 1/n.
simultaneous : boolean
Logical flag for calculating the factor for simultaneous tolerance intervals. If False, normal_factor will calculate the factor for the non-simultaneous tolerance interval. Default value is False.
tailprob : boolean
Logical flag for representing the input probabilities 'p' and 'g'. If True, the input parameters are represented as the tail coverage (i.e. 1 - p) and tail confidence (i.e. 1 - g). This option is useful if the interest is to calculate the tolerance factor for extremely large values of coverage and/or confidence, close to 1, as e.g. coverage = 1 - 1e-18. Default value is False.

Returns

float
The calculated tolerance factor for the tolerance interval.

References

[1] Krishnamoorthy K, Mathew T. (2009). Statistical Tolerance Regions: Theory, Applications, and Computation. John Wiley & Sons, Inc., Hoboken, New Jersey. ISBN: 978-0-470-38026-0, 512 pages.

[2] Witkovsky V. On the exact two-sided tolerance intervals for univariate normal distribution and linear regression. Austrian Journal of Statistics 43(4), 2014, 279-92. http://ajs.data-analysis.at/index.php/ajs/article/viewFile/vol43-4-6/35

[3] ISO 16269-6:2014: Statistical interpretation of data - Part 6: Determination of statistical tolerance intervals.

[4] Janiga I., Garaj I.: Two-sided tolerance limits of normal distributions with unknown means and unknown common variability. MEASUREMENT SCIENCE REVIEW, Volume 3, Section 1, 2003, 75-78.

[5] Howe, W. G. “Two-Sided Tolerance Limits for Normal Populations, Some Improvements.” Journal of the American Statistical Association, vol. 64, no. 326, [American Statistical Association, Taylor & Francis, Ltd.], 1969, pp. 610–20, https://doi.org/10.2307/2283644.

[6] Guenther, W. C. (1977). "Sampling Inspection in Statistical Quality Control",, Griffin's Statistical Monographs, Number 37, London.

[7] Robert W. Mee (1990) Simultaneous Tolerance Intervals for Normal Populations With Common Variance, Technometrics, 32:1, 83-92, DOI: 10.1080/00401706.1990.10484595

[8] K. Krishnamoorthy & Saptarshi Chakraberty (2022). Construction of simultaneous tolerance intervals for several normal distributions, Journal of Statistical Computation and Simulation, 92:1, 101-114, DOI: 10.1080/00949655.2021.1932885

Expand source code
def normal_factor(n, p, g, method=None, m=None, nu=None, d2=None,
                  simultaneous=False, tailprob=False):
    r"""
    Compute two-sided central tolerance factor using the normal distribution.

    Computes the tolerance factor k for the two-sided central tolerance
    interval (TI) to cover a proportion p of the population with confidence g:

        TI = [Xmean - k * S, Xmean + k * S]

    where Xmean = mean(X), S = std(X), X = [X_1,...,X_n] is a random sample
    of size n from the distribution N(mu,sig2) with unknown mean mu and
    variance sig2.

    The tolerance factor k is determined such that the tolerance intervals
    with confidence g cover at least the coverage fraction
    of the distribution N(mu,sigma^2), i.e.
    Prob[ Prob( Xmean - k * S < X < Xmean + k * S ) >= p ] = g,
    for X ~ N(mu,sig2) which is independent of Xmean and S.

    By default, this function uses an 'exact' method for computing the factor
    by Gauss-Kronod quadrature as described in the references [1,2,4]. There
    are also two approximate methods implemented: the 'howe' method as
    described in [5], and the 'guenther' method as described in [6]. A brief
    overview of both approximate methods can be found at NIST's website:
    https://www.itl.nist.gov/div898/handbook/prc/section2/prc263.htm

    Additional optional parameters are available to consider pooled variance
    studies when m random samples of size n are available. Furthermore,
    for the 'exact' method, optional parameters are available to
    consider simultaneous tolerance intervals as described in [7,8].
    If S is a pooled estimator of sig, based on m random samples of size n,
    normal_factor computes the tolerance factor k for the two-sided p-content
    and g-confidence tolerance intervals

        TI = [Xmean_i - k * S, Xmean_i + k * S], for i = 1,...,m

    where Xmean_i = mean(X_i), X_i = [X_i1,...,X_in] is a random sample of
    size n from the distribution N(mu_i,sig2) with unknown mean mu_i and
    variance sig2, and S = sqrt(S2), S2 is the pooled estimator of sig2,

        S2 = (1/nu) * sum_i=1:m ( sum_j=1:n (X_ij - Xmean_i)^2 )

    with nu degrees of freedom, nu = m * (n-1). For the 'exact' method, both
    the simultaneous and non-simultaneous cases can be considered.

    Parameters
    ----------
    n : scalar
        Sample size
    p : scalar in the interval [0.0, 1.0]
        Coverage (or content) probability,
        Prob( Xmean - k * S < X < Xmean + k * S ) >= p
    g : scalar in the interval [0.0, 1.0]
        Confidence probability,
        Prob[ Prob( Xmean-k*S < X < Xmean+k*S ) >= p ] = g.
    method : str
        Method to use for computing the factor. Available methods are 'exact',
        'howe', and 'guenther'. If None, the default method is 'exact'.
    m : scalar
        Number of independent random samples (of size n). If None,
        default value is m = 1.
    nu : scalar
        Degrees of freedom for distribution of the (pooled) sample
        variance S2. If None, default value is nu = m*(n-1).
    d2 : scalar
        Normalizing constant. For computing the factors of the
        non-simultaneous tolerance limits (xx'*betaHat +/- k * S)
        for the linear regression y = XX*beta +epsilon, set d2 =
        xx'*inv(XX'*XX)*xx.
        Typically, in simple linear regression the estimator S2 has
        nu = n-2 degrees of freedom. If None, default value is d2 = 1/n.
    simultaneous : boolean
        Logical flag for calculating the factor for
        simultaneous tolerance intervals. If False, normal_factor will
        calculate the factor for the non-simultaneous tolerance interval.
        Default value is False.
    tailprob : boolean
        Logical flag for representing the input probabilities
        'p' and 'g'. If True, the input parameters are
        represented as the tail coverage (i.e. 1 - p) and tail confidence
        (i.e. 1 - g). This option is useful if the interest is to
        calculate the tolerance factor for extremely large values
        of coverage and/or confidence, close to 1, as
        e.g. coverage = 1 - 1e-18. Default value is False.

    Returns
    -------
    float
        The calculated tolerance factor for the tolerance interval.

    References
    ----------
    [1] Krishnamoorthy K, Mathew T. (2009). Statistical Tolerance Regions:
        Theory, Applications, and Computation. John Wiley & Sons, Inc.,
        Hoboken, New Jersey. ISBN: 978-0-470-38026-0, 512 pages.

    [2] Witkovsky V. On the exact two-sided tolerance intervals for
        univariate normal distribution and linear regression. Austrian
        Journal of Statistics 43(4), 2014, 279-92.
        http://ajs.data-analysis.at/index.php/ajs/article/viewFile/vol43-4-6/35

    [3] ISO 16269-6:2014: Statistical interpretation of data - Part 6:
        Determination of statistical tolerance intervals.

    [4] Janiga I., Garaj I.: Two-sided tolerance limits of normal
        distributions with unknown means and unknown common variability.
        MEASUREMENT SCIENCE REVIEW, Volume 3, Section 1, 2003, 75-78.

    [5] Howe, W. G. “Two-Sided Tolerance Limits for Normal Populations,
        Some Improvements.” Journal of the American Statistical Association,
        vol. 64, no. 326, [American Statistical Association, Taylor & Francis,
        Ltd.], 1969, pp. 610–20, https://doi.org/10.2307/2283644.

    [6] Guenther, W. C. (1977). "Sampling Inspection in Statistical Quality
        Control",, Griffin's Statistical Monographs, Number 37, London.

    [7] Robert W. Mee (1990) Simultaneous Tolerance Intervals for Normal
        Populations With Common Variance, Technometrics, 32:1, 83-92,
        DOI: 10.1080/00401706.1990.10484595

    [8] K. Krishnamoorthy & Saptarshi Chakraberty (2022). Construction of
        simultaneous tolerance intervals for several normal distributions,
        Journal of Statistical Computation and Simulation, 92:1, 101-114,
        DOI: 10.1080/00949655.2021.1932885

    """
    # Handle default method:
    if method is None:
        method = 'exact'

    if method == 'exact':
        k = exact.tolerance_factor(n, p, g, m, nu, d2, simultaneous, tailprob)
    elif method == 'howe':
        k = approx.tolerance_factor_howe(n, p, g, m, nu)
    elif method == 'guenther':
        k = approx.tolerance_factor_guenther(n, p, g, m, nu)
    else:
        raise ValueError(
            "Invalid method requested. Valid methods are 'exact', 'howe', or "
            "'guenther'."
        )
    return k