多维置信区间

Rap*_*oth 19 python matplotlib scipy

我有许多元组(par1,par2),即通过多次重复实验获得的二维参数空间中的点.

我正在寻找计算和可视化置信椭圆的可能性(不确定这是否是正确的术语).这是我在网上找到的一个示例图,用于显示我的意思:

在此输入图像描述

来源:blogspot.ch/2011/07/classification-and-discrimination-with.html

所以原则上我必须将多元正态分布拟合到数据点的二维直方图.有人可以帮我这个吗?

Joe*_*ton 36

听起来你只是想要分散点的2-sigma椭圆?

如果是这样,请考虑这样的事情(从这里的一些代码:https://github.com/joferkington/oost_paper_code/blob/master/error_ellipse.py):

import numpy as np

import matplotlib.pyplot as plt
from matplotlib.patches import Ellipse

def plot_point_cov(points, nstd=2, ax=None, **kwargs):
    """
    Plots an `nstd` sigma ellipse based on the mean and covariance of a point
    "cloud" (points, an Nx2 array).

    Parameters
    ----------
        points : An Nx2 array of the data points.
        nstd : The radius of the ellipse in numbers of standard deviations.
            Defaults to 2 standard deviations.
        ax : The axis that the ellipse will be plotted on. Defaults to the 
            current axis.
        Additional keyword arguments are pass on to the ellipse patch.

    Returns
    -------
        A matplotlib ellipse artist
    """
    pos = points.mean(axis=0)
    cov = np.cov(points, rowvar=False)
    return plot_cov_ellipse(cov, pos, nstd, ax, **kwargs)

def plot_cov_ellipse(cov, pos, nstd=2, ax=None, **kwargs):
    """
    Plots an `nstd` sigma error ellipse based on the specified covariance
    matrix (`cov`). Additional keyword arguments are passed on to the 
    ellipse patch artist.

    Parameters
    ----------
        cov : The 2x2 covariance matrix to base the ellipse on
        pos : The location of the center of the ellipse. Expects a 2-element
            sequence of [x0, y0].
        nstd : The radius of the ellipse in numbers of standard deviations.
            Defaults to 2 standard deviations.
        ax : The axis that the ellipse will be plotted on. Defaults to the 
            current axis.
        Additional keyword arguments are pass on to the ellipse patch.

    Returns
    -------
        A matplotlib ellipse artist
    """
    def eigsorted(cov):
        vals, vecs = np.linalg.eigh(cov)
        order = vals.argsort()[::-1]
        return vals[order], vecs[:,order]

    if ax is None:
        ax = plt.gca()

    vals, vecs = eigsorted(cov)
    theta = np.degrees(np.arctan2(*vecs[:,0][::-1]))

    # Width and height are "full" widths, not radius
    width, height = 2 * nstd * np.sqrt(vals)
    ellip = Ellipse(xy=pos, width=width, height=height, angle=theta, **kwargs)

    ax.add_artist(ellip)
    return ellip

if __name__ == '__main__':
    #-- Example usage -----------------------
    # Generate some random, correlated data
    points = np.random.multivariate_normal(
            mean=(1,1), cov=[[0.4, 9],[9, 10]], size=1000
            )
    # Plot the raw points...
    x, y = points.T
    plt.plot(x, y, 'ro')

    # Plot a transparent 3 standard deviation covariance ellipse
    plot_point_cov(points, nstd=3, alpha=0.5, color='green')

    plt.show()
Run Code Online (Sandbox Code Playgroud)

在此输入图像描述

  • @JoeKington我们不需要参考卡方概率分布表来找出我们的“nstd”,即是68%、90%还是95%? (2认同)
  • @ThePredator - 如果你用它作为测试,是的。(换句话说,在 p 置信水平下,这是与另一个分布不同/相同的分布吗?)如果您只是将其用作描述,则不会。不过,您是否能够根据拥有的样本数量正确估计标准差和平均值是一个完全独立的问题。 (2认同)
  • @ThePredator - `arctan2`返回全角(可以在4个象限中的任意一个).`arctan`将输出限制在象限1和4(-pi/2和pi/2之间).您可能会注意到`arctan`只需要一个参数.因此,它无法区分象限1和4中的角度以及象限2和3中的类似角度.这是许多其他编程语言共享的惯例,这在很大程度上是因为C以这种方式定义它们. (2认同)

Syr*_*jor 5

请参阅如何绘制协方差误差椭圆.

这是python的实现:

import numpy as np
from scipy.stats import norm, chi2

def cov_ellipse(cov, q=None, nsig=None, **kwargs):
    """
    Parameters
    ----------
    cov : (2, 2) array
        Covariance matrix.
    q : float, optional
        Confidence level, should be in (0, 1)
    nsig : int, optional
        Confidence level in unit of standard deviations. 
        E.g. 1 stands for 68.3% and 2 stands for 95.4%.

    Returns
    -------
    width, height, rotation :
         The lengths of two axises and the rotation angle in degree
    for the ellipse.
    """

    if q is not None:
        q = np.asarray(q)
    elif nsig is not None:
        q = 2 * norm.cdf(nsig) - 1
    else:
        raise ValueError('One of `q` and `nsig` should be specified.')
    r2 = chi2.ppf(q, 2)

    val, vec = np.linalg.eigh(cov)
    width, height = 2 * sqrt(val[:, None] * r2)
    rotation = np.degrees(arctan2(*vec[::-1, 0]))

    return width, height, rotation
Run Code Online (Sandbox Code Playgroud)

在Joe Kington的回答中,标准偏差的含义是错误的.通常我们使用1,2西格玛为68%,95%置信水平,但他的答案中的2西格玛椭圆不包含95%的总分布概率.正确的方法是使用卡方分布来估算椭圆大小,如帖子所示.

  • 他的答案中显示的椭圆不是 2-sigma 椭圆。它是一个 3-sigma 椭圆,并且它包含的点数量与您对 3-sigma 椭圆的期望数量差不多。 (2认同)
  • 我相信之所以会出现差异,是因为这个答案描述了一个[置信椭圆](https://en.wikipedia.org/wiki/Confidence_interval),而 Joe 的答案描述了一个 N-sigma 误差椭圆。这两者之间的区别在[此处](https://ftp.ngs.noaa.gov/PUBS_LIB/AlgorithmsForConfidenceCirclesAndEllipses_TR_NOS107_CGS3.pdf)进行了解释。 (2认同)