如何在Python中计算自协方差

Rdo*_*dou 10 python numpy scipy

我想计算3个阵列X1,X2和Y的自协方差,它们都是固定随机过程.sciPy或其他库中是否有任何功能可以解决这个问题?

blu*_*xel 6

根据离散信号的自协方差系数的标准估计,可用公式表示:

在此输入图像描述

... x(i)给定信号(即特定的1D向量)在哪里,k代表x(i)信号按k样本移位,Nx(i)信号的长度,以及:

在此输入图像描述

...这是简单的平均值,我们可以写:

'''
Calculate the autocovarriance coefficient.
'''

import numpy as np

Xi = np.array([1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5])
N = np.size(Xi)
k = 5
Xs = np.average(Xi)

def autocovariance(Xi, N, k, Xs):
    autoCov = 0
    for i in np.arange(0, N-k):
        autoCov += ((Xi[i+k])-Xs)*(Xi[i]-Xs)
    return (1/(N-1))*autoCov

print("Autocovariance:", autocovariance(Xi, N, k, Xs))
Run Code Online (Sandbox Code Playgroud)

如果你想归一化自协方差系数,它将成为自相关系数,表示为:

在此输入图像描述

...比你只需要在上面的代码中添加两行:

def autocorrelation():
    return autocovariance(Xi, N, k, Xs) / autocovariance(Xi, N, 0, Xs)
Run Code Online (Sandbox Code Playgroud)

这是完整的脚本:

'''
Calculate the autocovarriance and autocorrelation coefficients.
'''

import numpy as np

Xi = np.array([1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5])
N = np.size(Xi)
k = 5
Xs = np.average(Xi)

def autocovariance(Xi, N, k, Xs):
    autoCov = 0
    for i in np.arange(0, N-k):
        autoCov += ((Xi[i+k])-Xs)*(Xi[i]-Xs)
    return (1/(N-1))*autoCov

def autocorrelation():
    return autocovariance(Xi, N, k, Xs) / autocovariance(Xi, N, 0, Xs)

print("Autocovariance:", autocovariance(Xi, N, k, Xs))
print("Autocorrelation:", autocorrelation())
Run Code Online (Sandbox Code Playgroud)


RGW*_*ton 5

对之前的答案进行了一个小调整,避免了 pythonfor循环并使用 numpy 数组操作。如果您有大量数据,这会更快。

def lagged_auto_cov(Xi,t):
    """
    for series of values x_i, length N, compute empirical auto-cov with lag t
    defined: 1/(N-1) * \sum_{i=0}^{N-t} ( x_i - x_s ) * ( x_{i+t} - x_s )
    """
    N = len(Xi)

    # use sample mean estimate from whole series
    Xs = np.mean(Xi)

    # construct copies of series shifted relative to each other, 
    # with mean subtracted from values
    end_padded_series = np.zeros(N+t)
    end_padded_series[:N] = Xi - Xs
    start_padded_series = np.zeros(N+t)
    start_padded_series[t:] = Xi - Xs

    auto_cov = 1./(N-1) * np.sum( start_padded_series*end_padded_series )
    return auto_cov
Run Code Online (Sandbox Code Playgroud)

将此与@bluevoxel的代码进行比较,使用 50,000 个数据点的时间序列并计算单个固定滞后值的自相关,pythonfor循环代码平均约为 30 毫秒,使用 numpy 数组平均速度快于 0.3毫秒(在我的笔记本电脑上运行)。