PACF function in statsmodels.tsa.stattools gives numbers greater than 1 when using ywunbiased?

She*_*Rad 7 statistics time-series python-3.x statsmodels

I have a dataframe of length 177, and I want to compute and plot the partial autocorrelation function (PACF).

I have imported the data and so on, and I do this:

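import matplotlib.pyplot as plt
from statsmodels.tsa.stattools import pacf

lags = 176
# With alpha set, pacf returns a tuple: (PACF values, confidence intervals)
ys = pacf(data[key][array].diff(1).dropna(), alpha=0.05, nlags=lags, method="ywunbiased")
xs = range(lags + 1)
plt.figure()
plt.scatter(xs, ys[0])
plt.grid()
plt.vlines(xs, 0, ys[0])
plt.plot(ys[1])  # confidence interval bounds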

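With this method, the values at very long lags (around 90) are greater than 1, which cannot be right, and I get "RuntimeWarning: invalid value encountered in sqrt" pointing at the line "return rho, np.sqrt(sigmasq)". Since I can't see their source code, I don't know what this means.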

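To be honest, when I searched for PACF, all the examples only computed it up to lag 40 or 60 or so, and none of them had a significant PACF value after lag 2, so I couldn't compare against other examples either.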

But when I use:

method="ols"
# or
method="ywmle"

the numbers come out correct. So it must be the algorithm they use to solve it.

I tried importing inspect and using getsource, but it was no use: it only shows that the function calls into another package, which I can't find.
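For tracing where a function is defined, a minimal sketch using only the standard `inspect` module is shown here. The helper name `locate_source` is hypothetical, and `statistics.mean` is only a stand-in so the snippet runs without statsmodels. With statsmodels installed, passing `pacf` instead should report the file that defines it, and the helpers it calls can be followed the same way (an assumption, not verified here).

```python
import inspect
import statistics


def locate_source(func):
    """Return (module name, source file path, first line number) of a Python function."""
    module = inspect.getmodule(func)
    path = inspect.getsourcefile(func)
    _, first_line = inspect.getsourcelines(func)
    return module.__name__, path, first_line


# Stand-in target; replace with the function whose source you want to find.
module_name, path, first_line = locate_source(statistics.mean)
print(module_name, path, first_line)
```

Opening the reported file at the reported line shows the function body, including the names of any helpers that would need to be looked up next.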

If you know where the problem comes from, I would really appreciate your help.

For your reference, the values of data[key][array] are:

[1131.130005,1144.939941,1126.209961,1107.300049,1120.680054,1140.839966,1101.719971,1104.23999,1114.579956,1130.199951,
1173.819946,1211.920044,1181.27002,1203.599976,1180.589966,1156.849976,1191.5,1191.329956,1234.180054,1220.329956,
1228.810059,1207.01001,1249.47998,1248.290039,1280.079956,1280.660034,1294.869995,1310.609985,1270.089966,1270.199951,
1276.660034,1303.819946,1335.849976,1377.939941,1400.630005,1418.300049,1438.23999,1406.819946,1420.859985,1482.369995,
1530.619995,1503.349976,1455.27002,1473.98999,1526.75,1549.380005,1481.140015,1468.359985,1378.550049,1330.630005,
1322.699951,1385.589966,1400.380005,1280.0,1267.380005,1282.829956,1166.359985,968.75,896.23999,903.25,
825.880005,735.090027,797.869995,872.8099980000001,919.1400150000001,919.320007,987.4799800000001,1020.6199949999999,1057.079956,1036.189941,
1095.630005,1115.099976,1073.869995,1104.48999,1169.430054,1186.689941,1089.410034,1030.709961,1101.599976,1049.329956,
1141.199951,1183.26001,1180.550049,1257.640015,1286.119995,1327.219971,1325.829956,1363.609985,1345.199951,1320.640015,
1292.280029,1218.890015,1131.420044,1253.300049,1246.959961,1257.599976,1312.410034,1365.680054,1408.469971,1397.910034,
1310.329956,1362.160034,1379.319946,1406.579956,1440.670044,1412.160034,1416.180054,1426.189941,1498.109985,1514.680054,
1569.189941,1597.569946,1630.73999,1606.280029,1685.72998,1632.969971,1681.550049,1756.540039,1805.810059,1848.359985,
1782.589966,1859.449951,1872.339966,1883.949951,1923.569946,1960.22998,1930.6700440000002,2003.369995,1972.290039,2018.050049,
2067.560059,2058.899902,1994.9899899999998,2104.5,2067.889893,2085.51001,2107.389893,2063.110107,2103.840088,1972.180054,
1920.030029,2079.360107,2080.409912,2043.939941,1940.2399899999998,1932.22998,2059.73999,2065.300049,2096.949951,2098.860107,
2173.600098,2170.949951,2168.27002,2126.149902,2198.810059,2238.830078,2278.8701170000004,2363.639893,2362.719971,2384.199951,
2411.800049,2423.409912,2470.300049,2471.649902,2519.360107,2575.26001,2584.840088,2673.610107,2823.810059,2713.830078,
2640.8701170000004,2648.050049,2705.27002,2718.3701170000004,2816.290039,2901.52002,2913.97998]

cfu*_*ton 3

Your time series is clearly not stationary, so the Yule-Walker assumptions are violated.

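More generally, the PACF is usually only appropriate for stationary time series. You may want to difference your data first, before looking at the partial autocorrelations.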
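To illustrate what differencing does, here is a small sketch in plain Python. The series is synthetic (a linear trend plus an alternating wiggle, an assumption chosen so the result is deterministic) and the helper names `lag1_autocorr` and `difference` are hypothetical. It only shows the mechanics: a trending series is strongly correlated with its own past, and first differences remove the trend.

```python
def lag1_autocorr(x):
    """Lag-1 sample autocorrelation of x (mean removed, biased normalisation)."""
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]
    return sum(d[t] * d[t + 1] for t in range(n - 1)) / sum(v * v for v in d)


def difference(x):
    """First differences x[t+1] - x[t], the pure-Python analogue of diff(1).dropna()."""
    return [b - a for a, b in zip(x, x[1:])]


# Synthetic stand-in for a trending series: trend t plus a +/-0.5 alternating term.
levels = [t + (0.5 if t % 2 else -0.5) for t in range(20)]

r_levels = lag1_autocorr(levels)
r_diffs = lag1_autocorr(difference(levels))
print(round(r_levels, 3), round(r_diffs, 3))
```

The level series shows a large positive lag-1 autocorrelation driven by the trend alone, while the differenced series no longer carries it. On real data, the same comparison would be made on the series before and after `diff(1)`.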

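  • Just to add a reference: Enders (2014) recommends that, for a sample of size T, the PACF only be computed up to lag T/4. Since you have 176 data points, this rule of thumb suggests not considering the PACF at lags greater than 44. (5 upvotes)
  • You are right. I think the problem may be that relatively few data points go into the values at such long lags, so the estimates are numerically less stable. The other methods may be less prone to numerical problems, or they simply don't run into trouble here. Either way, I would not put much weight on any partial autocorrelation value at very long lags. (2 upvotes)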
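The instability described in these comments can be reproduced on a tiny example. The sketch below is plain Python and is not statsmodels' actual code; the function names `autocov` and `pacf_levinson` are hypothetical. It computes sample autocovariances with either the "unbiased" denominator n - k or the "biased" denominator n, then derives the PACF with the Levinson-Durbin recursion. The assumption is that "ywunbiased" corresponds to the n - k denominator and "ywmle" to the n denominator.

```python
def autocov(x, max_lag, unbiased):
    """Sample autocovariances gamma[0..max_lag] of x after removing the mean."""
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]
    gamma = []
    for k in range(max_lag + 1):
        s = sum(d[t] * d[t + k] for t in range(n - k))
        # "Unbiased" divides by the number of terms in the sum; "biased" always by n.
        gamma.append(s / ((n - k) if unbiased else n))
    return gamma


def pacf_levinson(gamma):
    """Partial autocorrelations from autocovariances via Levinson-Durbin."""
    pacf = [1.0]
    phi = []           # AR coefficients of the current order
    sigma2 = gamma[0]  # innovation variance of the current order
    for k in range(1, len(gamma)):
        acc = gamma[k] - sum(phi[j] * gamma[k - 1 - j] for j in range(k - 1))
        kappa = acc / sigma2
        phi = [phi[j] - kappa * phi[k - 2 - j] for j in range(k - 1)] + [kappa]
        sigma2 *= 1.0 - kappa * kappa  # turns negative once |kappa| > 1
        pacf.append(kappa)
    return pacf


x = [1.0, 0.0, -1.0]  # three points, so lag 2 rests on a single product
unbiased_pacf = pacf_levinson(autocov(x, 2, unbiased=True))
biased_pacf = pacf_levinson(autocov(x, 2, unbiased=False))
print(unbiased_pacf)  # lag-2 value is -1.5
print(biased_pacf)    # lag-2 value is -0.5
```

With the n - k denominator the lag-2 autocovariance is estimated from one product and is larger in magnitude than the variance, so the implied partial autocorrelation leaves [-1, 1] and the innovation variance becomes negative. A negative variance of this kind would be consistent with the sqrt warning in the question. Dividing by n shrinks long-lag autocovariances toward zero and keeps the sequence positive semi-definite, which is a plausible reason the other methods stay within bounds.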