Cle*_*leb 4 python statistics var pandas statsmodels
我遵循有关VAR模型的statsmodel教程,并对获得的结果有疑问(我的整个代码都可以在本文结尾处找到)。
原始数据(存储在中mdata)显然是不稳定的,因此需要进行转换,方法是使用以下行:
data = np.log(mdata).diff().dropna()
Run Code Online (Sandbox Code Playgroud)
如果然后绘制原始数据(mdata)和转换后的数据(data),则该图如下所示:
然后使用拟合对数差异数据
model = VAR(data)
results = model.fit(2)
Run Code Online (Sandbox Code Playgroud)
然后,如果将原始的对数差数据与拟合值作图,则会得到如下图:
我的问题是我如何才能得到相同的图,但对于没有对数差异的原始数据。如何将由拟合值确定的参数应用于这些原始数据?有没有一种方法可以使用我获得的参数将拟合的对数差异数据转换回原始数据,如果可以,该如何完成?
这是我的完整代码和获得的输出:
import pandas
import statsmodels as sm
from statsmodels.tsa.api import VAR
from statsmodels.tsa.base.datetools import dates_from_str
from statsmodels.tsa.stattools import adfuller
import numpy as np
import matplotlib.pyplot as plt
mdata = sm.datasets.macrodata.load_pandas().data
dates = mdata[['year', 'quarter']].astype(int).astype(str)
quarterly = dates["year"] + "Q" + dates["quarter"]
quarterly = dates_from_str(quarterly)
mdata = mdata[['realgdp', 'realcons', 'realinv']]
mdata.index = pandas.DatetimeIndex(quarterly)
data = np.log(mdata).diff().dropna()
f, ((ax1, ax2, ax3), (ax4, ax5, ax6)) = plt.subplots(2, 3, sharex='col', sharey='row')
ax1.plot(mdata.index, mdata['realgdp'])
ax2.plot(mdata.index, mdata['realcons'])
ax3.plot(mdata.index, mdata['realinv'])
ax4.plot(data.index, data['realgdp'])
ax5.plot(data.index, data['realcons'])
ax6.plot(data.index, data['realinv'])
f.suptitle('Original data vs. log-differenced data ')
plt.show()
print adfuller(mdata['realgdp'])
print adfuller(data['realgdp'])
# make a VAR model
model = VAR(data)
results = model.fit(2)
print results.summary()
# results.plot()
# plt.show()
f, axarr = plt.subplots(3, sharex=True)
axarr[0].plot(data.index, data['realgdp'])
axarr[0].plot(results.fittedvalues.index, results.fittedvalues['realgdp'])
axarr[1].plot(data.index, data['realcons'])
axarr[1].plot(results.fittedvalues.index, results.fittedvalues['realcons'])
axarr[2].plot(data.index, data['realinv'])
axarr[2].plot(results.fittedvalues.index, results.fittedvalues['realinv'])
f.suptitle('Original data vs. fitted data ')
plt.show()
Run Code Online (Sandbox Code Playgroud)
提供以下输出:
(1.7504627967647102, 0.99824553723350318, 12, 190, {'5%': -2.8768752281673717, '1%': -3.4652439354133255, '10%': -2.5749446537396121}, 2034.5171236683821)
(-6.9728713472162127, 8.5750958448994759e-10, 1, 200, {'5%': -2.876102355, '1%': -3.4634760791249999, '10%': -2.574532225}, -1261.4401395993809)
Summary of Regression Results
==================================
Model: VAR
Method: OLS
Date: Wed, 09, Mar, 2016
Time: 15:08:07
--------------------------------------------------------------------
No. of Equations: 3.00000 BIC: -27.5830
Nobs: 200.000 HQIC: -27.7892
Log likelihood: 1962.57 FPE: 7.42129e-13
AIC: -27.9293 Det(Omega_mle): 6.69358e-13
--------------------------------------------------------------------
Results for equation realgdp
==============================================================================
coefficient std. error t-stat prob
------------------------------------------------------------------------------
const 0.001527 0.001119 1.365 0.174
L1.realgdp -0.279435 0.169663 -1.647 0.101
L1.realcons 0.675016 0.131285 5.142 0.000
L1.realinv 0.033219 0.026194 1.268 0.206
L2.realgdp 0.008221 0.173522 0.047 0.962
L2.realcons 0.290458 0.145904 1.991 0.048
L2.realinv -0.007321 0.025786 -0.284 0.777
==============================================================================
Results for equation realcons
==============================================================================
coefficient std. error t-stat prob
------------------------------------------------------------------------------
const 0.005460 0.000969 5.634 0.000
L1.realgdp -0.100468 0.146924 -0.684 0.495
L1.realcons 0.268640 0.113690 2.363 0.019
L1.realinv 0.025739 0.022683 1.135 0.258
L2.realgdp -0.123174 0.150267 -0.820 0.413
L2.realcons 0.232499 0.126350 1.840 0.067
L2.realinv 0.023504 0.022330 1.053 0.294
==============================================================================
Results for equation realinv
==============================================================================
coefficient std. error t-stat prob
------------------------------------------------------------------------------
const -0.023903 0.005863 -4.077 0.000
L1.realgdp -1.970974 0.888892 -2.217 0.028
L1.realcons 4.414162 0.687825 6.418 0.000
L1.realinv 0.225479 0.137234 1.643 0.102
L2.realgdp 0.380786 0.909114 0.419 0.676
L2.realcons 0.800281 0.764416 1.047 0.296
L2.realinv -0.124079 0.135098 -0.918 0.360
==============================================================================
Correlation matrix of residuals
realgdp realcons realinv
realgdp 1.000000 0.603316 0.750722
realcons 0.603316 1.000000 0.131951
realinv 0.750722 0.131951 1.000000
Run Code Online (Sandbox Code Playgroud)
您正在寻找与np.exp的逆np.log。
因此,np.exp以您realgdp的应用为例:
axarr[0].plot(results.fittedvalues.index, np.exp(results.fittedvalues['realgdp']))
Run Code Online (Sandbox Code Playgroud)
将fittedvalues恢复到原始规模。
但是您可能希望fittedvalues与原始图在同一图上绘制mdata。为此,您将需要额外的步骤(与获得的结果相反data)。
如果您查看索引:
print mdata.index[:5]
print results.fittedvalues.index[:5]
DatetimeIndex(['1959-03-31', '1959-06-30', '1959-09-30',
'1959-12-31', '1960-03-31'],
dtype='datetime64[ns]', freq=None)
DatetimeIndex(['1959-12-31', '1960-03-31', '1960-06-30',
'1960-09-30', '1960-12-31'],
dtype='datetime64[ns]', freq=None)
Run Code Online (Sandbox Code Playgroud)
您会注意到,fittedvalues以开头'1959-12-31',因此要重构您,fittedvalues您需要:
log值从mdata索引'1959-12-31'(在之前'1959-09-30')(是)附加到的开头fittedvaluescumsum()此数组(是的倒数.diff())np.exp结果值。以realgdp为例,你可以从原始值一起绘制它mdata是这样的:
f, ax = plt.subplots()
ax.plot(mdata.index, mdata['realgdp'], label='Original Data')
ax.plot(mdata.index[2:],
np.exp(np.r_[np.log(mdata['realgdp'].iloc[2]),
results.fittedvalues['realgdp']].cumsum()),
label='Fitted Data')
ax.set_title('Original data vs. UN-log-differenced data')
ax.legend(loc=0)
Run Code Online (Sandbox Code Playgroud)
请注意,您需要使用 results = model.fit(2)
这将产生以下图: