Time series analysis with statsmodels

sku*_*erk 6 python statsmodels

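I am trying to run a multiple regression on time series data, but when I add the time series column to the model, it ends up treating every unique value as a separate variable, like this (my 'date' column is of type datetime):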

import statsmodels.formula.api as smf

est = smf.ols(formula='r ~ spend + date', data=df).fit()
print(est.summary())

                                                 coef   std err     t   P>|t|  [95.0% Conf. Int.]
Intercept                                  -6.249e-10       inf    -0     nan       nan       nan
date[T.Timestamp('2014-10-08 00:00:00')]   -2.571e-10       inf    -0     nan       nan       nan
date[T.Timestamp('2014-10-15 00:00:00')]    9.441e-11       inf     0     nan       nan       nan
date[T.Timestamp('2014-10-22 00:00:00')]    5.619e-11       inf     0     nan       nan       nan
date[T.Timestamp('2014-10-29 00:00:00')]   -8.035e-12       inf    -0     nan       nan       nan
date[T.Timestamp('2014-11-05 00:00:00')]    6.334e-11       inf     0     nan       nan       nan
date[T.Timestamp('2014-11-12 00:00:00')]      7.9e+04       inf     0     nan       nan       nan
date[T.Timestamp('2014-11-19 00:00:00')]     1.58e+05       inf     0     nan       nan       nan
date[T.Timestamp('2014-11-26 00:00:00')]     1.58e+05       inf     0     nan       nan       nan
date[T.Timestamp('2014-12-03 00:00:00')]     1.58e+05       inf     0     nan       nan       nan
date[T.Timestamp('2014-12-10 00:00:00')]     2.28e+05       inf     0     nan       nan       nan
date[T.Timestamp('2014-12-17 00:00:00')]     3.28e+05       inf     0     nan       nan       nan
date[T.Timestamp('2014-12-24 00:00:00')]    3.705e+05       inf     0     nan       nan       nan
spend                                       2.105e-10       inf     0     nan       nan       nan

I also tried the statsmodels tsa package, but I don't know how to handle the 'frequency' argument:

ar_model = sm.tsa.AR(df, freq='1')

ValueError: Pandas data cast to numpy dtype of object. Check input data with np.asarray(data).

cas*_*t42 0

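You are effectively fitting a separate coefficient for each date, because ols treats the datetime column as a categorical variable and creates one dummy per unique date. If there is only about one observation per date, as the output suggests, that leaves no residual degrees of freedom, which would explain the inf standard errors and nan p-values. I suggest you drop the date term and try: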

import statsmodels.formula.api as smf

est = smf.ols(formula='r ~ spend', data=df).fit()
print(est.summary())
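If you still want time in the regression, one option is to convert the dates to a single numeric trend column so that ols estimates one slope instead of one dummy per date. The following is only a minimal sketch under assumptions: the DataFrame is synthetic (13 made-up weekly rows standing in for your df), and the column name `t` is a hypothetical helper that is not part of your data.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic weekly data standing in for the asker's df (all values are made up).
rng = np.random.default_rng(0)
dates = pd.date_range("2014-10-01", periods=13, freq="7D")
df = pd.DataFrame({"date": dates, "spend": rng.uniform(100.0, 200.0, size=len(dates))})

# Hypothetical helper column: days elapsed since the first observation.
# Being numeric, it enters the formula as one regressor, not as a category.
df["t"] = (df["date"] - df["date"].min()).dt.days

# Made-up response with a linear dependence on spend and time plus noise.
df["r"] = 2.0 * df["spend"] + 5.0 * df["t"] + rng.normal(0.0, 10.0, size=len(df))

est = smf.ols(formula="r ~ spend + t", data=df).fit()
print(est.summary())
```

With three parameters and 13 rows the fit keeps 10 residual degrees of freedom, so the standard errors are finite. Whether a linear trend is appropriate for your data is a separate modelling question.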

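For the statsmodels tsa model, pass a single numeric column instead of the whole DataFrame (the datetime column is most likely what causes the cast to object dtype):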

ar_model = sm.tsa.AR(df['spend'], freq='1')