Linear regression on a time series in pandas

Question

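I want to run a regression with a time series as the predictor. I tried to follow this answer (OLS with pandas: datetime index as predictor), but as far as I can tell it no longer works.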

Am I missing something, or is there a new way to do this?

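import pandas as pd

rng = pd.date_range('1/1/2011', periods=4, freq='H')   # four hourly timestamps
s = pd.Series(range(4), index=rng)
z = s.reset_index()   # columns: "index" (datetime64) and 0 (the values)

pd.ols(x=z["index"], y=z[0])   # pd.ols only exists in older pandas versions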

I get the error below. The error is self-explanatory, but I'd like to know what I'm missing when reimplementing a solution that used to work.

TypeError: cannot astype a datetimelike from [datetime64[ns]] to [float64]

Answer 1

Joh*_*hnE 3

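I'm not sure why pd.ols is so picky there (it looks to me like you did follow the example correctly). I suspect it's due to a change in how pandas handles or stores datetime indexes, but I'm too lazy to explore that further. Anyway, since your datetime variable varies only in the hour, you could just extract the hour with the dt accessor: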

pd.ols(x=pd.to_datetime(z["index"]).dt.hour, y=z[0])
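
pd.ols was deprecated and later removed from pandas (around version 0.20), so the line above only runs on old installs. The following is a minimal sketch of the same hour-of-day regression in a recent pandas. It swaps in numpy.polyfit for the least-squares fit, which is a substitution and not what the answer used. The R-squared is computed by hand from the residual and total sums of squares.

```python
import numpy as np
import pandas as pd

# Same toy data as the question; pd.offsets.Hour() sidesteps the "H" -> "h" alias change.
rng = pd.date_range("2011-01-01", periods=4, freq=pd.offsets.Hour())
s = pd.Series(range(4), index=rng)
z = s.reset_index()  # columns: "index" (datetime64[ns]) and 0 (the values)

# Predictor: hour of day via the .dt accessor, as in the answer above.
x = pd.to_datetime(z["index"]).dt.hour.to_numpy(dtype=float)
y = z[0].to_numpy(dtype=float)

# Degree-1 least-squares fit with an intercept; polyfit returns [slope, intercept].
slope, intercept = np.polyfit(x, y, 1)

# R-squared from residual and total sums of squares.
fitted = slope * x + intercept
r2 = 1.0 - np.sum((y - fitted) ** 2) / np.sum((y - y.mean()) ** 2)
print(slope, intercept, r2)
```

Here y equals x exactly, so the slope comes out as 1, the intercept as 0, and R-squared as 1 (up to floating-point noise).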

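However, this gives an R-squared of 1, because y is an exact linear function of x (both are 0, 1, 2, 3), so a line with an intercept fits the four points perfectly. If you change range to np.random.randn, you get something that looks like normal regression output: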

In [6]: z = pd.Series(np.random.randn(4), index = rng).reset_index()                                                               
        pd.ols(x=pd.to_datetime(z["index"]).dt.hour, y=z[0])
Out[6]: 

-------------------------Summary of Regression Analysis-------------------------

Formula: Y ~ <x> + <intercept>

Number of Observations:         4
Number of Degrees of Freedom:   2

R-squared:         0.7743
Adj R-squared:     0.6615

Rmse:              0.5156

F-stat (1, 2):     6.8626, p-value:     0.1200

Degrees of Freedom: model 1, resid 2

-----------------------Summary of Estimated Coefficients------------------------
      Variable       Coef    Std Err     t-stat    p-value    CI 2.5%   CI 97.5%
--------------------------------------------------------------------------------
             x    -0.6040     0.2306      -2.62     0.1200    -1.0560    -0.1521
     intercept     0.2915     0.4314       0.68     0.5689    -0.5540     1.1370
---------------------------------End of Summary---------------------------------
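
As a sanity check on the summary, the adjusted R-squared, the F-statistic and the slope's t-statistic can be recomputed from the printed R-squared and coefficients using the standard OLS formulas. This check is not part of the original answer. With one regressor, F equals the square of the slope's t-statistic. Small discrepancies come from the four-decimal rounding of the printed values.

```python
# Values copied from the summary above (n = 4 observations, k = 1 regressor).
n, k = 4, 1
r2 = 0.7743

adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)   # adjusted R-squared
f_stat = (r2 / k) / ((1 - r2) / (n - k - 1))    # F-statistic with (k, n - k - 1) df
t_x = -0.6040 / 0.2306                           # slope coefficient / its standard error
print(adj_r2, f_stat, t_x, t_x ** 2)
```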

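Alternatively, you could convert the index to integers, although I found that this didn't work very well on its own. I assume that's because the integers represent nanoseconds since the epoch or something similar, so they are very large and cause precision problems. Converting to integers and then dividing by a trillion or so (1e12) does work, though, and gives essentially the same results as using dt.hour (i.e., the same R-squared):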

pd.ols(x=pd.to_datetime(z["index"]).astype(int)/1e12, y=z[0])
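
The sketch below shows why the raw integers are awkward and why rescaling does not change the fit. It assumes a recent pandas and uses a seeded random y. The raw nanosecond values are around 1.3e18. Three predictors give the same R-squared: dt.hour, the answer's division by 1e12, and hours elapsed since the first timestamp (an extra variant added here). Within a single day each of these is an affine transformation of the others. For a simple regression with an intercept, R-squared is the squared Pearson correlation, which affine rescaling leaves unchanged.

```python
import numpy as np
import pandas as pd

times = pd.Series(pd.date_range("2011-01-01", periods=4, freq=pd.offsets.Hour()))
y = np.random.default_rng(1).standard_normal(4)  # noisy response, seeded for reproducibility

# Raw int64 nanoseconds since the Unix epoch: values are about 1.3e18.
ns = times.astype("int64").to_numpy()

candidates = {
    "hour": times.dt.hour.to_numpy(dtype=float),  # the answer's dt.hour approach
    "ns_div_1e12": ns / 1e12,                     # the answer's rescaled integers
    # Hours elapsed since the first timestamp (added variant, not from the answer).
    "elapsed_hours": ((times - times.iloc[0]) / pd.Timedelta(hours=1)).to_numpy(),
}

def r_squared(x, y):
    """For simple OLS with an intercept, R^2 equals the squared Pearson correlation."""
    return np.corrcoef(x, y)[0, 1] ** 2

for name, x in candidates.items():
    print(name, r_squared(x, y))
```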

Source of the error message

FWIW, the error message appears to come from this:

pd.to_datetime(z["index"]).astype(float)
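
The failing cast can be reproduced on its own. Below is a minimal sketch, assuming a recent pandas. The exact wording of the TypeError differs between pandas versions, but the direct datetime64-to-float cast is refused in each case.

```python
import pandas as pd

times = pd.Series(pd.date_range("2011-01-01", periods=4, freq=pd.offsets.Hour()))

try:
    times.astype(float)      # direct datetime64[ns] -> float64 cast
    cast_failed = False
except TypeError as exc:     # pandas refuses this conversion
    cast_failed = True
    print("TypeError:", exc)
```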

although a fairly obvious workaround is:

pd.to_datetime(z["index"]).astype(int).astype(float)
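
Here is a minimal sketch of that two-step workaround in a recent pandas. It assumes naive timestamps, which are read as offsets from the Unix epoch in UTC. The target is spelled "int64" because plain int is not guaranteed to be 64-bit on every platform. Dividing by 1e9 to get seconds is an illustrative choice and not part of the original answer.

```python
import pandas as pd

times = pd.Series(pd.date_range("2011-01-01", periods=4, freq=pd.offsets.Hour()))

# Two-step workaround: datetime64[ns] -> int64 nanoseconds -> float64.
as_float = times.astype("int64").astype(float)

# Seconds since the epoch are easier to read than nanoseconds.
seconds = as_float / 1e9
print(seconds.tolist())
```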
