Ash*_*r11 15 python deprecated pandas statsmodels
As the title suggests, where has the rolling option of the ols command in pandas moved to in statsmodels? I can't seem to find it. Pandas tells me doom is in the works:
FutureWarning: The pandas.stats.ols module is deprecated and will be removed in a future version. We refer to external packages like statsmodels, see some examples here: http://statsmodels.sourceforge.net/stable/regression.html
model = pd.ols(y=series_1, x=mmmm, window=50)
In fact, if you do something like:
import statsmodels.api as sm
model = sm.OLS(series_1, mmmm, window=50).fit()
print(model.summary())
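you get results (the window argument does not stop the code from running), but you only get the parameters of a regression run over the entire period, not the series of parameters for each rolling period that it is supposed to produce.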
Bra*_*mon 13
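I created an ols module designed to mimic pandas' deprecated MovingOLS; it is here.
It has three core classes:
OLS: static (single-window) ordinary least-squares regression. The output are NumPy arrays.
RollingOLS: rolling (multi-window) ordinary least-squares regression. The output are higher-dimension NumPy arrays.
PandasRollingOLS: wraps the results of RollingOLS in pandas Series and DataFrames. Designed to mimic the look of the deprecated pandas module.
Note that the module is part of a package (which I'm currently in the process of uploading to PyPI) and it requires one inter-package import.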
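The first two classes above are implemented entirely in NumPy and primarily use matrix algebra. RollingOLS also makes extensive use of broadcasting. The attributes largely mimic statsmodels' OLS RegressionResultsWrapper.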
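To give a rough idea of what "matrix algebra plus broadcasting" can mean here, below is a minimal sketch, not the module's actual code. The function name rolling_ols_sketch and its return layout are assumptions made for illustration. It builds every window with a broadcast index matrix and solves the normal equations for all windows in one batched call.

```python
import numpy as np


def rolling_ols_sketch(y, x, window):
    """Hypothetical helper: OLS coefficients for every rolling window.

    y has shape (n,), x has shape (n,) or (n, k). Returns an array of shape
    (n - window + 1, k + 1) with the intercept in column 0.
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    n = len(y)
    if len(x) != n or not 1 <= window <= n:
        raise ValueError("x and y must have equal length >= window")

    # Design matrix with a leading column of ones for the intercept.
    X = np.column_stack([np.ones(n), x])

    # Broadcasting a column of start offsets against a row of in-window
    # offsets gives one row of positions per window.
    idx = np.arange(n - window + 1)[:, None] + np.arange(window)
    Xw = X[idx]  # shape (m, window, k + 1)
    yw = y[idx]  # shape (m, window)

    # Normal equations (X'X) b = X'y, solved for all m windows at once.
    Xt = Xw.transpose(0, 2, 1)
    XtX = Xt @ Xw                # shape (m, k + 1, k + 1)
    Xty = Xt @ yw[:, :, None]    # shape (m, k + 1, 1)
    return np.linalg.solve(XtX, Xty)[:, :, 0]


# Noise-free synthetic data, so every window recovers the same coefficients.
rng = np.random.default_rng(0)
x = rng.normal(size=(30, 2))
y = 1.0 + x @ np.array([2.0, -0.5])
coefs = rolling_ols_sketch(y, x, window=12)
print(coefs.shape)  # (19, 3)
```

Solving the normal equations directly is fine for a sketch but can be numerically fragile when regressors are nearly collinear; a least-squares solver per window would be the more careful choice.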
An example:
import urllib.parse
import pandas as pd
from pyfinance.ols import PandasRollingOLS
# You can also do this with pandas-datareader; here's the hard way
url = "https://fred.stlouisfed.org/graph/fredgraph.csv"
syms = {
    "TWEXBMTH" : "usd",
    "T10Y2YM" : "term_spread",
    "GOLDAMGBD228NLBM" : "gold",
}
params = {
    "fq": "Monthly,Monthly,Monthly",
    "id": ",".join(syms.keys()),
    "cosd": "2000-01-01",
    "coed": "2019-02-01",
}
data = pd.read_csv(
    url + "?" + urllib.parse.urlencode(params, safe=","),
    na_values={"."},
    parse_dates=["DATE"],
    index_col=0
).pct_change().dropna().rename(columns=syms)
print(data.head())
# usd term_spread gold
# DATE
# 2000-02-01 0.012580 -1.409091 0.057152
# 2000-03-01 -0.000113 2.000000 -0.047034
# 2000-04-01 0.005634 0.518519 -0.023520
# 2000-05-01 0.022017 -0.097561 -0.016675
# 2000-06-01 -0.010116 0.027027 0.036599
y = data.usd
x = data.drop('usd', axis=1)
window = 12 # months
model = PandasRollingOLS(y=y, x=x, window=window)
print(model.beta.head()) # Coefficients excluding the intercept
# term_spread gold
# DATE
# 2001-01-01 0.000033 -0.054261
# 2001-02-01 0.000277 -0.188556
# 2001-03-01 0.002432 -0.294865
# 2001-04-01 0.002796 -0.334880
# 2001-05-01 0.002448 -0.241902
print(model.fstat.head())
# DATE
# 2001-01-01 0.136991
# 2001-02-01 1.233794
# 2001-03-01 3.053000
# 2001-04-01 3.997486
# 2001-05-01 3.855118
# Name: fstat, dtype: float64
print(model.rsq.head()) # R-squared
# DATE
# 2001-01-01 0.029543
# 2001-02-01 0.215179
# 2001-03-01 0.404210
# 2001-04-01 0.470432
# 2001-05-01 0.461408
# Name: rsq, dtype: float64
Rolling beta with sklearn
import pandas as pd
from sklearn import linear_model

def rolling_beta(X, y, idx, window=255):
    # X and y are 2-D arrays of shape (n, 1); idx holds the matching dates
    assert len(X) == len(y)
    out_dates = []
    out_beta = []
    model_ols = linear_model.LinearRegression()
    for iStart in range(0, len(X) - window):
        iEnd = iStart + window
        model_ols.fit(X[iStart:iEnd], y[iStart:iEnd])
        # store output; each beta is labeled with the first date after its window
        out_dates.append(idx[iEnd])
        out_beta.append(model_ols.coef_[0][0])
    return pd.DataFrame({'beta': out_beta}, index=out_dates)

# df_rtn_stocks is assumed to be a DataFrame of returns with 'NDX' and 'CRM' columns
df_beta = rolling_beta(
    df_rtn_stocks['NDX'].values.reshape(-1, 1),
    df_rtn_stocks['CRM'].values.reshape(-1, 1),
    df_rtn_stocks.index.values,
    255,
)
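For a single regressor with an intercept, the loop above can also be written in closed form, since the OLS slope equals cov(x, y) / var(x). The sketch below uses pandas' own rolling covariance and variance instead of sklearn; the helper name rolling_beta_closed_form and the synthetic series are assumptions for illustration only.

```python
import numpy as np
import pandas as pd


def rolling_beta_closed_form(x, y, window=255):
    """Hypothetical helper: rolling OLS slope of y on x (intercept included)."""
    # With one regressor plus an intercept, slope = cov(x, y) / var(x).
    # Both rolling statistics use the same default ddof, so it cancels out.
    beta = y.rolling(window).cov(x) / x.rolling(window).var()
    # The first window - 1 entries are NaN because the window is not yet full.
    return beta.dropna().rename("beta")


# Synthetic, exactly linear data: the slope should be 3 in every window.
idx = pd.date_range("2020-01-01", periods=10, freq="D")
x = pd.Series(np.arange(10, dtype=float) ** 2, index=idx)
y = 3.0 * x + 1.0
result = rolling_beta_closed_form(x, y, window=5)
print(result)
```

Note that this version labels each beta with the last date inside its window, which is the usual pandas rolling convention, whereas the loop above labels it with the first date after the window.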