I am running a logistic regression analysis in Python with statsmodels. For example:
import statsmodels.api as sm
import numpy as np
x = np.arange(0,1,0.01)
y = np.random.rand(100)
y[y<=x] = 1
y[y!=1] = 0
x = sm.add_constant(x)
lr = sm.Logit(y,x)
result = lr.fit().summary()
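But I want to define different weights for my observations. I combined 4 datasets of different sizes, and I want to weight the analysis so that observations from the largest dataset do not dominate the model.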
I am trying to get the F-statistic and p-value for each covariate in a GLM. In Python, I use statsmodels.formula.api to fit the GLM.
import statsmodels.api as sm
import statsmodels.formula.api as smf

formula = 'PropNo_Pred ~ Geography + log10BMI + Cat_OpCavity + CatLes_neles + CatRural_urban + \
CatPred_Control + CatNative_Intro + Midpoint_of_study'
mod1 = smf.glm(formula=formula, data=A2, family=sm.families.Binomial()).fit()
mod1.summary()
After that, I tried to run an ANOVA test on this model using anova_lm from statsmodels.stats:
from statsmodels.stats.anova import anova_lm
table1 = anova_lm(mod1)
print(table1)
But I get an error message: 'GLMResults' object has no attribute 'ssr'
It looks like anova_lm only works for linear models. Is there a Python module that can run an ANOVA test on a GLM?
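I am trying to run a panel regression on pandas DataFrames:
At the moment I have two DataFrames, each containing 52 rows (dates) × 99 columns (99 stocks): Markdown file with a representation of the data
When I run: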
est=sm.OLS(Stockslist,averages).fit()
est.summary()
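I get ValueError: shapes (52,99) and (52,99) not aligned: 99 (dim 1) != 52 (dim 0)
Can someone point out what I am doing wrong? The model is simply y(i,t) = x(i,t) + error term, so there is no intercept. However, I would like to add time effects in the future.
Kind regards, Jeroen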
I looked through the statsmodels examples, but did not see many that apply cross-validation to time series.
Suppose I have something like this:
In [1]: from __future__ import print_function
In [2]: import numpy as np
In [3]: import statsmodels.api as sm
In [4]: import pandas as pd
In [5]: from statsmodels.tsa.arima_process import arma_generate_sample
In [6]: np.random.seed(12345)
In [7]: arparams = np.array([.75, -.25])
In [8]: maparams = np.array([.65, .35])
In [9]: arparams = np.r_[1, -arparams]
In [10]: maparams = np.r_[1, maparams]
In [11]: nobs = 250
In [12]: y = arma_generate_sample(arparams, …
I find the statsmodels implementation of the ANOVA test for linear models very useful (http://www.statsmodels.org/dev/generated/statsmodels.stats.anova.anova_lm.html#statsmodels.stats.anova.anova_lm), but since an equivalent for logistic regression does not exist in the library, I would like to know how to build one.
Formulas:
from statsmodels.formula.api import ols, logit
import statsmodels.api as sm
ols(formula_str, data=data_on_which_to_perform_analysis).fit()
logit(formula_str, data=data_on_which_to_perform_analysis).fit()
sm.stats.anova_lm()
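From looking at the source code, this essentially means replicating anova_single. Has anyone already done this in some remote repository? I ask because the implementation is quite fast and sits deep inside the statsmodels core library, so working around it is not easy (at least at my current skill level).
Any suggestions on how to proceed?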
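One way to proceed without touching the statsmodels internals is to compare sequentially nested logit fits by likelihood ratio, which plays the role that sequential sums of squares play in `anova_lm`. The helper `lr_anova` below is hypothetical (it is not part of statsmodels), and the data and column names are synthetic; treat it as a sketch, not an equivalent of `anova_single`.

```python
import numpy as np
import pandas as pd
from scipy import stats
from statsmodels.formula.api import logit


def lr_anova(data, formulas):
    """Hypothetical helper: likelihood-ratio table for sequentially nested logit models."""
    fits = [logit(f, data=data).fit(disp=0) for f in formulas]
    rows = []
    for prev, cur, formula in zip(fits[:-1], fits[1:], formulas[1:]):
        lr_stat = 2 * (cur.llf - prev.llf)      # twice the gain in log-likelihood
        df_diff = cur.df_model - prev.df_model  # number of parameters added
        rows.append({
            "formula": formula,
            "df_diff": df_diff,
            "lr_stat": lr_stat,
            "p_value": stats.chi2.sf(lr_stat, df_diff),
        })
    return pd.DataFrame(rows)


# Synthetic data: y depends on x1 but not on x2
rng = np.random.default_rng(3)
n = 400
data = pd.DataFrame({"x1": rng.normal(size=n), "x2": rng.normal(size=n)})
prob = 1 / (1 + np.exp(-(0.3 + 1.2 * data["x1"].to_numpy())))
data["y"] = rng.binomial(1, prob)

table = lr_anova(data, ["y ~ 1", "y ~ x1", "y ~ x1 + x2"])
print(table)
```

Each row tests the term added relative to the previous formula, so the order of the formulas matters, as it does for a type I ANOVA.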
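I am trying to solve a least-squares problem in Numpy (i.e. ordinary least squares (OLS) with simple regression) in order to find the corresponding R² value. However, in some cases Numpy returns an empty list for the residuals. Take the following over-determined example (i.e. more equations than unknowns) to illustrate the problem: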
(Note: there is no constant term (i.e. intercept) (i.e. an initial column vector of all 1s), so the uncentered total sum of squares (TSS) is used.)

import numpy as np

A = np.array([[6, 6, 3], [40, 40, 20]]).T
y = np.array([0.5, 0.2, 0.6])

model_parameters, residuals, rank, singular_values = np.linalg.lstsq(A, y, rcond=None)

# No Intercept, therefore use Uncentered Total Sum of Squares (TSS)
uncentered_tss = np.sum((y)**2)
numpy_r2 = 1.0 - residuals / uncentered_tss

print("Numpy Model Parameter(s): " + str(model_parameters))
print("Numpy Sum of Squared Residuals (SSR): " + str(residuals))
print("Numpy R²: " …
I have a DataFrame of length 177 and I want to compute and plot the partial autocorrelation function (PACF).
I have imported the data and so on, and I do this:
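import matplotlib.pyplot as plt
from statsmodels.tsa.stattools import pacf

lags = 176  # number of lags requested
# With alpha set, pacf returns a tuple: (PACF values, confidence intervals)
ys = pacf(data[key][array].diff(1).dropna(), alpha=0.05, nlags=lags, method="ywunbiased")
xs = range(lags+1)
plt.figure()
plt.scatter(xs,ys[0])
plt.grid()
plt.vlines(xs, 0, ys[0])
plt.plot(ys[1])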
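With this method, the values exceed 1 at very long lags (around 90), which is incorrect, and I get a RuntimeWarning: invalid value encountered in sqrt, raised at return rho, np.sqrt(sigmasq). Since I cannot see their source code, I do not know what this means.
To be honest, when I searched for PACF, all the examples only computed it up to about 40 or 60 lags, and none had any significant PACF after lag = 2, so I could not compare against other examples either.
But when I use: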
method="ols"
# or
method="ywmle"
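the numbers are correct. So it must be the algorithm they use to solve it.
I tried importing inspect and using its getsource method, but that was no help; it only showed that another package is used, which I could not find.
If you know where the problem comes from, I would really appreciate your help.
For reference, the values of data[key][array] are: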
[1131.130005,1144.939941,1126.209961,1107.300049,1120.680054,1140.839966,1101.719971,1104.23999,1114.579956,1130.199951,1173.819946,1211.920044,1181.27002,1203.599976,1180.589966,1156.849976,1191.5,1191.329956,1234.180054,1220.329956,1228.810059,1207.01001,1249.47998,1248.290039,1280.079956,1280.660034,1294.869995,1310.609985,1270.089966,1270.199951,1276.660034,1303.819946,1335.849976,1377.939941,1400.630005,1418.300049,1438.23999,1406.819946,1420.859985,1482.369995,1530.619995,1503.349976,1455.27002,1473.98999,1526.75,1549.380005,1481.140015,1468.359985,1378.550049,1330.630005,1322.699951,1385.589966,1400.380005,1280.0,1267.380005,1282.829956,1166.359985,968.75,896.23999,903.25,825.880005,735.090027,797.869995,872.8099980000001,919.1400150000001,919.320007,987.4799800000001,1020.6199949999999,1057.079956,1036.189941,1095.630005,1115.099976,1073.869995,1104.48999,1169.430054,1186.689941,1089.410034,1030.709961,1101.599976,1049.329956,1141.199951,1183.26001,1180.550049,1257.640015,1286.119995,1327.219971,1325.829956,1363.609985,1345.199951,1320.640015,1292.280029,1218.890015,1131.420044,1253.300049,1246.959961,1257.599976,1312.410034,1365.680054,1408.469971,1397.910034,1310.329956,1362.160034,1379.319946,1406.579956,1440.670044,1412.160034,1416.180054,1426.189941,1498.109985,1514.680054,1569.189941,1597.569946,1630.73999,1606.280029,1685.72998,1632.969971,1681.550049,1756.540039,1805.810059,1848.359985,1782.589966,1859.449951,1872.339966,1883.949951,1923.569946,1960.22998,1930.6700440000002,2003.369995,1972.290039,2018.050049,2067.560059,2058.899902,1994.9899899999998,2104.5,2067.889893,2085.51001,2107.389893,2063.110107,2103.840088,1972.180054,1920.030029,2079.360107,2080.409912,2043.939941,1940.2399899999998,1932.22998,2059.73999,2065.300049,2096.949951,2098.860107,2173.600098,2170.949951,2168.27002,2126.149902,2198.810059,2238.830078,2278.8701170000004,2363.639893,2362.719971,2384.199951,2411.800049,2423.409912,2470.300049,2471.649902,2519.360107,2575.26001,2584.840088,2673.610107,2823.810059,2713.830078,2640.8701170000004,2648.050049,2705.27002,2718.3701170000004,2816.290039,2901.52002,2913.97998]
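I am using the Python statsmodels package to train a GLM (Poisson family) on my data. The data contains both numerical and categorical values. I standardized the numerical values and one-hot encoded the categorical values (dropping the first level). When I fit the model to the data, I get the following exception: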
~/miniconda3/envs/losscost/lib/python3.7/site-packages/insite/losscost/losscost.py in evaluate(self, x, control, peril_descs)
271 family=sm.families.Poisson(link=sm.families.links.log()),
272 )
--> 273 freq_fitted = freq_glm.fit()
274 freq_results[name].append(freq_fitted)
275
~/miniconda3/envs/losscost/lib/python3.7/site-packages/statsmodels/genmod/generalized_linear_model.py in fit(self, start_params, maxiter, method, tol, scale, cov_type, cov_kwds, use_t, full_output, disp, max_start_irls, **kwargs)
1025 return self._fit_irls(start_params=start_params, maxiter=maxiter,
1026 tol=tol, scale=scale, cov_type=cov_type,
-> 1027 cov_kwds=cov_kwds, use_t=use_t, **kwargs)
1028 else:
1029 self._optim_hessian = kwargs.get('optim_hessian')
~/miniconda3/envs/losscost/lib/python3.7/site-packages/statsmodels/genmod/generalized_linear_model.py in _fit_irls(self, start_params, maxiter, tol, scale, cov_type, cov_kwds, use_t, **kwargs)
1163 wls_mod = reg_tools._MinimalWLS(wlsendog, wlsexog,
1164 self.weights, check_endog=True,
-> 1165 check_weights=True) …
How do I generate "lower" and "upper" forecasts, not just "yhat"?
import statsmodels
from statsmodels.tsa.arima.model import ARIMA
assert statsmodels.__version__ == '0.12.0'
arima = ARIMA(df['value'], order=order)
model = arima.fit()
Now I can generate "yhat" forecasts
yhat = model.forecast(123)
and get confidence intervals for the model parameters (but not for the forecasts):
model.conf_int()
But how do I generate yhat_lower and yhat_upper forecasts?
I have the following panel stored in df:
| | state | district | year | y | constant | x1 | x2 | time |
|---|---|---|---|---|---|---|---|---|
| 0 | 01 | 01001 | 2009 | 12 | 1 | 0.956007 | 639673 | 1 |
| 1 | 01 | 01001 | 2010 | 20 | 1 | 0.972175 | 639673 | 2 |
| 2 | 01 | 01001 | 2011 | 22 | 1 | 0.988343 | 639673 | 3 |
| 3 | 01 | 01002 | 2009 | 0 | 1 | 0 | 33746 | 1 |
| 4 | 01 | 01002 | 2010 | 1 | 1 | 0.225071 | 33746 | 2 |
| 5 | 01 | 01002 | 2011 | 5 | 1 | 0.450142 | 33746 | 3 |
| 6 | 01 | 01003 | 2009 | 0 | 1 | 0 | 45196 | 1 |
| 7 | 01 | 01003 | 2010 | 5 | 1 … |