在R中,当一些系数由于奇点而被丢弃时,如何使用vcovHC()计算稳健的标准误差?标准的lm函数似乎可以很好地计算实际估计的所有系数的正常标准误差,但vcovHC()会抛出一个错误:"面包错误.%*%肉.:不一致的参数".
(我使用的实际数据有点复杂.事实上,它是一个使用两种不同固定效果的模型,我遇到局部奇点,我不能简单地摆脱它.至少我不知道如何.对于两个固定效应我使用第一个因子有150个级别,第二个有142个级别,总共有九个奇点,这是因为数据是在十个块中收集的.)
这是我的输出:
Call:
lm(formula = one ~ two + three + Jan + Feb + Mar + Apr + May +
Jun + Jul + Aug + Sep + Oct + Nov + Dec, data = dat)
Residuals:
Min 1Q Median 3Q Max
-130.12 -60.95 0.08 61.05 137.35
Coefficients: (1 not defined because of singularities)
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1169.74313 57.36807 20.390 <2e-16 ***
two -0.07963 0.06720 -1.185 0.237
three -0.04053 0.06686 -0.606 0.545
Jan 8.10336 22.05552 0.367 0.714
Feb 0.44025 22.11275 0.020 0.984
Mar 19.65066 22.02454 0.892 0.373
Apr -13.19779 22.02886 -0.599 0.550
May 15.39534 22.10445 0.696 0.487
Jun -12.50227 22.07013 -0.566 0.572
Jul -20.58648 22.06772 -0.933 0.352
Aug -0.72223 22.36923 -0.032 0.974
Sep 12.42204 22.09296 0.562 0.574
Oct 25.14836 22.04324 1.141 0.255
Nov 18.13337 22.08717 0.821 0.413
Dec NA NA NA NA
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 69.63 on 226 degrees of freedom
Multiple R-squared: 0.04878, Adjusted R-squared: -0.005939
F-statistic: 0.8914 on 13 and 226 DF, p-value: 0.5629
> model$se <- vcovHC(model)
Error in bread. %*% meat. : non-conformable arguments
Run Code Online (Sandbox Code Playgroud)
这是一个剪切的最小代码,用于重现错误.
library(sandwich)
set.seed(101)
dat<-data.frame(one=c(sample(1000:1239)),
two=c(sample(200:439)),
three=c(sample(600:839)),
Jan=c(rep(1,20),rep(0,220)),
Feb=c(rep(0,20),rep(1,20),rep(0,200)),
Mar=c(rep(0,40),rep(1,20),rep(0,180)),
Apr=c(rep(0,60),rep(1,20),rep(0,160)),
May=c(rep(0,80),rep(1,20),rep(0,140)),
Jun=c(rep(0,100),rep(1,20),rep(0,120)),
Jul=c(rep(0,120),rep(1,20),rep(0,100)),
Aug=c(rep(0,140),rep(1,20),rep(0,80)),
Sep=c(rep(0,160),rep(1,20),rep(0,60)),
Oct=c(rep(0,180),rep(1,20),rep(0,40)),
Nov=c(rep(0,200),rep(1,20),rep(0,20)),
Dec=c(rep(0,220),rep(1,20)))
model <- lm(one ~ two + three + Jan + Feb + Mar + Apr + May + Jun + Jul + Aug + Sep + Oct + Nov + Dec, data=dat)
summary(model)
model$se <- vcovHC(model)
Run Code Online (Sandbox Code Playgroud)
具有奇点的模型永远都不好,应该修复它们。在你的例子中,你有 12 个月的 12 个系数,还有全局截距!因此,实际上只有 12 个实际参数需要估计,有 13 个系数。您真正想要的是禁用全局拦截 - 这样您将获得更像特定于月份的拦截:
\n\n> model <- lm(one ~ 0 + two + three + Jan + Feb + Mar + Apr + May + Jun + Jul + Aug + Sep + Oct + Nov + Dec, data=dat)\n> summary(model)\n\nCall:\nlm(formula = one ~ 0 + two + three + Jan + Feb + Mar + Apr + \n May + Jun + Jul + Aug + Sep + Oct + Nov + Dec, data = dat)\n\nResiduals:\n Min 1Q Median 3Q Max \n-133.817 -55.636 3.329 56.768 126.772 \n\nCoefficients:\n Estimate Std. Error t value Pr(>|t|) \ntwo -0.09670 0.06621 -1.460 0.146 \nthree 0.02446 0.06666 0.367 0.714 \nJan 1130.05812 52.79625 21.404 <2e-16 ***\nFeb 1121.32904 55.18864 20.318 <2e-16 ***\nMar 1143.50310 53.59603 21.336 <2e-16 ***\nApr 1143.95365 54.99724 20.800 <2e-16 ***\nMay 1136.36429 53.38218 21.287 <2e-16 ***\nJun 1129.86010 53.85865 20.978 <2e-16 ***\nJul 1105.10045 54.94940 20.111 <2e-16 ***\nAug 1147.47152 54.57201 21.027 <2e-16 ***\nSep 1139.42205 53.58611 21.263 <2e-16 ***\nOct 1117.75075 55.35703 20.192 <2e-16 ***\nNov 1129.20208 53.54934 21.087 <2e-16 ***\nDec 1149.55556 53.52499 21.477 <2e-16 ***\n---\nSignif. codes: 0 \xe2\x80\x98***\xe2\x80\x99 0.001 \xe2\x80\x98**\xe2\x80\x99 0.01 \xe2\x80\x98*\xe2\x80\x99 0.05 \xe2\x80\x98.\xe2\x80\x99 0.1 \xe2\x80\x98 \xe2\x80\x99 1\n\nResidual standard error: 69.81 on 226 degrees of freedom\nMultiple R-squared: 0.9964, Adjusted R-squared: 0.9961 \nF-statistic: 4409 on 14 and 226 DF, p-value: < 2.2e-16\nRun Code Online (Sandbox Code Playgroud)\n\n那么,它是一个正常模型,因此 vcovHC 不会有任何问题。
\n