R计算具有奇点的lm模型的鲁棒标准误差(vcovHC)

Chr*_*ris 7 regression r lm

在R中,当一些系数由于奇点而被丢弃时,如何使用vcovHC()计算稳健的标准误差?标准的lm函数似乎可以很好地计算实际估计的所有系数的正常标准误差,但vcovHC()会抛出一个错误:"面包错误.%*%肉.:不一致的参数".

(我使用的实际数据有点复杂.事实上,它是一个使用两种不同固定效果的模型,我遇到局部奇点,我不能简单地摆脱它.至少我不知道如何.对于两个固定效应我使用第一个因子有150个级别,第二个有142个级别,总共有九个奇点,这是因为数据是在十个块中收集的.)

这是我的输出:

Call:
lm(formula = one ~ two + three + Jan + Feb + Mar + Apr + May + 
Jun + Jul + Aug + Sep + Oct + Nov + Dec, data = dat)

Residuals:
    Min      1Q  Median      3Q     Max 
-130.12  -60.95    0.08   61.05  137.35 

Coefficients: (1 not defined because of singularities)
              Estimate Std. Error t value Pr(>|t|)    
(Intercept) 1169.74313   57.36807  20.390   <2e-16 ***
two           -0.07963    0.06720  -1.185    0.237    
three         -0.04053    0.06686  -0.606    0.545    
Jan            8.10336   22.05552   0.367    0.714    
Feb            0.44025   22.11275   0.020    0.984    
Mar           19.65066   22.02454   0.892    0.373    
Apr          -13.19779   22.02886  -0.599    0.550    
May           15.39534   22.10445   0.696    0.487    
Jun          -12.50227   22.07013  -0.566    0.572    
Jul          -20.58648   22.06772  -0.933    0.352    
Aug           -0.72223   22.36923  -0.032    0.974    
Sep           12.42204   22.09296   0.562    0.574    
Oct           25.14836   22.04324   1.141    0.255    
Nov           18.13337   22.08717   0.821    0.413    
Dec                 NA         NA      NA       NA    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Residual standard error: 69.63 on 226 degrees of freedom
Multiple R-squared: 0.04878,    Adjusted R-squared: -0.005939 
F-statistic: 0.8914 on 13 and 226 DF,  p-value: 0.5629 

> model$se <- vcovHC(model)
Error in bread. %*% meat. : non-conformable arguments
Run Code Online (Sandbox Code Playgroud)

这是一个剪切的最小代码,用于重现错误.

library(sandwich)
set.seed(101)
dat<-data.frame(one=c(sample(1000:1239)),
              two=c(sample(200:439)),
              three=c(sample(600:839)),
              Jan=c(rep(1,20),rep(0,220)),
              Feb=c(rep(0,20),rep(1,20),rep(0,200)),
              Mar=c(rep(0,40),rep(1,20),rep(0,180)),
              Apr=c(rep(0,60),rep(1,20),rep(0,160)),
              May=c(rep(0,80),rep(1,20),rep(0,140)),
              Jun=c(rep(0,100),rep(1,20),rep(0,120)),
              Jul=c(rep(0,120),rep(1,20),rep(0,100)),
              Aug=c(rep(0,140),rep(1,20),rep(0,80)),
              Sep=c(rep(0,160),rep(1,20),rep(0,60)),
              Oct=c(rep(0,180),rep(1,20),rep(0,40)),
              Nov=c(rep(0,200),rep(1,20),rep(0,20)),
              Dec=c(rep(0,220),rep(1,20))) 
model <- lm(one ~ two + three + Jan + Feb + Mar + Apr + May + Jun + Jul + Aug + Sep + Oct + Nov + Dec, data=dat)
summary(model)
model$se <- vcovHC(model)
Run Code Online (Sandbox Code Playgroud)

TMS*_*TMS 0

具有奇点的模型永远都不好,应该修复它们。在你的例子中,你有 12 个月的 12 个系数,还有全局截距!因此,实际上只有 12 个实际参数需要估计,有 13 个系数。您真正想要的是禁用全局拦截 - 这样您将获得更像特定于月份的拦截:

\n\n
> model <- lm(one ~ 0 + two + three + Jan + Feb + Mar + Apr + May + Jun + Jul + Aug + Sep + Oct + Nov + Dec, data=dat)\n> summary(model)\n\nCall:\nlm(formula = one ~ 0 + two + three + Jan + Feb + Mar + Apr + \n    May + Jun + Jul + Aug + Sep + Oct + Nov + Dec, data = dat)\n\nResiduals:\n     Min       1Q   Median       3Q      Max \n-133.817  -55.636    3.329   56.768  126.772 \n\nCoefficients:\n        Estimate Std. Error t value Pr(>|t|)    \ntwo     -0.09670    0.06621  -1.460    0.146    \nthree    0.02446    0.06666   0.367    0.714    \nJan   1130.05812   52.79625  21.404   <2e-16 ***\nFeb   1121.32904   55.18864  20.318   <2e-16 ***\nMar   1143.50310   53.59603  21.336   <2e-16 ***\nApr   1143.95365   54.99724  20.800   <2e-16 ***\nMay   1136.36429   53.38218  21.287   <2e-16 ***\nJun   1129.86010   53.85865  20.978   <2e-16 ***\nJul   1105.10045   54.94940  20.111   <2e-16 ***\nAug   1147.47152   54.57201  21.027   <2e-16 ***\nSep   1139.42205   53.58611  21.263   <2e-16 ***\nOct   1117.75075   55.35703  20.192   <2e-16 ***\nNov   1129.20208   53.54934  21.087   <2e-16 ***\nDec   1149.55556   53.52499  21.477   <2e-16 ***\n---\nSignif. codes:  0 \xe2\x80\x98***\xe2\x80\x99 0.001 \xe2\x80\x98**\xe2\x80\x99 0.01 \xe2\x80\x98*\xe2\x80\x99 0.05 \xe2\x80\x98.\xe2\x80\x99 0.1 \xe2\x80\x98 \xe2\x80\x99 1\n\nResidual standard error: 69.81 on 226 degrees of freedom\nMultiple R-squared:  0.9964,    Adjusted R-squared:  0.9961 \nF-statistic:  4409 on 14 and 226 DF,  p-value: < 2.2e-16\n
Run Code Online (Sandbox Code Playgroud)\n\n

那么,它是一个正常模型,因此 vcovHC 不会有任何问题。

\n