Singularity error when running a Tobit regression

Dan*_*Cho 5 regression r na

I am trying to estimate a standard Tobit model that is censored at zero.

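The variables are:

Dependent variable: Happiness

Independent variables:

  • City (Chicago, New York)
  • Gender (man, woman)
  • Employment (0 = unemployed, 1 = employed)
  • Worktype (Unemployed, Bluecolor, Whitecolor)
  • Holiday (Unemployed, 1 day a week, 2 day a week)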

The Worktype and Holiday variables are interacted with the Employment variable.

I am using the censReg package for the Tobit regression:

censReg(Happiness ~ City + Gender + Employment:Worktype + Employment:Holiday)

summary() returns the following error:

Error in printCoefmat(coef(x, logSigma = logSigma), digits = digits) : 
  'x' must be coefficient matrix/data frame

To find the cause, I ran an OLS regression.

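Some coefficients are NA. I think this comes from the model design and the way the variables are set up (some variables seem to be singular: everyone with Employment = 0 has Worktype = Unemployed and Holiday = Unemployed. Could this be the cause?)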

lm(Happiness ~ City + Gender + Employment:Worktype + Employment:Holiday)


Coefficients: (2 not defined because of singularities)
                               Estimate Std. Error t value Pr(>|t|)  
(Intercept)                      41.750      9.697   4.305   0.0499 *
CityNew York                    -44.500     11.197  -3.974   0.0579 .
Gender1                           2.750     14.812   0.186   0.8698  
Employment:WorktypeUnemployed        NA         NA      NA       NA  
Employment:WorktypeBluecolor     35.000     17.704   1.977   0.1867  
Employment:WorktypeWhitecolor   102.750     14.812   6.937   0.0202 *
Employment:Holiday1 day a week  -70.000     22.394  -3.126   0.0889 .
Employment:Holiday2 day a week       NA         NA      NA       NA 

How can I ignore the NA values and run the Tobit regression without an error?

Here is reproducible code:

Happiness <- c(0, 80, 39, 0, 69, 90, 100, 30)

City <- as.factor(c("New York", "Chicago", "Chicago", "New York", "Chicago",
                    "Chicago", "New York", "New York"))
Gender <- as.factor(c(0, 1, 0, 1, 1, 1, 0, 1))  # 0 = man, 1 = woman
Employment <- c(0, 1, 0, 0, 1, 1, 1, 1)         # 0 = unemployed, 1 = employed
Worktype <- as.factor(c(0, 2, 0, 0, 1, 1, 2, 2))
levels(Worktype) <- c("Unemployed", "Bluecolor", "Whitecolor")
Holiday <- as.factor(c(0, 1, 0, 0, 2, 2, 2, 1))
levels(Holiday) <- c("Unemployed", "1 day a week", "2 day a week")

data <- data.frame(Happiness, City, Gender, Employment, Worktype, Holiday)

# OLS fit of the same specification
reg <- lm(Happiness ~ City + Gender + Employment:Worktype +
            Employment:Holiday, data = data)
summary(reg)

# install.packages("censReg")  # run once if the package is not installed
library(censReg)
tobitreg <- censReg(Happiness ~ City + Gender + Employment:Worktype +
                      Employment:Holiday, data = data)
summary(tobitreg)

Wal*_*ldi 3

If you step through the call to censReg in the debugger, you reach the following maxLik optimization:

result <- maxLik(censRegLogLikCross, start = start,
      yVec = yVec, xMat = xMat, left = left, right = right,
      obsBelow = obsBelow, obsBetween = obsBetween, obsAbove = obsAbove,
      ...)

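As you already found, the vector of initial values start, which is determined by an OLS regression, contains two NA coefficients:

  • Employment:WorktypeUnemployed
  • Employment:Holiday2 day a week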
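You can see the same NA values without stepping through censReg. This minimal base-R sketch fits the OLS model and lists the coefficients that come back NA. It assumes, as described above, that censReg seeds maxLik with these OLS estimates; it does not call censReg itself, and the names `d`, `f`, `ols` and `na_coefs` are only illustrative.

```r
# Data copied from the question
d <- data.frame(
  Happiness  = c(0, 80, 39, 0, 69, 90, 100, 30),
  City       = factor(c("New York", "Chicago", "Chicago", "New York",
                        "Chicago", "Chicago", "New York", "New York")),
  Gender     = factor(c(0, 1, 0, 1, 1, 1, 0, 1)),   # 0 = man, 1 = woman
  Employment = c(0, 1, 0, 0, 1, 1, 1, 1),           # 0 = unemployed, 1 = employed
  Worktype   = factor(c(0, 2, 0, 0, 1, 1, 2, 2),
                      labels = c("Unemployed", "Bluecolor", "Whitecolor")),
  Holiday    = factor(c(0, 1, 0, 0, 2, 2, 2, 1),
                      labels = c("Unemployed", "1 day a week", "2 day a week"))
)
f <- Happiness ~ City + Gender + Employment:Worktype + Employment:Holiday

ols <- lm(f, data = d)
# lm() reports non-estimable (aliased) coefficients as NA instead of failing
na_coefs <- names(coef(ols))[is.na(coef(ols))]
print(na_coefs)
# An NA anywhere in a starting vector would make X %*% beta, and hence
# the first log-likelihood evaluation, NA
anyNA(coef(ols))
```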

This makes maxLik return NULL, with the message:

Return code 100: Initial value out of range.

summary() then receives this NULL, which explains the final error message you got.

To override this, you can set the start argument:

tobitreg <- censReg(formula = Happiness ~ City + Gender + Employment:Worktype +
                      Employment:Holiday, start = rep(0,9) )
summary(tobitreg)

Call:
censReg(formula = Happiness ~ City + Gender + Employment:Worktype + 
    Employment:Holiday, start = rep(0, 9))

Observations:
         Total  Left-censored     Uncensored Right-censored 
             8              2              6              0 

Coefficients:
                               Estimate Std. error t value Pr(> t)
(Intercept)                      38.666        Inf       0       1
CityNew York                    -50.669        Inf       0       1
Gender1                        -360.633        Inf       0       1
Employment:WorktypeUnemployed     0.000        Inf       0       1
Employment:WorktypeBluecolor    345.674        Inf       0       1
Employment:WorktypeWhitecolor    56.210        Inf       0       1
Employment:Holiday1 day a week  346.091        Inf       0       1
Employment:Holiday2 day a week   55.793        Inf       0       1
logSigma                          1.794        Inf       0       1

Newton-Raphson maximisation, 141 iterations
Return code 1: gradient close to zero
Log-likelihood: -19.35431 on 9 Df
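The 9 in rep(0, 9) is one starting value per column of the model matrix (8 here, the two aliased columns included) plus one for logSigma, which matches the 9 rows of the coefficient table above. A small base-R sketch that computes that length instead of hard-coding it; it assumes start expects the coefficients in model-matrix order followed by logSigma, as the output suggests, and `d`, `f`, `X` and `n_start` are illustrative names.

```r
# Data copied from the question
d <- data.frame(
  Happiness  = c(0, 80, 39, 0, 69, 90, 100, 30),
  City       = factor(c("New York", "Chicago", "Chicago", "New York",
                        "Chicago", "Chicago", "New York", "New York")),
  Gender     = factor(c(0, 1, 0, 1, 1, 1, 0, 1)),
  Employment = c(0, 1, 0, 0, 1, 1, 1, 1),
  Worktype   = factor(c(0, 2, 0, 0, 1, 1, 2, 2),
                      labels = c("Unemployed", "Bluecolor", "Whitecolor")),
  Holiday    = factor(c(0, 1, 0, 0, 2, 2, 2, 1),
                      labels = c("Unemployed", "1 day a week", "2 day a week"))
)
f <- Happiness ~ City + Gender + Employment:Worktype + Employment:Holiday

X <- model.matrix(f, data = d)
# One starting value per model-matrix column (aliased columns included),
# plus one for logSigma
n_start <- ncol(X) + 1
print(colnames(X))
print(n_start)
```

With censReg loaded this would be passed as `censReg(f, data = d, start = rep(0, n_start))`. It only gets past the start-value check; it does not make the model identifiable.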

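Even though the error message disappears, the results are not reliable:

  • Std. error = Inf
  • gradient close to zero: there is no single optimum, the solution is a hyperplane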
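Why the optimum is a flat set rather than a point: the Tobit log-likelihood depends on the coefficients only through the linear predictor X %*% beta (plus sigma). So any direction v with X %*% v = 0 leaves the likelihood unchanged. A base-R sketch that builds one such direction from the interaction columns; `v`, `beta`, `eta1` and `eta2` are hypothetical names for illustration.

```r
# Data copied from the question
d <- data.frame(
  Happiness  = c(0, 80, 39, 0, 69, 90, 100, 30),
  City       = factor(c("New York", "Chicago", "Chicago", "New York",
                        "Chicago", "Chicago", "New York", "New York")),
  Gender     = factor(c(0, 1, 0, 1, 1, 1, 0, 1)),
  Employment = c(0, 1, 0, 0, 1, 1, 1, 1),
  Worktype   = factor(c(0, 2, 0, 0, 1, 1, 2, 2),
                      labels = c("Unemployed", "Bluecolor", "Whitecolor")),
  Holiday    = factor(c(0, 1, 0, 0, 2, 2, 2, 1),
                      labels = c("Unemployed", "1 day a week", "2 day a week"))
)
f <- Happiness ~ City + Gender + Employment:Worktype + Employment:Holiday
X <- model.matrix(f, data = d)

# Hypothetical direction: +1 on the two Worktype columns, -1 on the two Holiday columns
v <- setNames(numeric(ncol(X)), colnames(X))
v[c("Employment:WorktypeBluecolor", "Employment:WorktypeWhitecolor")] <- 1
v[c("Employment:Holiday1 day a week", "Employment:Holiday2 day a week")] <- -1

set.seed(1)
beta <- rnorm(ncol(X))                # an arbitrary coefficient vector
eta1 <- drop(X %*% beta)
eta2 <- drop(X %*% (beta + 50 * v))   # move 50 units along v
# Identical linear predictors, so identical Tobit likelihood for any sigma
all.equal(eta1, eta2)
```

This is consistent with the table above: Bluecolor + Whitecolor (345.674 + 56.210) equals 1 day + 2 days (346.091 + 55.793), and only that combination is pinned down by the data.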

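The NA coefficients in the OLS regression show that those columns are linearly dependent on the others, so you need to remove some of them to get a unique solution.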
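One way to check for this before fitting anything is to compare the rank of the model matrix with its number of columns. A base-R sketch using qr(), whose default pivoting is the LINPACK routine that lm() relies on; columns pivoted past the rank are the dependent ones. The names `d`, `f`, `X`, `qx` and `dropped` are illustrative.

```r
# Data copied from the question
d <- data.frame(
  Happiness  = c(0, 80, 39, 0, 69, 90, 100, 30),
  City       = factor(c("New York", "Chicago", "Chicago", "New York",
                        "Chicago", "Chicago", "New York", "New York")),
  Gender     = factor(c(0, 1, 0, 1, 1, 1, 0, 1)),
  Employment = c(0, 1, 0, 0, 1, 1, 1, 1),
  Worktype   = factor(c(0, 2, 0, 0, 1, 1, 2, 2),
                      labels = c("Unemployed", "Bluecolor", "Whitecolor")),
  Holiday    = factor(c(0, 1, 0, 0, 2, 2, 2, 1),
                      labels = c("Unemployed", "1 day a week", "2 day a week"))
)
f <- Happiness ~ City + Gender + Employment:Worktype + Employment:Holiday
X <- model.matrix(f, data = d)

qx <- qr(X)                           # QR decomposition with column pivoting
c(columns = ncol(X), rank = qx$rank)  # rank < columns means a singular design
# Columns moved beyond the rank are the linearly dependent ones
dropped <- colnames(X)[qx$pivot[-seq_len(qx$rank)]]
print(dropped)
```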


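As you suspected, the reason is that Employment = 0 only when Worktype = Unemployed. The Employment:WorktypeUnemployed column is therefore identically zero, and the model cannot estimate its coefficient. The Employment:Holiday terms have a similar problem: for every employed person the two Holiday columns sum to Employment, which is also the sum of the Bluecolor and Whitecolor columns, so one of them is redundant.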
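Both dependencies can be verified directly on the model matrix. A minimal base-R sketch with the question's data; `d`, `f`, `X`, `lhs` and `rhs` are illustrative names.

```r
# Data copied from the question
d <- data.frame(
  Happiness  = c(0, 80, 39, 0, 69, 90, 100, 30),
  City       = factor(c("New York", "Chicago", "Chicago", "New York",
                        "Chicago", "Chicago", "New York", "New York")),
  Gender     = factor(c(0, 1, 0, 1, 1, 1, 0, 1)),
  Employment = c(0, 1, 0, 0, 1, 1, 1, 1),
  Worktype   = factor(c(0, 2, 0, 0, 1, 1, 2, 2),
                      labels = c("Unemployed", "Bluecolor", "Whitecolor")),
  Holiday    = factor(c(0, 1, 0, 0, 2, 2, 2, 1),
                      labels = c("Unemployed", "1 day a week", "2 day a week"))
)
f <- Happiness ~ City + Gender + Employment:Worktype + Employment:Holiday
X <- model.matrix(f, data = d)

# 1) Employment is 0 whenever Worktype is "Unemployed": this column is all zeros
all(X[, "Employment:WorktypeUnemployed"] == 0)

# 2) "2 day a week" = Bluecolor + Whitecolor - "1 day a week" for every row
lhs <- X[, "Employment:Holiday2 day a week"]
rhs <- X[, "Employment:WorktypeBluecolor"] + X[, "Employment:WorktypeWhitecolor"] -
  X[, "Employment:Holiday1 day a week"]
all(lhs == rhs)
```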

So I'm afraid the regression model you are estimating does not have a single optimal solution.

If you remove the interaction terms, it works:

tobitreg <- censReg(formula = Happiness ~ City + Gender + Employment )
summary(tobitreg)

Call:
censReg(formula = Happiness ~ City + Gender + Employment)

Observations:
         Total  Left-censored     Uncensored Right-censored 
             8              2              6              0 

Coefficients:
             Estimate Std. error t value  Pr(> t)    
(Intercept)   38.6141     5.7188   6.752 1.46e-11 ***
CityNew York -50.1813     6.4885  -7.734 1.04e-14 ***
Gender1      -70.3859     8.2943  -8.486  < 2e-16 ***
Employment   111.5672    10.0927  11.054  < 2e-16 ***
logSigma       1.7930     0.2837   6.320 2.61e-10 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Newton-Raphson maximisation, 8 iterations
Return code 1: gradient close to zero
Log-likelihood: -19.36113 on 5 Df
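The reduced model works because its design matrix has full column rank, so every coefficient is identified and the OLS starting values contain no NA. A base-R sketch of that check; `d`, `f_small` and `X_small` are illustrative names, and censReg itself is not called.

```r
# Data copied from the question
d <- data.frame(
  Happiness  = c(0, 80, 39, 0, 69, 90, 100, 30),
  City       = factor(c("New York", "Chicago", "Chicago", "New York",
                        "Chicago", "Chicago", "New York", "New York")),
  Gender     = factor(c(0, 1, 0, 1, 1, 1, 0, 1)),
  Employment = c(0, 1, 0, 0, 1, 1, 1, 1)
)
f_small <- Happiness ~ City + Gender + Employment
X_small <- model.matrix(f_small, data = d)

# Full column rank: rank equals the number of columns
c(columns = ncol(X_small), rank = qr(X_small)$rank)
# Hence the OLS estimates (usable as starting values) contain no NA
anyNA(coef(lm(f_small, data = d)))
```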