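I am trying to estimate a standard tobit model that is censored at zero.
The variables are
Dependent variable: Happiness
Independent variables:
The "Worktype" and "Holiday" variables are interacted with the "Employment" variable.
I am using the censReg package for the tobit regression.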
censReg(Happiness ~ City + Gender + Employment:Worktype + Employment:Holiday)
But summary() returns the following error.
Error in printCoefmat(coef(x, logSigma = logSigma), digits = digits) :
'x' must be coefficient matrix/data frame
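To find the cause, I ran an OLS regression.
It produces some NA values, which I think come from the model design and the way the variables are coded. Some variables appear to be singular: people with 'Employment' = 0 have 'Worktype' = Unemployed and 'Holiday' = Unemployed. Could that be the reason?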
lm(Happiness ~ City + Gender + Employment:Worktype + Employment:Holiday)
Coefficients: (2 not defined because of singularities)
                               Estimate Std. Error t value Pr(>|t|)
(Intercept)                      41.750      9.697   4.305   0.0499 *
CityNew York                    -44.500     11.197  -3.974   0.0579 .
Gender1                           2.750     14.812   0.186   0.8698
Employment:WorktypeUnemployed        NA         NA      NA       NA
Employment:WorktypeBluecolor     35.000     17.704   1.977   0.1867
Employment:WorktypeWhitecolor   102.750     14.812   6.937   0.0202 *
Employment:Holiday1 day a week  -70.000     22.394  -3.126   0.0889 .
Employment:Holiday2 day a week       NA         NA      NA       NA
How can I ignore the NA values and run the tobit regression without errors?
Reproducible code is below.
Happiness <- c(0, 80, 39, 0, 69, 90, 100, 30)
City <- as.factor(c("New York", "Chicago", "Chicago", "New York", "Chicago",
"Chicago", "New York", "New York"))
Gender <- as.factor(c(0, 1, 0, 1, 1, 1, 0, 1)) # 0 = man, 1 = woman.
Employment <- c(0, 1, 0, 0, 1, 1, 1, 1) # 0 = unemployed, 1 = employed.
Worktype <- as.factor(c(0, 2, 0, 0, 1, 1, 2, 2))
levels(Worktype) <- c("Unemployed", "Bluecolor", "Whitecolor")
Holiday <- as.factor(c(0, 1, 0, 0, 2, 2, 2, 1))
levels(Holiday) <- c("Unemployed", "1 day a week", "2 day a week")
data <- data.frame(Happiness, City, Gender, Employment, Worktype, Holiday)
reg <- lm(Happiness ~ City + Gender + Employment:Worktype +
Employment:Holiday)
summary(reg)
install.packages("censReg")
library(censReg)
tobitreg <- censReg(Happiness ~ City + Gender + Employment:Worktype +
Employment:Holiday)
summary(tobitreg)
If you step through the call to censReg in the debugger, you reach the following maxLik optimization:
result <- maxLik(censRegLogLikCross, start = start,
    yVec = yVec, xMat = xMat, left = left, right = right,
    obsBelow = obsBelow, obsBetween = obsBetween, obsAbove = obsAbove,
    ...)
As you have already found, the vector of starting values, start, is determined by an OLS regression and therefore contains two NA coefficients.
This causes maxLik to return NULL, with the error message:
Return code 100: Initial value out of range.
The summary function receives this NULL, which explains the final error message you got.
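A quick way to see the NA starting values without stepping through the debugger is to inspect the OLS coefficients directly. The sketch below uses base R only and rebuilds the data from the question; `na_coefs` is a name introduced here for illustration, and the exact way censReg derives `start` internally is taken from the description above rather than checked against the package source.

```r
# Data from the question's reproducible example
Happiness  <- c(0, 80, 39, 0, 69, 90, 100, 30)
City       <- factor(c("New York", "Chicago", "Chicago", "New York",
                       "Chicago", "Chicago", "New York", "New York"))
Gender     <- factor(c(0, 1, 0, 1, 1, 1, 0, 1))
Employment <- c(0, 1, 0, 0, 1, 1, 1, 1)
Worktype   <- factor(c(0, 2, 0, 0, 1, 1, 2, 2),
                     labels = c("Unemployed", "Bluecolor", "Whitecolor"))
Holiday    <- factor(c(0, 1, 0, 0, 2, 2, 2, 1),
                     labels = c("Unemployed", "1 day a week", "2 day a week"))

# Same formula as the tobit call, fitted by OLS
ols <- lm(Happiness ~ City + Gender + Employment:Worktype + Employment:Holiday)

# Names of the coefficients that OLS could not estimate
na_coefs <- names(coef(ols))[is.na(coef(ols))]
print(na_coefs)
```

Any name printed here marks a position where a start vector built from these OLS estimates would hold NA.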
To get around this, you can set the start argument yourself:
tobitreg <- censReg(formula = Happiness ~ City + Gender + Employment:Worktype +
    Employment:Holiday, start = rep(0, 9))
summary(tobitreg)

Call:
censReg(formula = Happiness ~ City + Gender + Employment:Worktype +
    Employment:Holiday, start = rep(0, 9))

Observations:
         Total  Left-censored     Uncensored Right-censored
             8              2              6              0

Coefficients:
                               Estimate Std. error t value Pr(> t)
(Intercept)                      38.666        Inf       0       1
CityNew York                    -50.669        Inf       0       1
Gender1                        -360.633        Inf       0       1
Employment:WorktypeUnemployed     0.000        Inf       0       1
Employment:WorktypeBluecolor    345.674        Inf       0       1
Employment:WorktypeWhitecolor    56.210        Inf       0       1
Employment:Holiday1 day a week  346.091        Inf       0       1
Employment:Holiday2 day a week   55.793        Inf       0       1
logSigma                          1.794        Inf       0       1

Newton-Raphson maximisation, 141 iterations
Return code 1: gradient close to zero
Log-likelihood: -19.35431 on 9 Df

Even though the error message goes away, the results are not reliable (note the infinite standard errors):
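The NA coefficients in the OLS regression indicate that those coefficients are linearly dependent on the others, so you need to drop some of them to get a unique solution.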
As you suspected, the reason is that Employment = 0 whenever Worktype = Unemployed, so the model cannot estimate the coefficient for Employment:WorktypeUnemployed. The Employment:Holiday coefficients have the same problem.
So I am afraid the regression model you are estimating has no unique optimal solution.
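The linear dependence described above can be checked directly on the design matrix. This is a minimal base R sketch using the question's data; `rank_X`, `zero_col` and `same_sum` are illustrative names, not anything censReg exposes.

```r
# Data from the question's reproducible example
City       <- factor(c("New York", "Chicago", "Chicago", "New York",
                       "Chicago", "Chicago", "New York", "New York"))
Gender     <- factor(c(0, 1, 0, 1, 1, 1, 0, 1))
Employment <- c(0, 1, 0, 0, 1, 1, 1, 1)
Worktype   <- factor(c(0, 2, 0, 0, 1, 1, 2, 2),
                     labels = c("Unemployed", "Bluecolor", "Whitecolor"))
Holiday    <- factor(c(0, 1, 0, 0, 2, 2, 2, 1),
                     labels = c("Unemployed", "1 day a week", "2 day a week"))

# Design matrix for the right-hand side of the model
X <- model.matrix(~ City + Gender + Employment:Worktype + Employment:Holiday)

# Fewer independent columns than columns means the model is not identified
rank_X <- qr(X)$rank
cat("columns:", ncol(X), " rank:", rank_X, "\n")

# Employment is 0 for every unemployed person, so this column is all zeros
zero_col <- all(X[, "Employment:WorktypeUnemployed"] == 0)

# Every employed person has exactly one work type and one holiday level,
# so both pairs of interaction columns add up to Employment itself
work_sum    <- X[, "Employment:WorktypeBluecolor"] +
               X[, "Employment:WorktypeWhitecolor"]
holiday_sum <- X[, "Employment:Holiday1 day a week"] +
               X[, "Employment:Holiday2 day a week"]
same_sum <- all(work_sum == holiday_sum)
print(c(zero_col = zero_col, same_sum = same_sum))
```

The all-zero column and the two identical sums account for the two coefficients that OLS reports as NA.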
If you drop the interaction terms, this works:
tobitreg <- censReg(formula = Happiness ~ City + Gender + Employment)
summary(tobitreg)

Call:
censReg(formula = Happiness ~ City + Gender + Employment)

Observations:
         Total  Left-censored     Uncensored Right-censored
             8              2              6              0

Coefficients:
             Estimate Std. error t value  Pr(> t)
(Intercept)   38.6141     5.7188   6.752 1.46e-11 ***
CityNew York -50.1813     6.4885  -7.734 1.04e-14 ***
Gender1      -70.3859     8.2943  -8.486  < 2e-16 ***
Employment   111.5672    10.0927  11.054  < 2e-16 ***
logSigma       1.7930     0.2837   6.320 2.61e-10 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Newton-Raphson maximisation, 8 iterations
Return code 1: gradient close to zero
Log-likelihood: -19.36113 on 5 Df
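If you would rather keep the interaction effects that can be estimated, one possible alternative (a different route from the one shown above) is to drop only the aliased columns from the design matrix. The sketch below does this in base R and checks the reduced design with OLS; `X_reduced` and `keep` are illustrative names. Whether a tobit fit on these regressors converges with only eight observations is a separate question that this example does not test.

```r
# Data from the question's reproducible example
Happiness  <- c(0, 80, 39, 0, 69, 90, 100, 30)
City       <- factor(c("New York", "Chicago", "Chicago", "New York",
                       "Chicago", "Chicago", "New York", "New York"))
Gender     <- factor(c(0, 1, 0, 1, 1, 1, 0, 1))
Employment <- c(0, 1, 0, 0, 1, 1, 1, 1)
Worktype   <- factor(c(0, 2, 0, 0, 1, 1, 2, 2),
                     labels = c("Unemployed", "Bluecolor", "Whitecolor"))
Holiday    <- factor(c(0, 1, 0, 0, 2, 2, 2, 1),
                     labels = c("Unemployed", "1 day a week", "2 day a week"))

# Fit the full model once to find out which columns are aliased
full <- lm(Happiness ~ City + Gender + Employment:Worktype + Employment:Holiday)
X    <- model.matrix(full)

# Keep only the columns whose coefficients OLS could estimate
keep      <- !is.na(coef(full))
X_reduced <- X[, keep, drop = FALSE]

# Refit on the reduced design; "- 1" because X_reduced already
# contains the intercept column
reduced <- lm(Happiness ~ X_reduced - 1)
print(coef(reduced))
```

The reduced design has full column rank, so OLS on it returns no NA coefficients, and the same columns are candidates for the regressors of a tobit model.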