Jad · 8 · tags: r, panel-data, emmeans, marginal-effects
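Here are three different ways to run a one-way (individual) fixed-effects model, and they give more or less the same results (see below). My main question is how to get predicted probabilities or average marginal effects from the second model (`model_plm`) or the third model (`model_felm`). I know how to do this with the first model (`model_lm`), and the example below shows it with `ggeffects`, but that only works when I have a small sample.
Since I have more than a million individuals, my model only runs with `model_plm` and `model_felm`. With `model_lm`, fitting a million individuals takes an enormous amount of time, because every individual is controlled for in the model (one dummy per person). I also get the following error: `Error: vector memory exhausted (limit reached?)`. I have checked many StackOverflow threads about that error, but none of the solutions seem to work.
I would like to know whether there is an efficient way to solve this. My main interest is extracting the predicted probabilities for the interaction `residence*union`. I usually extract predicted probabilities or average marginal effects with one of the following packages: `ggeffects`, `emmeans`, or `margins`.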
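```r
library(lfe)        # felm()
library(plm)        # plm() and the Males data
library(ggeffects)  # ggpredict()
data("Males")

# 1) LSDV: one dummy per individual (nr); only feasible for small samples
model_lm <- lm(wage ~ exper + residence + health + residence * union + factor(nr) - 1,
               data = Males)

# 2) Within estimator via plm
model_plm <- plm(wage ~ exper + residence + health + residence * union,
                 model = "within", index = c("nr", "year"), data = Males)

# 3) Within estimator via lfe; the individual effects (nr) are projected out
model_felm <- felm(wage ~ exper + residence + health + residence * union | nr,
                   data = Males)

# residence x union predictions from the LSDV fit, HC1 SEs clustered by nr
pred_ggeffects <- ggpredict(model_lm, c("residence", "union"),
                            vcov.fun = "vcovCL",
                            vcov.type = "HC1",
                            vcov.args = list(cluster = Males$nr))
```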
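For context on why `model_plm` and `model_felm` fit when `model_lm` runs out of memory: the within estimator subtracts each individual's mean from the outcome and from every regressor, then runs OLS on the demeaned data. By the Frisch–Waugh–Lovell theorem this gives the same slope coefficients as the one-dummy-per-individual regression, without ever building a dummy column per individual. The base-R sketch below checks this on simulated data; the data, the names `id`, `x`, `res`, `union`, and the helper `demean()` are hypothetical and not taken from `Males`.

```r
# Sketch with simulated data (id, x, res, union are hypothetical names):
# demeaning by individual reproduces the LSDV slopes without building
# one dummy column per individual.
set.seed(1)
n_id <- 200
n_t  <- 5
d <- data.frame(
  id    = rep(seq_len(n_id), each = n_t),
  x     = rnorm(n_id * n_t),
  res   = factor(sample(c("rural", "north", "south"), n_id * n_t, replace = TRUE),
                 levels = c("rural", "north", "south")),
  union = factor(sample(c("no", "yes"), n_id * n_t, replace = TRUE))
)
alpha <- rnorm(n_id)  # true individual effects
d$y <- alpha[d$id] + 0.5 * d$x + 0.3 * (d$union == "yes") +
  0.2 * (d$res == "north" & d$union == "yes") + rnorm(nrow(d))

# LSDV: intercept plus a dummy per individual (the model_lm approach)
fit_lsdv <- lm(y ~ x + res * union + factor(id), data = d)

# Within estimator: subtract individual means from y and every regressor
demean <- function(v, g) v - ave(v, g)
X   <- model.matrix(~ x + res * union, data = d)[, -1]  # drop the intercept column
X_w <- apply(X, 2, demean, g = d$id)
y_w <- demean(d$y, d$id)
fit_within <- lm.fit(X_w, y_w)

# The two columns should agree up to floating-point error
slopes <- colnames(X)
print(cbind(lsdv   = coef(fit_lsdv)[slopes],
            within = fit_within$coefficients[slopes]))
```

Note that `lm.fit()` on the demeaned data does not subtract the degrees of freedom used up by the individual means, so standard errors computed naively from it would be too small; `plm` and `felm` account for this.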
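I tried adjusting the formula and the dataset so that emmeans and plm work well together. Please let me know if anything here is wrong. After some testing, I realized that the biglm answer does not scale to a million individuals.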
```r
library(emmeans)
library(plm)
data("Males")

## This runs, but we need an equivalent result with an expanded formula
## and an expanded dataset
model_plm <- plm(wage ~ exper + residence + health + residence * union,
                 model = "within", index = c("nr"), data = Males)

## Expanded dataset: one numeric column per model-matrix column
## (dummies and interaction products), plus the individual index nr
Males2 <- data.frame(wage = Males[complete.cases(Males), "wage"],
                     model.matrix(wage ~ exper + residence + health + residence * union, Males),
                     nr = Males[complete.cases(Males), "nr"])

## Expanded formula: one term per estimated coefficient
(fmla2 <- as.formula(paste("wage ~ ", paste(names(coef(model_plm)), collapse = "+"))))
model_plm2 <- plm(fmla2,
                  model = "within",
                  index = c("nr"),
                  data = Males2)

## Reference-grid formula without an intercept (the within model has none)
(fmla2_rg <- as.formula(paste("wage ~ -1 +", paste(names(coef(model_plm)), collapse = "+"))))
plm2_rg <- qdrg(fmla2_rg,
                data = Males2,
                coef = coef(model_plm2),
                vcov = vcov(model_plm2),
                df = model_plm2$df.residual)
plm2_rg

### When all three residence dummies are 0, that is the reference level `rural area`;
### then keep only the rows where exactly one residence dummy is 1
emmeans(plm2_rg, c("residencenorth_east", "residencenothern_central", "residencesouth", "unionyes"))
```
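Since the quantity of interest is the `residence*union` interaction, it may help to note that for a linear model a contrast reported by emmeans is a linear combination of the coefficients: the estimate is `L %*% b` and its standard error is `sqrt(diag(L V L'))`. That only needs `coef()` and `vcov()`, which both `model_plm` and `model_felm` provide, so no dummy matrix for `nr` is required. The sketch below computes the union effect within each residence level on simulated data; the data and names are hypothetical. With the real model the coefficient names would presumably be `unionyes`, `residencenorth_east:unionyes` and so on (check `names(coef(model_plm))`), and a clustered covariance such as `vcovHC(model_plm, cluster = "group")` could be used in place of `vcov()`.

```r
# Sketch with simulated data (names are hypothetical): the union effect within
# each residence level as a linear combination L %*% b of the coefficients,
# with standard errors sqrt(diag(L V L')).
set.seed(2)
n_id <- 300
n_t  <- 4
d <- data.frame(
  id    = rep(seq_len(n_id), each = n_t),
  res   = factor(sample(c("rural", "north", "south"), n_id * n_t, replace = TRUE),
                 levels = c("rural", "north", "south")),
  union = factor(sample(c("no", "yes"), n_id * n_t, replace = TRUE))
)
d$y <- rnorm(n_id)[d$id] + 0.3 * (d$union == "yes") +
  0.2 * (d$res == "north" & d$union == "yes") + rnorm(nrow(d))

# Any fit exposing coef() and vcov() works here; for the real data you would
# swap in coef(model_plm) and a (possibly clustered) covariance matrix.
fit <- lm(y ~ res * union + factor(id), data = d)
b   <- coef(fit)
V   <- vcov(fit)

# Row k of L picks the coefficients that make up "union yes - no" in level k
L <- matrix(0, nrow = nlevels(d$res), ncol = length(b),
            dimnames = list(levels(d$res), names(b)))
L[, "unionyes"] <- 1                    # rural is the reference level
L["north", "resnorth:unionyes"] <- 1
L["south", "ressouth:unionyes"] <- 1

est   <- drop(L %*% b)
se    <- sqrt(diag(L %*% V %*% t(L)))
tcrit <- qt(0.975, df = fit$df.residual)
print(data.frame(estimate = est, SE = se,
                 lower.CL = est - tcrit * se, upper.CL = est + tcrit * se))
```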
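After removing some rows (the impossible combinations in which more than one residence dummy equals 1), this gives: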
```
> ### when all 3 residences are 0, that's `rural area`
> ### then just pick the rows when one of the residences are 1
> emmeans(plm2_rg, c("residencenorth_east","residencenothern_central","residencesouth", "unionyes"))
 residencenorth_east residencenothern_central residencesouth unionyes emmean     SE   df lower.CL upper.CL
 0                   0                        0              0        0.3777 0.0335 2677  0.31201    0.443
 1                   0                        0              0        0.3301 0.1636 2677  0.00929    0.651
 0                   1                        0              0        0.1924 0.1483 2677 -0.09834    0.483
 0                   0                        1              0        0.2596 0.1514 2677 -0.03732    0.557
 0                   0                        0              1        0.2875 0.1473 2677 -0.00144    0.576
 1                   0                        0              1        0.3845 0.1647 2677  0.06155    0.708
 0                   1                        0              1        0.3326 0.1539 2677  0.03091    0.634
 0                   0                        1              1        0.3411 0.1534 2677  0.04024    0.642

Results are averaged over the levels of: healthyes
Confidence level used: 0.95
```
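One thing to keep in mind when reading this table: the reference grid has no intercept and no individual effects, so each `emmean` is a prediction with the individual effect set to zero, not a predicted wage level. Differences between rows (for example `unionyes` 1 vs 0 at the same residence) are unaffected, but the levels are shifted by a constant. One option, sketched below on simulated data with hypothetical names, is to add the mean of the estimated individual effects; this reproduces the LSDV predictions averaged over individuals with equal weights, which corresponds to emmeans' default equal weighting over the levels of `factor(nr)` in the LSDV model. For the real data, `plm` provides `fixef(model_plm)` for the estimated individual effects; whether equal weighting per individual suits an unbalanced panel is a modelling choice.

```r
# Sketch with simulated data (names are hypothetical): predictions from a
# within fit omit the individual effects; adding their mean recovers the
# LSDV predictions averaged over individuals with equal weights.
set.seed(3)
n_id <- 150
n_t  <- 4
d <- data.frame(
  id    = rep(seq_len(n_id), each = n_t),
  x     = rnorm(n_id * n_t),
  union = factor(sample(c("no", "yes"), n_id * n_t, replace = TRUE))
)
d$y <- rnorm(n_id)[d$id] + 0.5 * d$x + 0.3 * (d$union == "yes") + rnorm(nrow(d))

# Within fit on individually demeaned data (no intercept, no dummies)
demean <- function(v, g) v - ave(v, g)
X    <- model.matrix(~ x + union, data = d)[, -1, drop = FALSE]
beta <- lm.fit(apply(X, 2, demean, g = d$id), demean(d$y, d$id))$coefficients

# Estimated individual effects: alpha_i = mean_i(y) - mean_i(X) %*% beta
ybar  <- tapply(d$y, d$id, mean)
Xbar  <- apply(X, 2, function(v) tapply(v, d$id, mean))
alpha <- as.vector(ybar) - drop(Xbar %*% beta)

# Grid: x at its sample mean, union = no / yes (columns in the order of beta)
grid       <- cbind(x = mean(d$x), unionyes = c(0, 1))
pred_no_fe <- drop(grid %*% beta)          # analogous to the emmean column
pred_level <- pred_no_fe + mean(alpha)     # shifted to the average individual

print(rbind(pred_no_fe, pred_level))
print(diff(pred_level) - diff(pred_no_fe)) # numerically zero: contrasts unchanged
```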