如何使用 R 运行面板数据中个体固定效应的预测概率(或平均边际效应)?

Jad*_*Jad 8 r panel-data emmeans marginal-effects

这是运行单个固定效应方法的三种不同方法,它们给出或多或少相同的结果(见下文)。我的主要问题是如何使用第二个模型 ( model_plm) 或第三个模型 ( model_felm) 获得预测概率或平均边际效应。我知道如何使用第一个模型 ( model_lm) 来做到这一点,并使用下面的示例来展示ggeffects,但这仅在我有一个小样本时才有效。

由于我有超过一百万人,我的模型只能使用model_plm和来工作model_felm。如果我使用model_lm,则需要花费大量时间来运行一百万个人,因为它们是在模型中受到控制的。我还收到以下错误:Error: vector memory exhausted (limit reached?)。我检查了 StackOverflow 上的许多线程来解决该错误,但似乎没有任何解决方案。

我想知道是否有有效的方法来解决这个问题。我的主要兴趣是提取交互的预测概率residence*union。我通常使用以下软件包之一提取预测概率或平均边际效应:ggeffectsemmeansmargins

library(lfe)
library(plm)
library(ggeffects)
data("Males")  

model_lm = lm(wage ~ exper + residence+health + residence*union +factor(nr)-1, data=Males)
model_plm = plm(wage ~ exper + residence + health + residence*union,model = "within", index=c("nr", "year"), data=Males)
model_felm = felm(wage ~ exper + residence + health + residence*union | nr, data= Males)

pred_ggeffects <- ggpredict(model_lm, c("residence","union"), 
                    vcov.fun = "vcovCL", 
                    vcov.type = "HC1",
                    vcov.args = list(cluster = Males$nr))
Run Code Online (Sandbox Code Playgroud)

swi*_*art 2

我尝试调整公式/数据集以使 emmeans 和 plm 发挥良好作用。如果这里有什么东西请告诉我。经过一些测试后,我意识到 biglm 的答案并不能满足一百万人的需求。

library(emmeans)
library(plm)
data("Males")  

## this runs but we need to get an equivalent result with expanded formula
## and expanded dataset
model_plm = plm(wage ~ exper + residence + health + residence*union,model = "within", index=c("nr"), data=Males)

## expanded dataset
Males2 <- data.frame(wage=Males[complete.cases(Males),"wage"],
                     model.matrix(wage ~ exper + residence + health + residence*union, Males), 
                     nr=Males[complete.cases(Males),"nr"])


(fmla2 <- as.formula(paste("wage ~ ", paste(names(coef(model_plm)), collapse= "+"))))

## expanded formula
model_plm2 <- plm(fmla2,
                  model = "within",
                  index=c("nr"), 
                  data=Males2)

(fmla2_rg <- as.formula(paste("wage ~ -1 +", paste(names(coef(model_plm)), collapse= "+"))))

plm2_rg <- qdrg(fmla2_rg,
                data = Males2,
                coef = coef(model_plm2),
                vcov = vcov(model_plm2),
                df = model_plm2$df.residual)

plm2_rg

### when all 3 residences are 0, that's `rural area`
### then just pick the rows when one of the residences are 1
emmeans(plm2_rg, c("residencenorth_east","residencenothern_central","residencesouth", "unionyes"))
Run Code Online (Sandbox Code Playgroud)

删除一些行后,给出:

> ### when all 3 residences are 0, that's `rural area`
> ### then just pick the rows when one of the residences are 1
> emmeans(plm2_rg, c("residencenorth_east","residencenothern_central","residencesouth", "unionyes"))
 residencenorth_east residencenothern_central residencesouth unionyes emmean     SE   df lower.CL upper.CL
                   0                        0              0        0 0.3777 0.0335 2677  0.31201    0.443
                   1                        0              0        0 0.3301 0.1636 2677  0.00929    0.651
                   0                        1              0        0 0.1924 0.1483 2677 -0.09834    0.483
                   0                        0              1        0 0.2596 0.1514 2677 -0.03732    0.557
                   0                        0              0        1 0.2875 0.1473 2677 -0.00144    0.576
                   1                        0              0        1 0.3845 0.1647 2677  0.06155    0.708
                   0                        1              0        1 0.3326 0.1539 2677  0.03091    0.634
                   0                        0              1        1 0.3411 0.1534 2677  0.04024    0.642

Results are averaged over the levels of: healthyes 
Confidence level used: 0.95
Run Code Online (Sandbox Code Playgroud)