Oph*_*lia 4 r cluster-analysis standard-error lm
我正在逐年运行带有聚集标准误差的回归。这很容易用 Stata 完成,但我必须用 R 来完成,所以我使用包中的lm_robust()函数运行它estimatr。问题是我现在必须得到一些变量的边际效应,但我不能这样做,我猜这是因为集群标准错误。我遵循了手册上的内容lm_robust(),我看到他们只将 margins 包中的 margins 命令用于其他功能而没有聚集的标准错误......有没有人知道我如何获得和绘制边际效应?
set.seed(42)
library(fabricatr)
library(randomizr)
dat <- fabricate(
N = 100, # sample size
x = runif(N, 0, 1), # pre-treatment covariate
y0 = rnorm(N, mean = x), # control potential outcome
y1 = y0 + 0.35, # treatment potential outcome
z = complete_ra(N), # complete random assignment to treatment
y = ifelse(z, y1, y0), # observed outcome
# We will also consider clustered data
clust = sample(rep(letters[1:20], each = 5)),
z_clust = cluster_ra(clust),
y_clust = ifelse(z_clust, y1, y0)
)
Run Code Online (Sandbox Code Playgroud)
然后当我使用lm_robust()函数运行回归时:
library(estimatr)
lmout_cl <- lm_robust(
y_clust ~ z_clust + x,
data = dat,
clusters = clust
)
Run Code Online (Sandbox Code Playgroud)
最后,我试图获得利润......
library(margins)
mar_cl <- margins(lmout_cl)
Run Code Online (Sandbox Code Playgroud)
但这会导致错误:
set.seed(42)
library(fabricatr)
library(randomizr)
dat <- fabricate(
N = 100, # sample size
x = runif(N, 0, 1), # pre-treatment covariate
y0 = rnorm(N, mean = x), # control potential outcome
y1 = y0 + 0.35, # treatment potential outcome
z = complete_ra(N), # complete random assignment to treatment
y = ifelse(z, y1, y0), # observed outcome
# We will also consider clustered data
clust = sample(rep(letters[1:20], each = 5)),
z_clust = cluster_ra(clust),
y_clust = ifelse(z_clust, y1, y0)
)
Run Code Online (Sandbox Code Playgroud)
对于这个错误道歉,防止margins()从工作lm_robust()与非数字集群对象estimatr0.10及更早版本。这是由内部方式创建的,estimatr::lm_robust()并margins::margins()处理模型中的变量。
该错误已被解决,因此您在estimatr.
让我先生成数据。
library(fabricatr)
library(randomizr)
dat <- fabricate(
N = 100,
x = runif(N),
clust = sample(rep(letters[1:20], each = 5)),
y_clust = rnorm(N),
z_clust = cluster_ra(clust),
)
Run Code Online (Sandbox Code Playgroud)
获取最新版本estimatr(v0.11.0)
https://declaredesign.org/r/estimatr上的开发版本对此错误进行了修复,并将在下个月左右发布在 CRAN 上。
install.packages("estimatr", dependencies = TRUE,
repos = c("http://r.declaredesign.org", "https://cloud.r-project.org"))
library(estimatr)
lmout_cl <- lm_robust(
y_clust ~ z_clust + x,
data = dat,
clusters = clust
)
library(margins)
mar_cl <- margins(lmout_cl)
Run Code Online (Sandbox Code Playgroud)
使用 CRAN 版本estimatr(v0.10.0) 的数字集群
现有版本的estimatrCRAN 的解决方法是使用数字簇而不是字符簇
dat <- fabricate(
N = 100,
x = runif(N),
clust = sample(rep(1:20, each = 5)),
y_clust = rnorm(N),
z_clust = cluster_ra(clust),
)
install.packages("estimatr")
library(estimatr)
lmout_cl <- lm_robust(
y_clust ~ z_clust + x,
data = dat,
clusters = clust
)
mar_cl <- margins(lmout_cl)
Run Code Online (Sandbox Code Playgroud)
问题是estimatr::lm_robust()产生一个目前"lm_robust"似乎不受支持的对象。margins()我们可以使用 Statamiceadds::lm.cluster()来代替,并获得与 Stata 相同的聚类标准误差。
library(miceadds)
lmout_cl <- lm.cluster(y_clust ~ z_clust + x, data=dat, cluster=dat$clust)
Run Code Online (Sandbox Code Playgroud)
这会产生一个包含两个元素的列表,其中法线lm对象存储在第一个元素中,而具有聚集标准误差的方差-协方差矩阵存储在第二个元素中(请参阅str(lmout_cl)):
> names(lmout_cl)
[1] "lm_res" "vcov"
Run Code Online (Sandbox Code Playgroud)
margins()现在可以指定为margins(model=model, vcov=vcov),所以我们说:
mar_cl <- with(lmout_cl, margins(lm_res, vcov=vcov))
Run Code Online (Sandbox Code Playgroud)
屈服
> mar_cl
Average marginal effects
stats::lm(formula = formula, data = data)
z_clust x
0.6558 1.444
Run Code Online (Sandbox Code Playgroud)
和
> summary(mar_cl)
factor AME SE z p lower upper
x 1.4445 0.3547 4.0728 0.0000 0.7494 2.1396
z_clust 0.6558 0.1950 3.3633 0.0008 0.2736 1.0379
Run Code Online (Sandbox Code Playgroud)
具有聚集的标准错误。
与Stata的比较
右
foreign::write.dta(dat, "dat.dta") # export as Stata data to wd
Run Code Online (Sandbox Code Playgroud)
斯塔塔
. use dat, clear
(Written by R. )
. quietly regress y_clust z_clust x, vce(cluster clust)
. mfx
Marginal effects after regress
y = Fitted values (predict)
= .67420391
------------------------------------------------------------------------------
variable | dy/dx Std. Err. z P>|z| [ 95% C.I. ] X
---------+--------------------------------------------------------------------
z_clust*| .6557558 .19498 3.36 0.001 .273609 1.0379 .5
x | 1.444481 .35466 4.07 0.000 .749352 2.13961 .524479
------------------------------------------------------------------------------
(*) dy/dx is for discrete change of dummy variable from 0 to 1
.
Run Code Online (Sandbox Code Playgroud)
正如我们可以清楚地看到的那样,在聚类标准误差和边际效应方面,R 的结果与 Stata 相同。