尝试在 R 包 lavaan 中重新创建官方 Stata SEM 示例

Question

尝试在 R 包 lavaan 中重新创建官方 Stata SEM 示例

我想用R 包重新创建这个官方 Stata 示例，因为我想知道相同的模型是否会在两个包中产生相同的结果。lavaan

我使用相同的数据（我使用 Stata 上传数据，十个保存为 a 并将.dta其上传到R），并且从图形上看，模型结构看起来相同。我的问题是估计值相差甚远。

尽管如此，我并不完全确定我在两个回归中是否做了同样的事情（也许是因为两种语法得出的假设不同）。

如果我做了什么不同的事情，有人可以向我指出吗？

Stata语法

webuse sem_sm2, clear sem (anomia67 pwless67 <- Alien67) (anomia71 pwless71 <- Alien71) (Alien67 <- SES) (Alien71 <- Alien67 SES) ( SES -> educ occstat66), nolog
Run Code Online (Sandbox Code Playgroud)
R-lavaan 语法

library(haven) library(lavaan) library(lavaanPlot) model <- ' Alien67 =~ anomia67 + pwless67 Alien71 =~ anomia71 + pwless71 Alien67 ~ SES Alien71 ~ Alien67 + SES SES =~ educ66 + occstat66 ' model_fit <- sem(model, data=example) summary(model_fit , standardized=TRUE, fit.measures=TRUE) lavaanPlot(model = model_fit, node_options = list(shape = "box", fontname = "Helvetica"), edge_options = list(color = "grey"), coefs = FALSE)
Run Code Online (Sandbox Code Playgroud)
统计输出

Endogenous variables Measurement: anomia67 pwless67 anomia71 pwless71 educ66 occstat66 Latent: Alien67 Alien71 Exogenous variables Latent: SES Structural equation model Number of obs = 932 Estimation method = ml Log likelihood = -15246.469 ( 1) [anomia67]Alien67 = 1 ( 2) [anomia71]Alien71 = 1 ( 3) [educ66]SES = 1 --------------------------------------------------------------------------------- | OIM | Coef. Std. Err. z P>|z| [95% Conf. Interval] ----------------+---------------------------------------------------------------- Structural | Alien67 | SES | -.6140404 .0562407 -10.92 0.000 -.7242701 -.5038107 --------------+---------------------------------------------------------------- Alien71 | Alien67 | .7046342 .0533512 13.21 0.000 .6000678 .8092007 SES | -.1744153 .0542489 -3.22 0.001 -.2807413 -.0680894 ----------------+---------------------------------------------------------------- Measurement | anomia67 | Alien67 | 1 (constrained) _cons | 13.61 .1126205 120.85 0.000 13.38927 13.83073 --------------+---------------------------------------------------------------- pwless67 | Alien67 | .8884887 .0431565 20.59 0.000 .8039034 .9730739 _cons | 14.67 .1001798 146.44 0.000 14.47365 14.86635 --------------+---------------------------------------------------------------- anomia71 | Alien71 | 1 (constrained) _cons | 14.13 .1158943 121.92 0.000 13.90285 14.35715 --------------+---------------------------------------------------------------- pwless71 | Alien71 | .8486022 .0415205 20.44 0.000 .7672235 .9299808 _cons | 14.9 .1034537 144.03 0.000 14.69723 15.10277 --------------+---------------------------------------------------------------- educ66 | SES | 1 (constrained) _cons | 10.9 .1014894 107.40 0.000 10.70108 11.09892 --------------+---------------------------------------------------------------- occstat66 | SES | 5.331259 .4307503 12.38 0.000 4.487004 6.175514 _cons | 37.49 .6947112 53.96 0.000 36.12839 38.85161 ----------------+---------------------------------------------------------------- var(e.anomia67)| 4.009921 .3582978 3.365724 4.777416 var(e.pwless67)| 3.187468 .283374 2.677762 3.794197 var(e.anomia71)| 3.695593 .3911512 3.003245 4.54755 var(e.pwless71)| 3.621531 .3037908 3.072483 4.268693 var(e.educ66)| 2.943819 .5002527 2.109908 4.107319 var(e.occstat66)| 260.63 18.24572 227.2139 298.9605 var(e.Alien67)| 5.301416 .483144 4.434225 6.338201 var(e.Alien71)| 3.737286 .3881546 3.048951 4.581019 var(SES)| 6.65587 .6409484 5.511067 8.038482 --------------------------------------------------------------------------------- LR test of model vs. saturated: chi2(6) = 71.62, Prob > chi2 = 0.0000
Run Code Online (Sandbox Code Playgroud)
R-拉万

lavaan 0.6-11 ended normally after 149 iterations Estimator ML Optimization method NLMINB Number of model parameters 15 Number of observations 15 Model Test User Model: Test statistic 26.531 Degrees of freedom 6 P-value (Chi-square) 0.000 Model Test Baseline Model: Test statistic 370.547 Degrees of freedom 15 P-value 0.000 User Model versus Baseline Model: Comparative Fit Index (CFI) 0.942 Tucker-Lewis Index (TLI) 0.856 Loglikelihood and Information Criteria: Loglikelihood user model (H0) -82.504 Loglikelihood unrestricted model (H1) -69.238 Akaike (AIC) 195.008 Bayesian (BIC) 205.629 Sample-size adjusted Bayesian (BIC) 159.836 Root Mean Square Error of Approximation: RMSEA 0.478 90 Percent confidence interval - lower 0.301 90 Percent confidence interval - upper 0.670 P-value RMSEA <= 0.05 0.000 Standardized Root Mean Square Residual: SRMR 0.002 Parameter Estimates: Standard errors Standard Information Expected Information saturated (h1) model Structured Latent Variables: Estimate Std.Err z-value P(>|z|) Std.lv Std.all Alien67 =~ anomia67 1.000 3.388 0.999 pwless67 1.072 0.016 66.835 0.000 3.632 0.999 Alien71 =~ anomia71 1.000 3.520 0.999 pwless71 1.049 0.014 74.224 0.000 3.694 0.999 SES =~ educ66 1.000 2.794 0.993 occstat66 3.608 0.267 13.528 0.000 10.081 0.969 Regressions: Estimate Std.Err z-value P(>|z|) Std.lv Std.all Alien67 ~ SES 1.176 0.087 13.498 0.000 0.970 0.970 Alien71 ~ Alien67 0.996 0.060 16.505 0.000 0.959 0.959 SES 0.053 0.073 0.727 0.467 0.042 0.042 Variances: Estimate Std.Err z-value P(>|z|) Std.lv Std.all .anomia67 0.018 0.011 1.642 0.101 0.018 0.002 .pwless67 0.024 0.013 1.810 0.070 0.024 0.002 .anomia71 0.013 0.009 1.359 0.174 0.013 0.001 .pwless71 0.023 0.012 1.875 0.061 0.023 0.002 .educ66 0.104 0.153 0.680 0.497 0.104 0.013 .occstat66 6.514 3.063 2.127 0.033 6.514 0.060 .Alien67 0.678 0.327 2.074 0.038 0.059 0.059 .Alien71 0.012 0.012 0.998 0.318 0.001 0.001 SES 7.808 2.892 2.699 0.007 1.000 1.000
Run Code Online (Sandbox Code Playgroud)

编辑：

sem_sm2 <- structure(list(`_group` = structure(c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1), format.stata = "%8.0g"), `_type` = structure(c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15), format.stata = "%8.0g"), educ66 = structure(c(10.9, 3.1, 1, 0.54, -0.34, -0.41, -0.25, 0.53, -0.36, -0.41, -0.16, 0.54, -0.35, -0.37, -0.22), label = "Education, 1966", format.stata = "%10.0g"), occstat66 = structure(c(37.49, 21.22, 0.54, 1, -0.25, -0.3, -0.21, 0.82, -0.3, -0.29, -0.19, 0.81, -0.29, -0.28, -0.16 ), label = "Occupational status, 1966", format.stata = "%10.0g"), anomia66 = structure(c(13.7, 3.6, -0.34, -0.25, 1, 0.7, 0.27, -0.26, 0.67, 0.55, 0.27, -0.25, 0.53, 0.42, 0.26), label = "Anomia, 1966", format.stata = "%10.0g"), pwless66 = structure(c(14.85, 3.31, -0.41, -0.3, 0.7, 1, 0.31, -0.31, 0.55, 0.63, 0.26, -0.25, 0.5, 0.51, 0.26), label = "Powerlessness, 1966", format.stata = "%10.0g"), socdist66 = structure(c(1.11, 1.4, -0.25, -0.21, 0.27, 0.31, 1, -0.14, 0.26, 0.28, 0.49, -0.21, 0.27, 0.24, 0.47), label = "Latin American social distance, 1966", format.stata = "%10.0g"), occstat67 = structure(c(36.72, 21, 0.53, 0.82, -0.26, -0.31, -0.14, 1, -0.3, -0.33, -0.11, 0.78, -0.32, -0.26, -0.14), label = "Occupational status, 1967", format.stata = "%10.0g"), anomia67 = structure(c(13.61, 3.44, -0.36, -0.3, 0.67, 0.55, 0.26, -0.3, 1, 0.66, 0.25, -0.28, 0.56, 0.44, 0.25), label = "Anomia, 1967", format.stata = "%10.0g"), pwless67 = structure(c(14.67, 3.06, -0.41, -0.29, 0.55, 0.63, 0.28, -0.33, 0.66, 1, 0.28, -0.26, 0.47, 0.52, 0.27), label = "Powerlessness, 1967", format.stata = "%10.0g"), socdist67 = structure(c(0.89, 1.27, -0.16, -0.19, 0.27, 0.26, 0.49, -0.11, 0.25, 0.28, 1, -0.18, 0.29, 0.23, 0.41), label = "Latin American social distance, 1967", format.stata = "%10.0g"), occstat71 = structure(c(37.47, 20.98, 0.54, 0.81, -0.25, -0.25, -0.21, 0.78, -0.28, -0.26, -0.18, 1, -0.31, -0.28, -0.19), label = "Occupational status, 1971", format.stata = "%10.0g"), anomia71 = structure(c(14.13, 3.54, -0.35, -0.29, 0.53, 0.5, 0.27, -0.32, 0.56, 0.47, 0.29, -0.31, 1, 0.67, 0.29), label = "Anomia, 1971", format.stata = "%10.0g"), pwless71 = structure(c(14.9, 3.16, -0.37, -0.28, 0.42, 0.51, 0.24, -0.26, 0.44, 0.52, 0.23, -0.28, 0.67, 1, 0.28), label = "Powerlessness, 1971", format.stata = "%10.0g"), socdist71 = structure(c(0.78, 1.25, -0.22, -0.16, 0.26, 0.26, 0.47, -0.14, 0.25, 0.27, 0.41, -0.19, 0.29, 0.28, 1), label = "Latin American social distance, 1971", format.stata = "%10.0g")), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -15L), label = "Structural model with measurement component", notes = c("Summary statistics data from Wheaton, B., Muthen B., Alwin, D., & Summers, G., 1977, \"Assessing reliability and stability in panel models\", in D. R. Heise (Ed.), _Sociological Methodology 1977_ (pp. 84-136), San Francisco: Jossey-Bass, Inc.", "Intended use: Create structural model relating Alienation in 1971, Alienation in 1967, and SES in 1966.", "3", "Four indicators each measured in 1966, 1967, and 1981, plus another indicator (educ66) measured only in 1966." ))
Run Code Online (Sandbox Code Playgroud)

Answer 1

小智 3

Assuming you are using haven to read this .dta file then you are using sample data that are not intended for use in analysis, which is clearly stated on this page: https://www.stata-press.com/data/r17/sem.html

Datasets used in the Stata documentation were selected to demonstrate how to use Stata. Some datasets have been altered to explain a particular feature. Do not use these datasets for analysis.

不幸的是，样本数据仅包括本例中显示的 932 个观测值中的 15 个

x <- haven::read_dta("https://www.stata-press.com/data/r17/sem_sm2.dta")
sum(sapply(x, length))
#> [1] 225
sum(nrow(x))
#> [1] 15

Run Code Online (Sandbox Code Playgroud)

归档时间：	3 年，6 月前
查看次数：	316 次
最近记录：	3 年，6 月前