我想用R 包重新创建这个官方 Stata 示例,因为我想知道相同的模型是否会在两个包中产生相同的结果。lavaan
我使用相同的数据(我使用 Stata 上传数据,十个保存为 a 并将.dta其上传到R),并且从图形上看,模型结构看起来相同。我的问题是估计值相差甚远。
尽管如此,我并不完全确定我在两个回归中是否做了同样的事情(也许是因为两种语法得出的假设不同)。
如果我做了什么不同的事情,有人可以向我指出吗?
webuse sem_sm2, clear
sem (anomia67 pwless67 <- Alien67) (anomia71 pwless71 <- Alien71) (Alien67 <- SES) (Alien71 <- Alien67 SES) ( SES -> educ occstat66), nolog
Run Code Online (Sandbox Code Playgroud)
library(haven)
library(lavaan)
library(lavaanPlot)
model <- '
Alien67 =~ anomia67 + pwless67
Alien71 =~ anomia71 + pwless71
Alien67 ~ SES
Alien71 ~ Alien67 + SES
SES =~ educ66 + occstat66
'
model_fit <- sem(model, data=example)
summary(model_fit , standardized=TRUE, fit.measures=TRUE)
lavaanPlot(model = model_fit, node_options = list(shape = "box", fontname = "Helvetica"), edge_options = list(color = "grey"), coefs = FALSE)
Run Code Online (Sandbox Code Playgroud)
Endogenous variables
Measurement: anomia67 pwless67 anomia71 pwless71 educ66 occstat66
Latent: Alien67 Alien71
Exogenous variables
Latent: SES
Structural equation model Number of obs = 932
Estimation method = ml
Log likelihood = -15246.469
( 1) [anomia67]Alien67 = 1
( 2) [anomia71]Alien71 = 1
( 3) [educ66]SES = 1
---------------------------------------------------------------------------------
| OIM
| Coef. Std. Err. z P>|z| [95% Conf. Interval]
----------------+----------------------------------------------------------------
Structural |
Alien67 |
SES | -.6140404 .0562407 -10.92 0.000 -.7242701 -.5038107
--------------+----------------------------------------------------------------
Alien71 |
Alien67 | .7046342 .0533512 13.21 0.000 .6000678 .8092007
SES | -.1744153 .0542489 -3.22 0.001 -.2807413 -.0680894
----------------+----------------------------------------------------------------
Measurement |
anomia67 |
Alien67 | 1 (constrained)
_cons | 13.61 .1126205 120.85 0.000 13.38927 13.83073
--------------+----------------------------------------------------------------
pwless67 |
Alien67 | .8884887 .0431565 20.59 0.000 .8039034 .9730739
_cons | 14.67 .1001798 146.44 0.000 14.47365 14.86635
--------------+----------------------------------------------------------------
anomia71 |
Alien71 | 1 (constrained)
_cons | 14.13 .1158943 121.92 0.000 13.90285 14.35715
--------------+----------------------------------------------------------------
pwless71 |
Alien71 | .8486022 .0415205 20.44 0.000 .7672235 .9299808
_cons | 14.9 .1034537 144.03 0.000 14.69723 15.10277
--------------+----------------------------------------------------------------
educ66 |
SES | 1 (constrained)
_cons | 10.9 .1014894 107.40 0.000 10.70108 11.09892
--------------+----------------------------------------------------------------
occstat66 |
SES | 5.331259 .4307503 12.38 0.000 4.487004 6.175514
_cons | 37.49 .6947112 53.96 0.000 36.12839 38.85161
----------------+----------------------------------------------------------------
var(e.anomia67)| 4.009921 .3582978 3.365724 4.777416
var(e.pwless67)| 3.187468 .283374 2.677762 3.794197
var(e.anomia71)| 3.695593 .3911512 3.003245 4.54755
var(e.pwless71)| 3.621531 .3037908 3.072483 4.268693
var(e.educ66)| 2.943819 .5002527 2.109908 4.107319
var(e.occstat66)| 260.63 18.24572 227.2139 298.9605
var(e.Alien67)| 5.301416 .483144 4.434225 6.338201
var(e.Alien71)| 3.737286 .3881546 3.048951 4.581019
var(SES)| 6.65587 .6409484 5.511067 8.038482
---------------------------------------------------------------------------------
LR test of model vs. saturated: chi2(6) = 71.62, Prob > chi2 = 0.0000
Run Code Online (Sandbox Code Playgroud)
lavaan 0.6-11 ended normally after 149 iterations
Estimator ML
Optimization method NLMINB
Number of model parameters 15
Number of observations 15
Model Test User Model:
Test statistic 26.531
Degrees of freedom 6
P-value (Chi-square) 0.000
Model Test Baseline Model:
Test statistic 370.547
Degrees of freedom 15
P-value 0.000
User Model versus Baseline Model:
Comparative Fit Index (CFI) 0.942
Tucker-Lewis Index (TLI) 0.856
Loglikelihood and Information Criteria:
Loglikelihood user model (H0) -82.504
Loglikelihood unrestricted model (H1) -69.238
Akaike (AIC) 195.008
Bayesian (BIC) 205.629
Sample-size adjusted Bayesian (BIC) 159.836
Root Mean Square Error of Approximation:
RMSEA 0.478
90 Percent confidence interval - lower 0.301
90 Percent confidence interval - upper 0.670
P-value RMSEA <= 0.05 0.000
Standardized Root Mean Square Residual:
SRMR 0.002
Parameter Estimates:
Standard errors Standard
Information Expected
Information saturated (h1) model Structured
Latent Variables:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
Alien67 =~
anomia67 1.000 3.388 0.999
pwless67 1.072 0.016 66.835 0.000 3.632 0.999
Alien71 =~
anomia71 1.000 3.520 0.999
pwless71 1.049 0.014 74.224 0.000 3.694 0.999
SES =~
educ66 1.000 2.794 0.993
occstat66 3.608 0.267 13.528 0.000 10.081 0.969
Regressions:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
Alien67 ~
SES 1.176 0.087 13.498 0.000 0.970 0.970
Alien71 ~
Alien67 0.996 0.060 16.505 0.000 0.959 0.959
SES 0.053 0.073 0.727 0.467 0.042 0.042
Variances:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
.anomia67 0.018 0.011 1.642 0.101 0.018 0.002
.pwless67 0.024 0.013 1.810 0.070 0.024 0.002
.anomia71 0.013 0.009 1.359 0.174 0.013 0.001
.pwless71 0.023 0.012 1.875 0.061 0.023 0.002
.educ66 0.104 0.153 0.680 0.497 0.104 0.013
.occstat66 6.514 3.063 2.127 0.033 6.514 0.060
.Alien67 0.678 0.327 2.074 0.038 0.059 0.059
.Alien71 0.012 0.012 0.998 0.318 0.001 0.001
SES 7.808 2.892 2.699 0.007 1.000 1.000
Run Code Online (Sandbox Code Playgroud)
sem_sm2 <- structure(list(`_group` = structure(c(1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1), format.stata = "%8.0g"), `_type` = structure(c(1,
2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15), format.stata = "%8.0g"),
educ66 = structure(c(10.9, 3.1, 1, 0.54, -0.34, -0.41, -0.25,
0.53, -0.36, -0.41, -0.16, 0.54, -0.35, -0.37, -0.22), label = "Education, 1966", format.stata = "%10.0g"),
occstat66 = structure(c(37.49, 21.22, 0.54, 1, -0.25, -0.3,
-0.21, 0.82, -0.3, -0.29, -0.19, 0.81, -0.29, -0.28, -0.16
), label = "Occupational status, 1966", format.stata = "%10.0g"),
anomia66 = structure(c(13.7, 3.6, -0.34, -0.25, 1, 0.7, 0.27,
-0.26, 0.67, 0.55, 0.27, -0.25, 0.53, 0.42, 0.26), label = "Anomia, 1966", format.stata = "%10.0g"),
pwless66 = structure(c(14.85, 3.31, -0.41, -0.3, 0.7, 1,
0.31, -0.31, 0.55, 0.63, 0.26, -0.25, 0.5, 0.51, 0.26), label = "Powerlessness, 1966", format.stata = "%10.0g"),
socdist66 = structure(c(1.11, 1.4, -0.25, -0.21, 0.27, 0.31,
1, -0.14, 0.26, 0.28, 0.49, -0.21, 0.27, 0.24, 0.47), label = "Latin American social distance, 1966", format.stata = "%10.0g"),
occstat67 = structure(c(36.72, 21, 0.53, 0.82, -0.26, -0.31,
-0.14, 1, -0.3, -0.33, -0.11, 0.78, -0.32, -0.26, -0.14), label = "Occupational status, 1967", format.stata = "%10.0g"),
anomia67 = structure(c(13.61, 3.44, -0.36, -0.3, 0.67, 0.55,
0.26, -0.3, 1, 0.66, 0.25, -0.28, 0.56, 0.44, 0.25), label = "Anomia, 1967", format.stata = "%10.0g"),
pwless67 = structure(c(14.67, 3.06, -0.41, -0.29, 0.55, 0.63,
0.28, -0.33, 0.66, 1, 0.28, -0.26, 0.47, 0.52, 0.27), label = "Powerlessness, 1967", format.stata = "%10.0g"),
socdist67 = structure(c(0.89, 1.27, -0.16, -0.19, 0.27, 0.26,
0.49, -0.11, 0.25, 0.28, 1, -0.18, 0.29, 0.23, 0.41), label = "Latin American social distance, 1967", format.stata = "%10.0g"),
occstat71 = structure(c(37.47, 20.98, 0.54, 0.81, -0.25,
-0.25, -0.21, 0.78, -0.28, -0.26, -0.18, 1, -0.31, -0.28,
-0.19), label = "Occupational status, 1971", format.stata = "%10.0g"),
anomia71 = structure(c(14.13, 3.54, -0.35, -0.29, 0.53, 0.5,
0.27, -0.32, 0.56, 0.47, 0.29, -0.31, 1, 0.67, 0.29), label = "Anomia, 1971", format.stata = "%10.0g"),
pwless71 = structure(c(14.9, 3.16, -0.37, -0.28, 0.42, 0.51,
0.24, -0.26, 0.44, 0.52, 0.23, -0.28, 0.67, 1, 0.28), label = "Powerlessness, 1971", format.stata = "%10.0g"),
socdist71 = structure(c(0.78, 1.25, -0.22, -0.16, 0.26, 0.26,
0.47, -0.14, 0.25, 0.27, 0.41, -0.19, 0.29, 0.28, 1), label = "Latin American social distance, 1971", format.stata = "%10.0g")), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -15L), label = "Structural model with measurement component", notes = c("Summary statistics data from Wheaton, B., Muthen B., Alwin, D., & Summers, G., 1977, \"Assessing reliability and stability in panel models\", in D. R. Heise (Ed.), _Sociological Methodology 1977_ (pp. 84-136), San Francisco: Jossey-Bass, Inc.",
"Intended use: Create structural model relating Alienation in 1971, Alienation in 1967, and SES in 1966.",
"3", "Four indicators each measured in 1966, 1967, and 1981, plus another indicator (educ66) measured only in 1966."
))
Run Code Online (Sandbox Code Playgroud)
小智 3
Assuming you are using haven to read this .dta file then you are using sample data that are not intended for use in analysis, which is clearly stated on this page: https://www.stata-press.com/data/r17/sem.html
Datasets used in the Stata documentation were selected to demonstrate how to use Stata. Some datasets have been altered to explain a particular feature. Do not use these datasets for analysis.
不幸的是,样本数据仅包括本例中显示的 932 个观测值中的 15 个
x <- haven::read_dta("https://www.stata-press.com/data/r17/sem_sm2.dta")
sum(sapply(x, length))
#> [1] 225
sum(nrow(x))
#> [1] 15
Run Code Online (Sandbox Code Playgroud)