尝试在 R 包 lavaan 中重新创建官方 Stata SEM 示例

Tom*_*Tom 5 r stata

我想用R 包重新创建这个官方 Stata 示例因为我想知道相同的模型是否会在两个包中产生相同的结果lavaan

我使用相同的数据(我使用 Stata 上传数据,十个保存为 a 并将.dta其上传到R),并且从图形上看,模型结构看起来相同。我的问题是估计值相差甚远。

尽管如此,我并不完全确定我在两个回归中是否做了同样的事情(也许是因为两种语法得出的假设不同)。

如果我做了什么不同的事情,有人可以向我指出吗?

Stata语法

webuse sem_sm2, clear
sem (anomia67 pwless67 <- Alien67) (anomia71 pwless71 <- Alien71) (Alien67 <- SES) (Alien71 <- Alien67 SES) (    SES -> educ occstat66), nolog 
Run Code Online (Sandbox Code Playgroud)

R-lavaan 语法

library(haven)
library(lavaan)
library(lavaanPlot)


model <- '
Alien67 =~ anomia67 + pwless67 
Alien71  =~ anomia71 + pwless71 
Alien67 ~ SES 
Alien71 ~ Alien67 + SES     
SES =~ educ66 + occstat66
'
model_fit <- sem(model, data=example)
summary(model_fit , standardized=TRUE, fit.measures=TRUE)

lavaanPlot(model = model_fit, node_options = list(shape = "box", fontname = "Helvetica"), edge_options = list(color = "grey"), coefs = FALSE)
Run Code Online (Sandbox Code Playgroud)

统计输出

Endogenous variables

Measurement:  anomia67 pwless67 anomia71 pwless71 educ66 occstat66
Latent:       Alien67 Alien71

Exogenous variables

Latent:       SES

Structural equation model                       Number of obs     =        932
Estimation method  = ml
Log likelihood     = -15246.469

 ( 1)  [anomia67]Alien67 = 1
 ( 2)  [anomia71]Alien71 = 1
 ( 3)  [educ66]SES = 1
---------------------------------------------------------------------------------
                |                 OIM
                |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
----------------+----------------------------------------------------------------
Structural      |
  Alien67       |
            SES |  -.6140404   .0562407   -10.92   0.000    -.7242701   -.5038107
  --------------+----------------------------------------------------------------
  Alien71       |
        Alien67 |   .7046342   .0533512    13.21   0.000     .6000678    .8092007
            SES |  -.1744153   .0542489    -3.22   0.001    -.2807413   -.0680894
----------------+----------------------------------------------------------------
Measurement     |
  anomia67      |
        Alien67 |          1  (constrained)
          _cons |      13.61   .1126205   120.85   0.000     13.38927    13.83073
  --------------+----------------------------------------------------------------
  pwless67      |
        Alien67 |   .8884887   .0431565    20.59   0.000     .8039034    .9730739
          _cons |      14.67   .1001798   146.44   0.000     14.47365    14.86635
  --------------+----------------------------------------------------------------
  anomia71      |
        Alien71 |          1  (constrained)
          _cons |      14.13   .1158943   121.92   0.000     13.90285    14.35715
  --------------+----------------------------------------------------------------
  pwless71      |
        Alien71 |   .8486022   .0415205    20.44   0.000     .7672235    .9299808
          _cons |       14.9   .1034537   144.03   0.000     14.69723    15.10277
  --------------+----------------------------------------------------------------
  educ66        |
            SES |          1  (constrained)
          _cons |       10.9   .1014894   107.40   0.000     10.70108    11.09892
  --------------+----------------------------------------------------------------
  occstat66     |
            SES |   5.331259   .4307503    12.38   0.000     4.487004    6.175514
          _cons |      37.49   .6947112    53.96   0.000     36.12839    38.85161
----------------+----------------------------------------------------------------
 var(e.anomia67)|   4.009921   .3582978                      3.365724    4.777416
 var(e.pwless67)|   3.187468    .283374                      2.677762    3.794197
 var(e.anomia71)|   3.695593   .3911512                      3.003245     4.54755
 var(e.pwless71)|   3.621531   .3037908                      3.072483    4.268693
   var(e.educ66)|   2.943819   .5002527                      2.109908    4.107319
var(e.occstat66)|     260.63   18.24572                      227.2139    298.9605
  var(e.Alien67)|   5.301416    .483144                      4.434225    6.338201
  var(e.Alien71)|   3.737286   .3881546                      3.048951    4.581019
        var(SES)|    6.65587   .6409484                      5.511067    8.038482
---------------------------------------------------------------------------------
LR test of model vs. saturated: chi2(6)   =     71.62, Prob > chi2 = 0.0000
Run Code Online (Sandbox Code Playgroud)

R-拉万

lavaan 0.6-11 ended normally after 149 iterations

  Estimator                                         ML
  Optimization method                           NLMINB
  Number of model parameters                        15
                                                      
  Number of observations                            15
                                                      
Model Test User Model:
                                                      
  Test statistic                                26.531
  Degrees of freedom                                 6
  P-value (Chi-square)                           0.000

Model Test Baseline Model:

  Test statistic                               370.547
  Degrees of freedom                                15
  P-value                                        0.000

User Model versus Baseline Model:

  Comparative Fit Index (CFI)                    0.942
  Tucker-Lewis Index (TLI)                       0.856

Loglikelihood and Information Criteria:

  Loglikelihood user model (H0)                -82.504
  Loglikelihood unrestricted model (H1)        -69.238
                                                      
  Akaike (AIC)                                 195.008
  Bayesian (BIC)                               205.629
  Sample-size adjusted Bayesian (BIC)          159.836

Root Mean Square Error of Approximation:

  RMSEA                                          0.478
  90 Percent confidence interval - lower         0.301
  90 Percent confidence interval - upper         0.670
  P-value RMSEA <= 0.05                          0.000

Standardized Root Mean Square Residual:

  SRMR                                           0.002

Parameter Estimates:

  Standard errors                             Standard
  Information                                 Expected
  Information saturated (h1) model          Structured

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  Alien67 =~                                                            
    anomia67          1.000                               3.388    0.999
    pwless67          1.072    0.016   66.835    0.000    3.632    0.999
  Alien71 =~                                                            
    anomia71          1.000                               3.520    0.999
    pwless71          1.049    0.014   74.224    0.000    3.694    0.999
  SES =~                                                                
    educ66            1.000                               2.794    0.993
    occstat66         3.608    0.267   13.528    0.000   10.081    0.969

Regressions:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  Alien67 ~                                                             
    SES               1.176    0.087   13.498    0.000    0.970    0.970
  Alien71 ~                                                             
    Alien67           0.996    0.060   16.505    0.000    0.959    0.959
    SES               0.053    0.073    0.727    0.467    0.042    0.042

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
   .anomia67          0.018    0.011    1.642    0.101    0.018    0.002
   .pwless67          0.024    0.013    1.810    0.070    0.024    0.002
   .anomia71          0.013    0.009    1.359    0.174    0.013    0.001
   .pwless71          0.023    0.012    1.875    0.061    0.023    0.002
   .educ66            0.104    0.153    0.680    0.497    0.104    0.013
   .occstat66         6.514    3.063    2.127    0.033    6.514    0.060
   .Alien67           0.678    0.327    2.074    0.038    0.059    0.059
   .Alien71           0.012    0.012    0.998    0.318    0.001    0.001
    SES               7.808    2.892    2.699    0.007    1.000    1.000
Run Code Online (Sandbox Code Playgroud)

在此输入图像描述

编辑:

sem_sm2 <- structure(list(`_group` = structure(c(1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 1, 1, 1), format.stata = "%8.0g"), `_type` = structure(c(1, 
2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15), format.stata = "%8.0g"), 
    educ66 = structure(c(10.9, 3.1, 1, 0.54, -0.34, -0.41, -0.25, 
    0.53, -0.36, -0.41, -0.16, 0.54, -0.35, -0.37, -0.22), label = "Education, 1966", format.stata = "%10.0g"), 
    occstat66 = structure(c(37.49, 21.22, 0.54, 1, -0.25, -0.3, 
    -0.21, 0.82, -0.3, -0.29, -0.19, 0.81, -0.29, -0.28, -0.16
    ), label = "Occupational status, 1966", format.stata = "%10.0g"), 
    anomia66 = structure(c(13.7, 3.6, -0.34, -0.25, 1, 0.7, 0.27, 
    -0.26, 0.67, 0.55, 0.27, -0.25, 0.53, 0.42, 0.26), label = "Anomia, 1966", format.stata = "%10.0g"), 
    pwless66 = structure(c(14.85, 3.31, -0.41, -0.3, 0.7, 1, 
    0.31, -0.31, 0.55, 0.63, 0.26, -0.25, 0.5, 0.51, 0.26), label = "Powerlessness, 1966", format.stata = "%10.0g"), 
    socdist66 = structure(c(1.11, 1.4, -0.25, -0.21, 0.27, 0.31, 
    1, -0.14, 0.26, 0.28, 0.49, -0.21, 0.27, 0.24, 0.47), label = "Latin American social distance, 1966", format.stata = "%10.0g"), 
    occstat67 = structure(c(36.72, 21, 0.53, 0.82, -0.26, -0.31, 
    -0.14, 1, -0.3, -0.33, -0.11, 0.78, -0.32, -0.26, -0.14), label = "Occupational status, 1967", format.stata = "%10.0g"), 
    anomia67 = structure(c(13.61, 3.44, -0.36, -0.3, 0.67, 0.55, 
    0.26, -0.3, 1, 0.66, 0.25, -0.28, 0.56, 0.44, 0.25), label = "Anomia, 1967", format.stata = "%10.0g"), 
    pwless67 = structure(c(14.67, 3.06, -0.41, -0.29, 0.55, 0.63, 
    0.28, -0.33, 0.66, 1, 0.28, -0.26, 0.47, 0.52, 0.27), label = "Powerlessness, 1967", format.stata = "%10.0g"), 
    socdist67 = structure(c(0.89, 1.27, -0.16, -0.19, 0.27, 0.26, 
    0.49, -0.11, 0.25, 0.28, 1, -0.18, 0.29, 0.23, 0.41), label = "Latin American social distance, 1967", format.stata = "%10.0g"), 
    occstat71 = structure(c(37.47, 20.98, 0.54, 0.81, -0.25, 
    -0.25, -0.21, 0.78, -0.28, -0.26, -0.18, 1, -0.31, -0.28, 
    -0.19), label = "Occupational status, 1971", format.stata = "%10.0g"), 
    anomia71 = structure(c(14.13, 3.54, -0.35, -0.29, 0.53, 0.5, 
    0.27, -0.32, 0.56, 0.47, 0.29, -0.31, 1, 0.67, 0.29), label = "Anomia, 1971", format.stata = "%10.0g"), 
    pwless71 = structure(c(14.9, 3.16, -0.37, -0.28, 0.42, 0.51, 
    0.24, -0.26, 0.44, 0.52, 0.23, -0.28, 0.67, 1, 0.28), label = "Powerlessness, 1971", format.stata = "%10.0g"), 
    socdist71 = structure(c(0.78, 1.25, -0.22, -0.16, 0.26, 0.26, 
    0.47, -0.14, 0.25, 0.27, 0.41, -0.19, 0.29, 0.28, 1), label = "Latin American social distance, 1971", format.stata = "%10.0g")), class = c("tbl_df", 
"tbl", "data.frame"), row.names = c(NA, -15L), label = "Structural model with measurement component", notes = c("Summary statistics data from Wheaton, B., Muthen B., Alwin, D.,           & Summers, G., 1977, \"Assessing reliability and stability in           panel models\", in D. R. Heise (Ed.), _Sociological Methodology           1977_ (pp. 84-136), San Francisco: Jossey-Bass, Inc.", 
"Intended use: Create structural model relating Alienation in 1971, Alienation in 1967, and SES in 1966.", 
"3", "Four indicators each measured in 1966, 1967, and 1981, plus another indicator (educ66) measured only in 1966."
))
Run Code Online (Sandbox Code Playgroud)

小智 3

Assuming you are using haven to read this .dta file then you are using sample data that are not intended for use in analysis, which is clearly stated on this page: https://www.stata-press.com/data/r17/sem.html

Datasets used in the Stata documentation were selected to demonstrate how to use Stata. Some datasets have been altered to explain a particular feature. Do not use these datasets for analysis.

在此输入图像描述

不幸的是,样本数据仅包括例中显示的 932 个观测值中的 15 个

x <- haven::read_dta("https://www.stata-press.com/data/r17/sem_sm2.dta")
sum(sapply(x, length))
#> [1] 225
sum(nrow(x))
#> [1] 15
Run Code Online (Sandbox Code Playgroud)