Ste*_*rdi 6 r hierarchical-data multi-level stata
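When working with hierarchical/multilevel/panel datasets, it can be very useful to have a package that returns the within-group and between-group standard deviations of the available variables.
In Stata, this is easily done with the following command: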
xtsum, i(momid)
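I have searched, but I could not find any R package that does this.
Edit:
To clarify the question, an example of a hierarchical dataset could look like this: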
son_id  mom_id  hispanic  mom_smoke  son_birthweigth
1       1       1         1          3950
2       1       1         0          3890
3       1       1         0          3990
1       2       0         1          4200
2       2       0         1          4120
1       3       0         0          2975
2       3       0         1          2980
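The "multilevel" structure comes from the fact that each mother (the higher level) has two or more sons (the lower level). Each mother therefore defines a group of observations.
Accordingly, each variable in the dataset can vary both between and within mothers, or only between mothers. son_birthweigth varies between mothers, but it also varies within the same mother. hispanic, in contrast, is fixed for a given mother.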
For example, the within-mother variance of son_birthweigth is:
# per-mother means of son_birthweigth
bwt_mean1 <- (3950+3890+3990)/3
bwt_mean2 <- (4200+4120)/2
bwt_mean3 <- (2975+2980)/2
# within-mother variance of son_birthweigth:
# squared deviations from each son's own mother mean, divided by N - 1 = 7 - 1
((3950-bwt_mean1)^2 + (3890-bwt_mean1)^2 + (3990-bwt_mean1)^2 +
(4200-bwt_mean2)^2 + (4120-bwt_mean2)^2 +
(2975-bwt_mean3)^2 + (2980-bwt_mean3)^2)/(7-1)
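And the between-mother variance is:
# per-mother means of son_birthweigth
bwt_mean1 <- (3950+3890+3990)/3
bwt_mean2 <- (4200+4120)/2
bwt_mean3 <- (2975+2980)/2
# overall mean of son_birthweigth
# (named grand_mean so that it does not mask the base function mean()):
# grand_mean <- sum(data$son_birthweigth)/length(data$son_birthweigth)
grand_mean <- (3950+3890+3990+4200+4120+2975+2980)/7
# between-mother variance: squared deviations of the mother means
# from the overall mean, divided by (number of mothers - 1) = 3 - 1
((bwt_mean1-grand_mean)^2 + (bwt_mean2-grand_mean)^2 + (bwt_mean3-grand_mean)^2)/(3-1)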
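The two hand calculations above generalize to any variable and any grouping column. What follows is a minimal base-R sketch, assuming the example table is stored in a data frame called dat. The helper name within_between_sd is hypothetical and not taken from any package. It follows the formulas written out above and is not claimed to reproduce the output of xtsum exactly.

```r
# Example data copied from the table in the question
dat <- data.frame(
  son_id          = c(1, 2, 3, 1, 2, 1, 2),
  mom_id          = c(1, 1, 1, 2, 2, 3, 3),
  hispanic        = c(1, 1, 1, 0, 0, 0, 0),
  mom_smoke       = c(1, 0, 0, 1, 1, 0, 1),
  son_birthweigth = c(3950, 3890, 3990, 4200, 4120, 2975, 2980)
)

# Hypothetical helper: overall, between-group and within-group SD of x,
# where g gives the group (here: the mother) of each observation
within_between_sd <- function(x, g) {
  own_group_mean <- ave(x, g)            # group mean repeated for each observation
  group_means    <- tapply(x, g, mean)   # one mean per group
  grand_mean     <- mean(x)
  within_var  <- sum((x - own_group_mean)^2) / (length(x) - 1)
  between_var <- sum((group_means - grand_mean)^2) / (length(group_means) - 1)
  c(overall = sd(x), between = sqrt(between_var), within = sqrt(within_var))
}

# Apply the helper to every measured variable, grouping by mother
measures <- c("hispanic", "mom_smoke", "son_birthweigth")
res <- sapply(dat[measures], within_between_sd, g = dat$mom_id)
round(t(res), 3)
```

A variable that is constant within every mother, such as hispanic, gets a within-group SD of zero, which is a quick sanity check on the grouping.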
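I don't know what your Stata command is supposed to reproduce, but to answer the second part of the question, about hierarchical structures: use a list. For example, you can define a structure like this: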
tree = list(
"var1" = list(
"panel" = list(type ='p',mean = 1,sd=0)
,"cluster" = list(type = 'c',value = c(5,8,10)))
,"var2" = list(
"panel" = list(type ='p',mean = 2,sd=0.5)
,"cluster" = list(type="c",value =c(1,2)))
)
To create it programmatically, lapply is a convenient way to build the list:
tree <- lapply(list('var1','var2'), function(x) {
  ## element names are given as bare symbols (panel =) rather than quoted strings
  list(panel = list(type = 'p', mean = rnorm(1), sd = 0),
       cluster = list(type = 'c', value = rnorm(3)))
})
names(tree) <- c('var1','var2')
You can inspect the structure with str:
str(tree)
List of 2
$ var1:List of 2
..$ panel :List of 3
.. ..$ type: chr "p"
.. ..$ mean: num 0.284
.. ..$ sd : num 0
..$ cluster:List of 2
.. ..$ type : chr "c"
.. ..$ value: num [1:3] 0.0722 -0.9413 0.6649
$ var2:List of 2
..$ panel :List of 3
.. ..$ type: chr "p"
.. ..$ mean: num -0.144
.. ..$ sd : num 0
..$ cluster:List of 2
.. ..$ type : chr "c"
.. ..$ value: num [1:3] -0.595 -1.795 -0.439
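Values can be read back out of such a nested list with chained $ access, or collected across variables with sapply. A small sketch using the hand-written tree from this answer (the names panel_means and cluster_sizes are only illustrative):

```r
# The nested list defined by hand earlier in this answer
tree <- list(
  var1 = list(panel   = list(type = "p", mean = 1, sd = 0),
              cluster = list(type = "c", value = c(5, 8, 10))),
  var2 = list(panel   = list(type = "p", mean = 2, sd = 0.5),
              cluster = list(type = "c", value = c(1, 2)))
)

# Drill down to one leaf
tree$var1$cluster$value

# Collect the same field from every top-level element
panel_means   <- sapply(tree, function(v) v$panel$mean)
cluster_sizes <- sapply(tree, function(v) length(v$cluster$value))
```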
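I think the reshape2 package is what you want. Here is a demonstration.
To do a multilevel analysis, we first need to reshape the data.
First, split the variables into two groups: identifier variables and measured variables.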
library(reshape2)
dat.m <- melt(dat,id.vars=c('son_id','mom_id')) ## other columns are measured
str(dat.m)
'data.frame': 21 obs. of 4 variables:
$ son_id : Factor w/ 3 levels "1","2","3": 1 2 3 1 2 1 2 1 2 3 ...
$ mom_id : Factor w/ 3 levels "1","2","3": 1 1 1 2 2 3 3 1 1 1 ...
$ variable: Factor w/ 3 levels "hispanic","mom_smoke",..: 1 1 1 1 1 1 1 2 2 2 ...
$ value : num 1 1 1 0 0 0 0 1 0 0 ...
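The melt call above assumes a data frame named dat that the answer does not show. A minimal sketch of how it could be built from the table in the question, assuming the two id columns are stored as factors (as the str output suggests). The requireNamespace guard is only there so the snippet also runs where reshape2 is not installed.

```r
# Example data from the question; ids stored as factors (an assumption
# based on the str output above)
dat <- data.frame(
  son_id          = factor(c(1, 2, 3, 1, 2, 1, 2)),
  mom_id          = factor(c(1, 1, 1, 2, 2, 3, 3)),
  hispanic        = c(1, 1, 1, 0, 0, 0, 0),
  mom_smoke       = c(1, 0, 0, 1, 1, 0, 1),
  son_birthweigth = c(3950, 3890, 3990, 4200, 4120, 2975, 2980)
)

# Melt only if reshape2 is available: 7 rows x 3 measured variables
# gives 21 rows in long form
if (requireNamespace("reshape2", quietly = TRUE)) {
  dat.m <- reshape2::melt(dat, id.vars = c("son_id", "mom_id"))
  str(dat.m)
}
```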
Once you have the data in "molten" form, you can "cast" it into whatever shape you want:
# per-mother means of every variable
acast(dat.m,variable~mom_id,mean)
1 2 3
hispanic 1.0000000 0 0.0
mom_smoke 0.3333333 1 0.5
son_birthweigth 3943.3333333 4160 2977.5
# Within-mother sum of squared deviations for every variable
# (not yet divided by N - 1, so these are not variances)
acast(dat.m,variable~mom_id,function(x) sum((x-mean(x))^2))
1 2 3
hispanic 0.0000000 0 0.0
mom_smoke 0.6666667 0 0.5
son_birthweigth 5066.6666667 3200 12.5
## overall mean of each variable
acast(dat.m,variable~.,mean)
[,1]
hispanic 0.4285714
mom_smoke 0.5714286
son_birthweigth 3729.2857143