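I am using Thomas Lumley's survey package to create crosstabs and standard errors (SEs). I am struggling to specify the denominator of the crosstab.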
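library(survey)
library(readr)    # read_table2(); deprecated in readr >= 2.0 in favor of read_table()
library(dplyr)    # %>%, mutate(), filter(), group_by()
library(stringr)  # str_replace()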
data <- read_table2("Q50_1 Q50_2 Q38 Q90 pov gender wgt id
yes 3 Yes NA High M 1.3 A
NA 4 No 2 Med F 0.4 B
no 2 NA 4 Low F 1.2 C
maybe 3 No 2 High M 0.5 D
yes NA No NA High M 0.7 E
no 2 Yes 3 Low F 0.56 F
maybe 4 Yes 2 Med F 0.9 G")
design <- svydesign(id =~id,
weights = ~wgt,
nest = FALSE,
data = data)
svymean(~interaction(Q50_1,gender=="F"), design, na.rm = T)
This gives me:
mean SE
interaction(Q50_1, gender == "F")maybe.FALSE 0.096899 0.1043
interaction(Q50_1, gender == "F")no.FALSE 0.000000 0.0000
interaction(Q50_1, gender == "F")yes.FALSE 0.387597 0.2331
interaction(Q50_1, gender == "F")maybe.TRUE 0.174419 0.1725
interaction(Q50_1, gender == "F")no.TRUE 0.341085 0.2233
interaction(Q50_1, gender == "F")yes.TRUE 0.000000 0.0000
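This is not very useful to me, because the denominator includes both the TRUE and the FALSE cells of every combination, whereas I am only interested in the means among the TRUE cases. I can easily find the percentages within TRUE like this: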
dat <- as.data.frame(svymean(~interaction(Q50_1,gender=="F"), design, na.rm = T)) %>% tibble::rownames_to_column("question")
dat %>% tidyr::separate(question,c("question",'response'), sep = "\\)", extra = "merge") %>%
mutate(question = str_replace(question,"interaction\\("," ")) %>%
tidyr::separate(response,c('value', 'bool'), sep ="\\." ) %>%
tidyr::separate(question,c('question', 'group'), sep ="\\," ) %>%
tidyr::separate(group,c('group_level', 'group'), sep ="\\==" ) %>%
filter(bool=='TRUE') %>%
group_by(question, group_level, group) %>%
mutate(sum_true = sum(mean)) %>%
mutate(mean= mean/sum_true)
This gives me:
question group_level group value bool mean SE sum_true
<chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
" Q50_1" " gender " " \"F\"" maybe TRUE 0.338 0.173 0.516
" Q50_1" " gender " " \"F\"" no TRUE 0.662 0.223 0.516
" Q50_1" " gender " " \"F\"" yes TRUE 0 0 0.516
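These means are exactly what I want, but the SEs belong to a different denominator and do not describe the rescaled means. Is there a way to call svymean so that it reports the means and SEs with only the TRUE values in the denominator?
I thought something like this might do it (but it doesn't work):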
svymean(~interaction(Q50_1,gender=="F"[TRUE]), design, na.rm = T)
mean SE
interaction(Q50_1, gender == "F"[TRUE])maybe.TRUE 0.338 0.0725
interaction(Q50_1, gender == "F"[TRUE])no.TRUE 0.662 0.0233
interaction(Q50_1, gender == "F"[TRUE])yes.TRUE 0.0 0.0000
To get the percentage of women who gave each response, you want:
svymean(~Q50_1, subset(design, gender== "F"),na.rm=TRUE)
or, equivalently (because this is how svyby does it):
svyby(~Q50_1, ~gender, design, svymean, na.rm = TRUE)
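As a rough check of what the subset call estimates, the sketch below compares its point estimates with weighted proportions computed by hand over the same rows. It assumes the toy data from the question, read with base `read.table` rather than readr so the block is self-contained; the names `women` and `by_hand` are only illustrative.

```r
library(survey)

# Same toy data as in the question, read with base R instead of readr.
data <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
Q50_1 Q50_2 Q38 Q90 pov gender wgt id
yes 3 Yes NA High M 1.3 A
NA 4 No 2 Med F 0.4 B
no 2 NA 4 Low F 1.2 C
maybe 3 No 2 High M 0.5 D
yes NA No NA High M 0.7 E
no 2 Yes 3 Low F 0.56 F
maybe 4 Yes 2 Med F 0.9 G")

design <- svydesign(id = ~id, weights = ~wgt, data = data)

# Domain estimate: the denominator is women with a non-missing Q50_1.
est <- svymean(~Q50_1, subset(design, gender == "F"), na.rm = TRUE)

# Hand-computed weighted proportions over the same rows.
women <- data[data$gender == "F" & !is.na(data$Q50_1), ]
by_hand <- tapply(women$wgt, women$Q50_1, sum) / sum(women$wgt)

print(est)      # means and SEs from the survey package
print(by_hand)  # point estimates only, for comparison
```

The hand computation only reproduces the means; the SEs still have to come from the design object, which is why the subset is taken on `design` and not on the raw data frame.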
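If you also want the empty categories, you need to convert the Q50_1 variable to a factor. That is the point of factors (as opposed to strings): they know which levels they have.
If you want to be able to extract parts of the output programmatically, use the coef and SE functions.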
data$Q50_1<-factor(data$Q50_1)
design <- svydesign(id =~id,
weights = ~wgt,
nest = FALSE,
data = data)
svymean(~Q50_1, subset(design, gender== "F"),na.rm=TRUE)
svyby(~Q50_1, ~gender, design, svymean, na.rm = TRUE)[1,]
coef(svyby(~Q50_1, ~gender, design, svymean, na.rm = TRUE))
SE(svyby(~Q50_1, ~gender, design, svymean, na.rm = TRUE))
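One way to use `coef` and `SE` together is to collect them into a plain data frame, which avoids parsing row names as the question does. This is a minimal sketch under the same toy data (read here with base `read.table`); the data frame `res` and its column names are illustrative choices, not part of the survey package.

```r
library(survey)

data <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
Q50_1 Q50_2 Q38 Q90 pov gender wgt id
yes 3 Yes NA High M 1.3 A
NA 4 No 2 Med F 0.4 B
no 2 NA 4 Low F 1.2 C
maybe 3 No 2 High M 0.5 D
yes NA No NA High M 0.7 E
no 2 Yes 3 Low F 0.56 F
maybe 4 Yes 2 Med F 0.9 G")

# A factor keeps the "yes" level even though no woman chose it.
data$Q50_1 <- factor(data$Q50_1)
design <- svydesign(id = ~id, weights = ~wgt, data = data)

est <- svymean(~Q50_1, subset(design, gender == "F"), na.rm = TRUE)

# One row per response level, with the variable-name prefix stripped.
res <- data.frame(
  response = sub("^Q50_1", "", names(coef(est))),
  mean     = as.numeric(coef(est)),
  SE       = as.numeric(SE(est))
)
print(res)
```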
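These do not match what you got using ~interaction, because what you get that way is not what you say you want. The interaction analysis gives the percentage of people who are both women and answered yes, not the percentage of women who answered yes. In other words, the 6 percentages from the interaction analysis add up to 100%, not 200%.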
> sum(coef(svymean(~interaction(Q50_1,gender=="F"), design, na.rm = T)))
[1] 1
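The relationship between the two analyses can be sketched as follows: dividing the TRUE cells of the joint table by their sum should give the same point estimates as the subset call, although the SEs of the joint cells do not carry over. The sketch assumes the same toy data, read with base `read.table`; `joint`, `direct`, and `rescaled` are illustrative names.

```r
library(survey)

data <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
Q50_1 Q50_2 Q38 Q90 pov gender wgt id
yes 3 Yes NA High M 1.3 A
NA 4 No 2 Med F 0.4 B
no 2 NA 4 Low F 1.2 C
maybe 3 No 2 High M 0.5 D
yes NA No NA High M 0.7 E
no 2 Yes 3 Low F 0.56 F
maybe 4 Yes 2 Med F 0.9 G")

data$Q50_1 <- factor(data$Q50_1)
design <- svydesign(id = ~id, weights = ~wgt, data = data)

# Joint proportions: each cell is a share of everyone with a non-missing Q50_1.
joint <- coef(svymean(~interaction(Q50_1, gender == "F"), design, na.rm = TRUE))

# Conditional proportions: shares among women only.
direct <- coef(svymean(~Q50_1, subset(design, gender == "F"), na.rm = TRUE))

# Rescale the cells whose name ends in ".TRUE" so that they sum to 1.
true_cells <- joint[grepl("\\.TRUE$", names(joint))]
rescaled <- true_cells / sum(true_cells)

print(sum(joint))
print(cbind(rescaled = unname(rescaled), direct = unname(direct)))
```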