The data comes from this RData dataset.
Here is the script:
library(dplyr)
library(ggplot2)
load("brfss2013.RData")
test <- brfss2013 %>%
select(chcscncr,exract11) %>%
filter(chcscncr != "NA" , exract11 != "NA") %>%
group_by(exract11,chcscncr) %>%
summarise(count = n())
It produces the following table:
> head(test)
Source: local data frame [6 x 3]
Groups: exract11 [3]
exract11 chcscncr count
<fctr> <fctr> <int>
1 Active Gaming Devices (Wii Fit, Dance, Dance revolution) Yes 19
2 Active Gaming Devices (Wii Fit, Dance, Dance revolution) No 287
3 Aerobics video or class Yes 800
4 Aerobics video or class No 7340
5 Backpacking Yes 4
6 Backpacking No 38
I want to make a table that gives the proportion of "Yes" answers for each type of exercise. For example, going from:
Type Ans Count
Sport A yes 45
Sport A no 55
Sport B yes 34
Sport B no 66
to:
Type p(yes)
Sport A 0.45
Sport B 0.34
prop.table converts counts to proportions (here it is simply x/sum(x) applied to the values within each group), so for your "From" table:
brfss2013 %>%
select(chcscncr,exract11) %>%
na.omit() %>% # `==` doesn't work for NA
count(exract11, chcscncr) %>% # equivalent to `group_by(...) %>% summarise(n = n())`
group_by(exract11) %>%
mutate(pct = prop.table(n) * 100) # `* 100` to convert to percent
## Source: local data frame [144 x 4]
## Groups: exract11 [75]
##
## exract11 chcscncr n pct
## <fctr> <fctr> <int> <dbl>
## 1 Active Gaming Devices (Wii Fit, Dance, Dance revolution) Yes 19 6.20915
## 2 Active Gaming Devices (Wii Fit, Dance, Dance revolution) No 287 93.79085
## 3 Aerobics video or class Yes 800 9.82801
## 4 Aerobics video or class No 7340 90.17199
## 5 Backpacking Yes 4 9.52381
## 6 Backpacking No 38 90.47619
## 7 Badminton Yes 4 10.52632
## 8 Badminton No 34 89.47368
## 9 Basketball Yes 37 1.64664
## 10 Basketball No 2210 98.35336
## # ... with 134 more rows
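To see what prop.table does on its own, here is a minimal sketch on a plain numeric vector. The two counts are the "Active Gaming Devices" values from the output above; the variable names (n, p, p_manual) are only illustrative and do not appear in the original script.

```r
# Counts for one group, taken from the first two rows of the output above
n <- c(Yes = 19, No = 287)

# On a vector, prop.table() divides each element by the total
p <- prop.table(n)

# The same calculation written out by hand
p_manual <- n / sum(n)

# Multiply by 100 to express the proportions as percentages
print(p * 100)
```

Inside group_by(exract11), mutate applies this same calculation separately to each exercise type, which is why the percentages within each group add up to 100.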
For your "to" table, filter to keep only the "Yes" rows:
brfss2013 %>%
select(chcscncr,exract11) %>%
na.omit() %>%
count(exract11, chcscncr) %>%
group_by(exract11) %>%
mutate(p_yes = prop.table(n)) %>%
filter(chcscncr == "Yes")
## Source: local data frame [69 x 4]
## Groups: exract11 [69]
##
## exract11 chcscncr n p_yes
## <fctr> <fctr> <int> <dbl>
## 1 Active Gaming Devices (Wii Fit, Dance, Dance revolution) Yes 19 0.06209150
## 2 Aerobics video or class Yes 800 0.09828010
## 3 Backpacking Yes 4 0.09523810
## 4 Badminton Yes 4 0.10526316
## 5 Basketball Yes 37 0.01646640
## 6 Bicycling machine exercise Yes 987 0.13708333
## 7 Bicycling Yes 728 0.08519602
## 8 Boating (Canoeing, rowing, kayaking, sailing for pleasure or camping) Yes 22 0.11518325
## 9 Bowling Yes 68 0.09985316
## 10 Boxing Yes 5 0.01633987
## # ... with 59 more rows
As the first table shows, the proportion of "Yes" values is quite small.
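As a quick check of the approach, the following sketch applies the same steps to the toy "Sport A / Sport B" table from the question. The data frame toy and the result name p_yes_tbl are made up for illustration, and Count / sum(Count) is used here as the hand-written equivalent of prop.table(Count).

```r
library(dplyr)

# Toy counts copied from the "From" table in the question
toy <- data.frame(
  Type  = c("Sport A", "Sport A", "Sport B", "Sport B"),
  Ans   = c("yes", "no", "yes", "no"),
  Count = c(45, 55, 34, 66)
)

p_yes_tbl <- toy %>%
  group_by(Type) %>%
  mutate(p_yes = Count / sum(Count)) %>%  # proportion within each Type
  ungroup() %>%
  filter(Ans == "yes") %>%                # keep only the "yes" rows
  select(Type, p_yes)

print(p_yes_tbl)
```

This should give 0.45 for Sport A and 0.34 for Sport B, matching the "to" table in the question.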