hpy*_*hpy · tags: r, summarization, dplyr
I'm trying to practice using the R package dplyr with a hypothetical dataset (linked on pastebin) that records what people drank at different bars:
bar_name,person,drink_ordered,times_ordered,liked_it
Moe’s Tavern,Homer,Romulan ale,2,TRUE
Moe’s Tavern,Homer,Scotch whiskey,1,FALSE
Moe’s Tavern,Guinan,Romulan ale,1,TRUE
Moe’s Tavern,Guinan,Scotch whiskey,3,FALSE
Moe’s Tavern,Rebecca,Romulan ale,2,FALSE
Moe’s Tavern,Rebecca,Scotch whiskey,4,TRUE
Cheers,Rebecca,Budweiser,1,TRUE
Cheers,Rebecca,Black Hole,1,TRUE
Cheers,Bender,Budweiser,1,FALSE
Cheers,Bender,Black Hole,1,TRUE
Cheers,Krusty,Budweiser,1,TRUE
Cheers,Krusty,Black Hole,1,FALSE
The Hip Joint,Homer,Scotch whiskey,3,FALSE
The Hip Joint,Homer,Corona,1,TRUE
The Hip Joint,Homer,Budweiser,1,FALSE
The Hip Joint,Krusty,Romulan ale,3,TRUE
The Hip Joint,Krusty,Black Hole,4,FALSE
The Hip Joint,Krusty,Corona,1,TRUE
The Hip Joint,Rebecca,Corona,2,TRUE
The Hip Joint,Rebecca,Romulan ale,4,FALSE
The Hip Joint,Bender,Corona,1,TRUE
Ten Forward,Bender,Romulan ale,1,
Ten Forward,Bender,Black Hole,,FALSE
Ten Forward,Guinan,Romulan ale,2,TRUE
Ten Forward,Guinan,Budweiser,,FALSE
Ten Forward,Krusty,Budweiser,1,
Ten Forward,Krusty,Black Hole,1,FALSE
Mos Eisley,Krusty,Black Hole,1,TRUE
Mos Eisley,Krusty,Corona,2,FALSE
Mos Eisley,Krusty,Romulan ale,1,TRUE
Mos Eisley,Homer,Black Hole,1,TRUE
Mos Eisley,Homer,Corona,2,FALSE
Mos Eisley,Homer,Romulan ale,1,TRUE
Mos Eisley,Bender,Black Hole,1,TRUE
Mos Eisley,Bender,Corona,2,FALSE
Mos Eisley,Bender,Romulan ale,1,TRUE
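I've used dplyr's group_by() and summarise() functions a few times, but I'm not sure how to handle more nested cases. Specifically, I'd like to ask questions like the following: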
For each unique bar_name, did every person order exactly the same combination of drinks (drink_ordered)? In this dataset, this would be TRUE for Moe's Tavern, Cheers and Mos Eisley.
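A minimal sketch of how this first check could be expressed, assuming dplyr 1.0.0 or later: build one key per person from the sorted set of drinks, then ask whether each bar has exactly one distinct key. The data frame `drinks` is a hypothetical toy copy of the Cheers and Ten Forward rows above, not the full dataset.

```r
library(dplyr)

# Hypothetical toy data: the Cheers and Ten Forward drinks from the question
drinks <- data.frame(
  bar_name = rep(c("Cheers", "Ten Forward"), each = 6),
  person = c("Rebecca", "Rebecca", "Bender", "Bender", "Krusty", "Krusty",
             "Bender", "Bender", "Guinan", "Guinan", "Krusty", "Krusty"),
  drink_ordered = c("Budweiser", "Black Hole", "Budweiser", "Black Hole",
                    "Budweiser", "Black Hole",
                    "Romulan ale", "Black Hole", "Romulan ale", "Budweiser",
                    "Budweiser", "Black Hole")
)

q1 <- drinks %>%
  group_by(bar_name, person) %>%
  # one key per person: the sorted set of drinks they ordered
  summarise(combo = paste(sort(drink_ordered), collapse = ", "), .groups = "drop") %>%
  group_by(bar_name) %>%
  # the bar passes if every person ends up with the same key
  summarise(same_drinks = n_distinct(combo) == 1)

q1
```

Sorting inside each group keeps the key independent of the order in which the rows happen to appear. Here same_drinks should be TRUE for Cheers and FALSE for Ten Forward.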
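Even if every person at a given bar_name ordered exactly the same combination of drinks, did they each order every drink the same number of times (times_ordered)? For example, I would flag Cheers and Mos Eisley TRUE for this question.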
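The second check can be sketched the same way, again assuming dplyr 1.0.0 or later: each drink is pasted together with its count, and the pairs are put in drink-name order with order() so the key does not depend on row order. The data frame `orders` is a hypothetical toy copy of the Cheers and Moe's Tavern rows; at Moe's Tavern everyone ordered the same two drinks but in different quantities, so that bar fails this check.

```r
library(dplyr)

# Hypothetical toy data: the Cheers and Moe's Tavern rows from the question
orders <- data.frame(
  bar_name = rep(c("Cheers", "Moe's Tavern"), each = 6),
  person = rep(c("Rebecca", "Bender", "Krusty", "Homer", "Guinan", "Rebecca"), each = 2),
  drink_ordered = c(rep(c("Budweiser", "Black Hole"), 3),
                    rep(c("Romulan ale", "Scotch whiskey"), 3)),
  times_ordered = c(1, 1, 1, 1, 1, 1, 2, 1, 1, 3, 2, 4)
)

q2 <- orders %>%
  group_by(bar_name, person) %>%
  summarise(
    # pair each drink with its count, then order the pairs by drink name
    combo = paste(paste(drink_ordered, times_ordered, sep = ":")[order(drink_ordered)],
                  collapse = ", "),
    .groups = "drop"
  ) %>%
  group_by(bar_name) %>%
  # the bar passes if every person has the same drink:count key
  summarise(same_counts = n_distinct(combo) == 1)

q2
```

Here same_counts should be TRUE for Cheers and FALSE for Moe's Tavern.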
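Then, even if every person at a given bar ordered exactly the same combination of drinks the same number of times, did they all have exactly the same opinion of those drinks (liked_it)? In this dataset, that would be TRUE for Mos Eisley.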
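Extending the key with liked_it gives a sketch of the third check, under the same dplyr 1.0.0+ assumption. The data frame `opinions` is a hypothetical toy copy of the Cheers and Mos Eisley rows: at Cheers the three people disagree about which drink they liked, while at Mos Eisley they agree on everything.

```r
library(dplyr)

# Hypothetical toy data: the Cheers and Mos Eisley rows from the question
opinions <- data.frame(
  bar_name = rep(c("Cheers", "Mos Eisley"), times = c(6, 9)),
  person = c(rep(c("Rebecca", "Bender", "Krusty"), each = 2),
             rep(c("Krusty", "Homer", "Bender"), each = 3)),
  drink_ordered = c(rep(c("Budweiser", "Black Hole"), 3),
                    rep(c("Black Hole", "Corona", "Romulan ale"), 3)),
  times_ordered = c(rep(1, 6), rep(c(1, 2, 1), 3)),
  liked_it = c(TRUE, TRUE, FALSE, TRUE, TRUE, FALSE,
               rep(c(TRUE, FALSE, TRUE), 3))
)

q3 <- opinions %>%
  group_by(bar_name, person) %>%
  summarise(
    # drink:count:opinion triples, ordered by drink name
    combo = paste(paste(drink_ordered, times_ordered, liked_it, sep = ":")[order(drink_ordered)],
                  collapse = ", "),
    .groups = "drop"
  ) %>%
  group_by(bar_name) %>%
  summarise(same_opinions = n_distinct(combo) == 1)

q3
```

Here same_opinions should be FALSE for Cheers and TRUE for Mos Eisley.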
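Note that the dataset contains a case (The Hip Joint) where the answer is FALSE for all three questions, and a case with missing values (Ten Forward).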
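To see what the missing values in Ten Forward turn into, here is a small sketch that reads just those six rows with read.csv(text = ...). Blank fields in numeric and logical columns come back as NA, and toString() then renders NA as the literal text "NA", so in any string-key approach a missing value behaves like one more ordinary value: two people with NA in the same position would count as matching. The object names `ten_forward` and `bender_key` are illustrative only.

```r
library(dplyr)

# The six Ten Forward rows from the question, pasted as CSV text
ten_forward <- read.csv(text = "bar_name,person,drink_ordered,times_ordered,liked_it
Ten Forward,Bender,Romulan ale,1,
Ten Forward,Bender,Black Hole,,FALSE
Ten Forward,Guinan,Romulan ale,2,TRUE
Ten Forward,Guinan,Budweiser,,FALSE
Ten Forward,Krusty,Budweiser,1,
Ten Forward,Krusty,Black Hole,1,FALSE")

# Blank fields in the integer and logical columns are read as NA
str(ten_forward)

# toString() writes NA out as text, so it becomes part of the key
bender_key <- ten_forward %>%
  filter(person == "Bender") %>%
  pull(times_ordered) %>%
  toString()

bender_key  # "1, NA"
```

If missing values should instead make a bar's answer unknown, they would have to be handled explicitly (for example with is.na()) before building the keys.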
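Ideally, I'd like to produce a table whose first column is bar_name, followed by three Boolean columns holding TRUE or FALSE for each of the three questions.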
How can I do this efficiently with dplyr in R? Thanks very much.
You could do the following, where the columns Ld, Ldt and Ldtl answer the three questions in turn:
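library(dplyr)

# DF: the question's CSV data read into a data frame (e.g. with read.csv())
DF %>%
  # sort so each person's drinks, counts and opinions line up in the same order
  arrange(drink_ordered, times_ordered, liked_it) %>%
  group_by(bar_name, person) %>%
  summarise(
    Ld   = toString(drink_ordered),                        # drink combination
    Ldt  = paste(Ld, toString(times_ordered), sep = "_"),  # + times ordered
    Ldtl = paste(Ldt, toString(liked_it), sep = "_"),      # + liked_it
    .groups = "drop"
  ) %>%
  group_by(bar_name) %>%
  # per bar: number of people and number of distinct keys
  summarise(across(everything(), n_distinct)) %>%
  # everyone matches exactly when there is a single distinct key
  mutate(across(c(Ld, Ldt, Ldtl), ~ .x == 1))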
# bar_name person Ld Ldt Ldtl
# (chr) (int) (lgl) (lgl) (lgl)
# 1 Cheers 3 TRUE TRUE FALSE
# 2 Moe’s Tavern 3 TRUE FALSE FALSE
# 3 Mos Eisley 3 TRUE TRUE TRUE
# 4 Ten Forward 3 FALSE FALSE FALSE
# 5 The Hip Joint 4 FALSE FALSE FALSE
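The arrange() call at the start of the answer matters because toString() joins values in row order, and group_by() does not reorder rows. A small sketch with a hypothetical two-person bar (the data frame `toy` and its names are made up for illustration) shows that identical drink sets entered in different orders only produce a single key once the rows are sorted:

```r
library(dplyr)

# Hypothetical bar: A and B ordered the same drinks, listed in different row order
toy <- data.frame(
  bar_name = "Example Bar",
  person = c("A", "A", "B", "B"),
  drink_ordered = c("Corona", "Budweiser", "Budweiser", "Corona")
)

key_without_sort <- toy %>%
  group_by(bar_name, person) %>%
  summarise(Ld = toString(drink_ordered), .groups = "drop")

key_with_sort <- toy %>%
  arrange(drink_ordered) %>%
  group_by(bar_name, person) %>%
  summarise(Ld = toString(drink_ordered), .groups = "drop")

n_distinct(key_without_sort$Ld)  # 2: "Corona, Budweiser" vs "Budweiser, Corona"
n_distinct(key_with_sort$Ld)     # 1: both become "Budweiser, Corona"
```

Without the sort, the answer's Ld column could report FALSE for a bar where everyone really did order the same drinks.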