cbo*_*tig 4 r dplyr tidyr tidyverse
Consider this minimal example:
library(tidyverse)
ex <- tribble(
  ~id, ~property, ~value,
  1, "A", 9,
  1, "A", 8,
  1, "B", 7,
  2, "A", 6,
  2, "B", 5
)
My goal is to spread the properties into columns to get this table:
tribble(
  ~id, ~A, ~B,
  1, 9, 7,
  1, 8, 7,
  2, 6, 5
)
Grouping by id and property and adding a key gets close, but leaves an NA:
## almost but not quite
ex %>%
  group_by(id, property) %>%
  mutate(key = row_number()) %>%
  spread(property, value) %>%
  select(-key) -> X
X
This gives:
     id     A     B
1     1     9     7
2     1     8    NA
3     2     6     5
I can fix this in the minimal example by splitting out each property, dropping the NAs, and joining the pieces back together by id:
inner_join(
  na.omit(select(X, id, A)),
  na.omit(select(X, id, B))
)
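But clearly this does not generalize to an arbitrary set of properties. What is a better tidyverse strategy for doing this?
Note: several earlier questions cover the first half of this, e.g. constructing a key column so that spread does not fail, but I could not find anything that deals with the resulting NAs.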
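You can use fill from tidyr. After grouping by id, it carries the last non-missing value in each column down into the NAs that follow it: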
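library(dplyr)
library(tidyr)
ex %>%
  group_by(id, property) %>%
  mutate(key = row_number()) %>%   # number repeated (id, property) rows so spread() has unique keys
  spread(property, value) %>%
  select(-key) %>%
  group_by(id) %>%
  fill(-id)                        # within each id, fill NAs downward in every other column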
Result:
# A tibble: 3 x 3
# Groups:   id [2]
     id     A     B
  <dbl> <dbl> <dbl>
1     1     9     7
2     1     8     7
3     2     6     5
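For readers on tidyr 1.0.0 or later, where spread() is superseded by pivot_wider(), the same approach can be sketched as below. This is an adaptation rather than part of the original answer, and the name res is only illustrative. It assumes that, within each id, a property with fewer rows should repeat its last observed value.

```r
library(dplyr)
library(tidyr)

# Same example data as in the question
ex <- tribble(
  ~id, ~property, ~value,
  1, "A", 9,
  1, "A", 8,
  1, "B", 7,
  2, "A", 6,
  2, "B", 5
)

res <- ex %>%
  group_by(id, property) %>%
  mutate(key = row_number()) %>%   # unique row index within each (id, property)
  ungroup() %>%
  pivot_wider(names_from = property, values_from = value) %>%
  select(-key) %>%
  group_by(id) %>%
  fill(-id) %>%                    # fill NAs downward within each id (default direction)
  ungroup()

res
```

Because the key counts up from 1 for every property, any NA in a column sits below that column's observed values within its id, so the default downward fill is enough here.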