R, dplyr: collect the unique values of a column and set a label based on set membership

Question

I am working with a large dataset, but a small example shows what I am trying to achieve. I am using R and dplyr. I have a table:

id  attribute correct
1   a         a
1   b         a
1   c         a
2   d         e
2   e         e
3   d         f
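
For anyone who wants to reproduce the example, the table above could be built as follows. This is only a sketch: the name design_mat is borrowed from the code later in the question, and the column types (integer id, character attribute and correct) are an assumption based on the printed values.

```r
# Rebuild the example table from the question.
# Column types are assumed: integer id, character attribute/correct.
design_mat <- data.frame(
  id        = c(1L, 1L, 1L, 2L, 2L, 3L),
  attribute = c("a", "b", "c", "d", "e", "d"),
  correct   = c("a", "a", "a", "e", "e", "f"),
  stringsAsFactors = FALSE
)

print(design_mat)
```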

From the table above, I want to create two columns, attribute_set and label. To clarify, I want:

id  attribute_set   correct   label
1   a, b, c         a         1
2   d, e            e         1
3   d               f         0

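attribute_set should be a set (any data structure will do) containing all the attributes of an id. label should be 1 if the value of correct is in attribute_set, and 0 otherwise.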

Currently, I create attribute_set like this:

design_mat1 <- design_mat %>%
  group_by(id) %>%
  mutate(attribute_set = paste(unique(attribute), collapse = "|")) %>%
  select(-attribute)

I generate label like this:

design_mat2b <- design_mat2 %>%
  group_by(id) %>%
  mutate(label = ifelse(correct %in% attribute_set, 1, 0))

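However, my label only comes out right when attribute_set contains a single element. I think I have to either strsplit on | or use some other data structure for attribute_set. I have not been able to figure out which alternative data structure to use, nor to find a strsplit solution that works with |. Any hints or solutions are appreciated.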
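
The behaviour described above can be reproduced in isolation. The following is a minimal sketch with a hand-written string standing in for one value of attribute_set; it is an illustration of why the comparison fails, not the asker's actual data.

```r
# A pasted string is a single element, so %in% compares 'correct'
# against the whole string rather than against its parts.
attribute_set <- "a|b|c"
whole_match <- "a" %in% attribute_set   # FALSE: "a" is not equal to "a|b|c"
single_match <- "d" %in% "d"            # TRUE: a one-element set can still match

# "|" is a regex metacharacter (alternation), so an unescaped "|" pattern
# does not split on the literal character. fixed = TRUE treats it literally.
parts <- strsplit(attribute_set, "|", fixed = TRUE)[[1]]
part_match <- "a" %in% parts            # TRUE once the string is split

print(parts)
```

Escaping the pattern as "\\|" should work as well; fixed = TRUE is used here only because it avoids regex interpretation entirely.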

Answer 1

akr*_*run 5

After grouping by 'id', we can use summarise to paste the unique elements of 'attribute', take the first (or unique) value of 'correct', and set 'label' to 1 if any 'correct' value is found in 'attribute'.

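library(dplyr)
design_mat %>%
   group_by(id) %>%
   # collapse the unique attributes of each id into one comma-separated string
   summarise(attribute_set = toString(unique(attribute)),
             # 'correct' is assumed to be constant within an id, so keep the first value
             correct = first(correct),
             # unary + converts the logical result to 0/1
             label = +(any(correct %in% attribute)))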
# A tibble: 3 x 4
#     id attribute_set correct label
#  <int> <chr>         <chr>   <int>
#1     1 a, b, c       a           1
#2     2 d, e          e           1
#3     3 d             f           0

Alternatively, include 'correct' in the group_by as well, and summarise only 'attribute_set' and 'label'.
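
The code for this variant is not shown in the answer, so the following is only a sketch of what it might look like. It assumes that 'correct' takes a single value per 'id' (otherwise an id would produce several rows), rebuilds the example data under the name design_mat, and stores the result in a hypothetical variable res.

```r
library(dplyr)

# Example data from the question (column types assumed).
design_mat <- data.frame(
  id        = c(1L, 1L, 1L, 2L, 2L, 3L),
  attribute = c("a", "b", "c", "d", "e", "d"),
  correct   = c("a", "a", "a", "e", "e", "f"),
  stringsAsFactors = FALSE
)

# Group by both keys so 'correct' is carried along as a grouping column
# and does not need its own summary expression.
res <- design_mat %>%
  group_by(id, correct) %>%
  summarise(attribute_set = toString(unique(attribute)),
            label = +(any(correct %in% attribute))) %>%
  ungroup()

print(res)
```

With this grouping the columns come out in the order id, correct, attribute_set, label, which differs from the layout requested in the question; a final select() could reorder them if that matters.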

Archived:	7 years, 11 months ago
Views:	1010
Last recorded:	7 years, 11 months ago