Grouping text if their sets union (overlap)

Tun*_*Lok 3 grouping text r

Here is my data:

ITEM <- c("A","A","A","B","B","B","B","C","C","D","D","E","E","F","G","G","G")
LOCATION <- c("aaa","bbb","ccc","bbb","fff","ggg","zzz","zzz","eee","hhh","iii","kkk","jjj","iii","iii","yyy","xxx")
df <- as.data.frame(cbind(ITEM,LOCATION))

Long Form:
       ITEM LOCATION
    1     A      aaa
    2     A      bbb
    3     A      ccc
    4     B      bbb
    5     B      fff
    6     B      ggg
    7     B      zzz
    8     C      zzz
    9     C      eee
    10    D      hhh
    11    D      iii
    12    E      kkk
    13    E      jjj
    14    F      iii
    15    G      iii
    16    G      yyy
    17    G      xxx

Wide form (easier to read):

ITEM LOCATION.1 LOCATION.2 LOCATION.3 LOCATION.4
A        aaa        bbb        ccc       <NA>
B        bbb        fff        ggg        zzz
C        zzz        eee       <NA>       <NA>
D        hhh        iii       <NA>       <NA>
E        kkk        jjj       <NA>       <NA>
F        iii       <NA>       <NA>       <NA>
G        iii        yyy        xxx       <NA>

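Initially, I grouped the items by hand whenever their locations intersected.

That is, I would end up with the groups {A,B,C}, {D,F,G}, {E}.

My original data has 8000 rows, and this took me several days. When the data set is small I can use a left join and get the desired output, but I cannot use that when the data set is large.

Is there any package that can group elements by union?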

d.b*_*d.b 5

#Convert columns to character to avoid complications later
df$ITEM = as.character(df$ITEM)
df$LOCATION = as.character(df$LOCATION)

#Split ITEM by LOCATION and convert each sub-group into data.frame
#by making the first element of each sub-group 'from' and all elements 'to'
df1 = do.call(rbind,
              lapply(split(df$ITEM, df$LOCATION), function(x)
                  data.frame(from = x[1], to = x, stringsAsFactors = FALSE)))

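library(igraph)
#Convert the data.frame df1 into graph
#(graph_from_data_frame is the current name of graph.data.frame)
g = graph_from_data_frame(df1)
#Use 'components' (the current name of 'clusters') to identify the separate groups
#and 'groups' to extract the vertices (in this case, ITEM)
groups(components(g))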
#$`1`
#[1] "A" "C" "B"

#$`2`
#[1] "D" "G" "F"

#$`3`
#[1] "E"
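
If you want the group id back on the original rows rather than a list of groups, here is a minimal sketch of one way to do it. It assumes igraph is installed, uses graph_from_data_frame and components (the current names for graph.data.frame and clusters), and builds the graph directly from the ITEM/LOCATION pairs. GROUP is just a column name I made up for illustration.

```r
library(igraph)

# Example data from the question
ITEM <- c("A","A","A","B","B","B","B","C","C","D","D","E","E","F","G","G","G")
LOCATION <- c("aaa","bbb","ccc","bbb","fff","ggg","zzz","zzz","eee",
              "hhh","iii","kkk","jjj","iii","iii","yyy","xxx")
df <- data.frame(ITEM, LOCATION, stringsAsFactors = FALSE)

# One edge per (ITEM, LOCATION) row; items sharing a location become connected
g <- graph_from_data_frame(df, directed = FALSE)

# Component id per vertex, named explicitly by vertex name
memb <- setNames(components(g)$membership, V(g)$name)

# Look up each row's component through its ITEM (GROUP is a hypothetical column name)
df$GROUP <- unname(memb[df$ITEM])

# One row per item with its group id
unique(df[, c("ITEM", "GROUP")])
```

This relies on no LOCATION value being spelled the same as an ITEM value, since both end up as vertex names in the same graph.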

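You can also build the graph straight from df and drop the LOCATION values at the end (as per the comments on the question):

lapply(groups(components(graph_from_data_frame(df))), function(x) x[x %in% df$ITEM])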
#$`1`
#[1] "A" "B" "C"

#$`2`
#[1] "D" "F" "G"

#$`3`
#[1] "E"
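
If installing igraph is not an option, the same "merge items that share a location" idea can be sketched in base R with a small union-find structure. This is only an illustration under my own assumptions, not what igraph does internally. group_by_shared_location and find_root are hypothetical helper names, and there is no path compression, which should be acceptable for a few thousand rows but is not tuned for very large data.

```r
# Example data from the question
ITEM <- c("A","A","A","B","B","B","B","C","C","D","D","E","E","F","G","G","G")
LOCATION <- c("aaa","bbb","ccc","bbb","fff","ggg","zzz","zzz","eee",
              "hhh","iii","kkk","jjj","iii","iii","yyy","xxx")

# Hypothetical helper: groups items that are linked through shared locations
group_by_shared_location <- function(item, location) {
  items <- unique(item)
  parent <- seq_along(items)          # every item starts as its own root

  # Follow parent pointers until a root (an element that is its own parent)
  find_root <- function(i) {
    while (parent[[i]] != i) i <- parent[[i]]
    i
  }

  # For every location, merge all items seen there into one set
  for (its in split(item, location)) {
    idx <- match(unique(its), items)
    r1 <- find_root(idx[1])
    for (j in idx[-1]) {
      r2 <- find_root(j)
      if (r1 != r2) parent[[r2]] <- r1
    }
  }

  # Relabel roots as 1, 2, 3, ... in order of first appearance
  roots <- vapply(seq_along(items), find_root, integer(1))
  split(items, match(roots, unique(roots)))
}

item_groups <- group_by_shared_location(ITEM, LOCATION)
item_groups
```

For the example data this should give the same three groups as the igraph versions above: {A,B,C}, {D,F,G} and {E}.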