Here is my data:
ITEM <- c("A","A","A","B","B","B","B","C","C","D","D","E","E","F","G","G","G")
LOCATION <- c("aaa","bbb","ccc","bbb","fff","ggg","zzz","zzz","eee","hhh","iii","kkk","jjj","iii","iii","yyy","xxx")
df <- data.frame(ITEM, LOCATION, stringsAsFactors = FALSE)
Long format:
ITEM LOCATION
1 A aaa
2 A bbb
3 A ccc
4 B bbb
5 B fff
6 B ggg
7 B zzz
8 C zzz
9 C eee
10 D hhh
11 D iii
12 E kkk
13 E jjj
14 F iii
15 G iii
16 G yyy
17 G xxx
Wide format (easier to read):
ITEM LOCATION.1 LOCATION.2 LOCATION.3 LOCATION.4
A aaa bbb ccc <NA>
B bbb fff ggg zzz
C zzz eee <NA> <NA>
D hhh iii <NA> <NA>
E kkk jjj <NA> <NA>
F iii <NA> <NA> <NA>
G iii yyy xxx <NA>
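For reference, the wide view above can be produced from the long data with base R's `reshape()`. This is a minimal sketch, assuming locations should be numbered in the order they appear for each item; `idx` and `wide` are illustrative names, not part of the original data.

```r
# Example data from the question
ITEM <- c("A","A","A","B","B","B","B","C","C","D","D","E","E","F","G","G","G")
LOCATION <- c("aaa","bbb","ccc","bbb","fff","ggg","zzz","zzz","eee","hhh","iii","kkk","jjj","iii","iii","yyy","xxx")
df <- data.frame(ITEM, LOCATION, stringsAsFactors = FALSE)

# Number each location within its item: 1, 2, 3, ...
df$idx <- ave(seq_along(df$ITEM), df$ITEM, FUN = seq_along)

# One row per ITEM, one LOCATION.<idx> column per position; gaps become NA
wide <- reshape(df, idvar = "ITEM", timevar = "idx", direction = "wide")
wide
```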
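Originally, I grouped the items by hand whenever their locations intersected,
i.e. I would split them into {A,B,C}, {D,F,G}, {E}.
My original data has 8000 rows and doing this took me several days. When the dataset is small I can use left joins to get the desired output, but that approach does not work on the large dataset.
Is there a package that can group elements by union, i.e. merge items that share a location, either directly or through a chain of other items?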
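Grouping items that share a location, directly or transitively, is a connected-components problem, so it can also be done without any package using a small union-find (disjoint-set) structure. The sketch below is one possible base-R version; `group_items` is a hypothetical helper name, and it deliberately omits path compression and union by rank to stay short.

```r
# Example data from the question
ITEM <- c("A","A","A","B","B","B","B","C","C","D","D","E","E","F","G","G","G")
LOCATION <- c("aaa","bbb","ccc","bbb","fff","ggg","zzz","zzz","eee","hhh","iii","kkk","jjj","iii","iii","yyy","xxx")

# Hypothetical helper: union-find over item names
group_items <- function(item, location) {
  items <- unique(item)
  # Every item starts as the root of its own set
  parent <- setNames(items, items)

  # Follow parent links up to the root of x's set
  find <- function(x) {
    while (parent[[x]] != x) x <- parent[[x]]
    x
  }

  # Within each location, merge every item's set into the first item's set
  for (members in split(item, location)) {
    root <- find(members[1])
    for (m in members[-1]) {
      other <- find(m)
      if (other != root) parent[[other]] <- root
    }
  }

  # Label each item with its final root and collect the items per root
  roots <- vapply(items, find, character(1))
  unname(split(items, roots))
}

groups_found <- group_items(ITEM, LOCATION)
groups_found
```

On the sample data this gives the groups {A,B,C}, {D,F,G} and {E}. For much larger inputs, adding path compression inside `find()` keeps lookups close to constant time.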
One approach is to treat each ITEM as a vertex and connect items that share a LOCATION; each connected component of the resulting graph is then one group:
library(igraph)
# Convert columns to character to avoid factor complications later
df$ITEM <- as.character(df$ITEM)
df$LOCATION <- as.character(df$LOCATION)
# Split ITEM by LOCATION and turn each sub-group into an edge list:
# the first item at a location is 'from', every item at that location is 'to'
df1 <- do.call(rbind,
               lapply(split(df$ITEM, df$LOCATION), function(x)
                 data.frame(from = x[1], to = x, stringsAsFactors = FALSE)))
# Convert the edge list df1 into a graph
g <- graph_from_data_frame(df1)
# Use 'components' to identify the separate groups
# and 'groups' to extract the vertices (in this case, the ITEMs)
groups(components(g))
#$`1`
#[1] "A" "C" "B"
#$`2`
#[1] "D" "G" "F"
#$`3`
#[1] "E"
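To carry the result back to every row of the original long table, the list of groups can be turned into a named lookup vector. A minimal base-R sketch, assuming the groups have already been obtained as a list like the output above; `groups_list`, `lookup` and the `GROUP` column are illustrative names.

```r
# Example data from the question
ITEM <- c("A","A","A","B","B","B","B","C","C","D","D","E","E","F","G","G","G")
LOCATION <- c("aaa","bbb","ccc","bbb","fff","ggg","zzz","zzz","eee","hhh","iii","kkk","jjj","iii","iii","yyy","xxx")
df <- data.frame(ITEM, LOCATION, stringsAsFactors = FALSE)

# Groups as printed above (order within a group does not matter here)
groups_list <- list(c("A", "C", "B"), c("D", "G", "F"), "E")

# Named lookup vector: item name -> group number
lookup <- setNames(rep(seq_along(groups_list), lengths(groups_list)),
                   unlist(groups_list))

# Add the group number to every row of the long table
df$GROUP <- unname(lookup[df$ITEM])
df
```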
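You can also build the graph straight from df and drop the LOCATION vertices at the end (as suggested in the comments on the question):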
lapply(groups(components(graph_from_data_frame(df))), function(x) x[x %in% df$ITEM])
#$`1`
#[1] "A" "B" "C"
#$`2`
#[1] "D" "F" "G"
#$`3`
#[1] "E"
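One caveat with building the graph directly from df: items and locations become vertices identified by name, so a location code that happens to equal an item name would be merged into a single vertex. A quick base-R check before relying on this shortcut might look like the sketch below; the `"loc:"` prefix is an arbitrary illustrative choice.

```r
# Example data from the question
ITEM <- c("A","A","A","B","B","B","B","C","C","D","D","E","E","F","G","G","G")
LOCATION <- c("aaa","bbb","ccc","bbb","fff","ggg","zzz","zzz","eee","hhh","iii","kkk","jjj","iii","iii","yyy","xxx")
df <- data.frame(ITEM, LOCATION, stringsAsFactors = FALSE)

# Names used both as an item and as a location would collapse into one vertex
clashes <- intersect(df$ITEM, df$LOCATION)
if (length(clashes) > 0) {
  # Prefix location codes so they can never coincide with an item name
  df$LOCATION <- paste0("loc:", df$LOCATION)
}
clashes
```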