R: split a semicolon-delimited column into rows

New*_*ing 4 split r delimiter

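I'm using RStudio (R 2.15.0) and have used XLConnect to create an object from an Excel file with 3000+ rows and 12 columns. I'm trying to delimit/split a column into rows, but I don't know whether this is possible or how to do it. Below is a sample of the data with 3 of the columns. Any help with this would be great.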

The code below works for two of the columns:

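library(stringr)

# Collapse all PolIds of each Description into one ";"-separated string;
# the lookahead regex removes an ID (and its ";") when the same ID occurs again later
v1 <- with(df, tapply(PolId, Description, FUN = function(x) {
  x1 <- paste(x, collapse = ";")
  gsub('(\\b\\S+\\b)(?=.*\\b\\1\\b.*);', '', x1, perl = TRUE)
}))

# Repeat each Description once per remaining ID, then split the IDs into a vector
Description <- rep(names(v1), str_count(v1, '\\w+'))
PolId <- scan(text = gsub(';+', ' ', v1), what = '', quiet = TRUE)
data.frame(PolId, Description)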
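The lookahead regex is one way to drop repeated IDs within each Description. Below is a minimal base R sketch that produces the same two-column result without a regex. It assumes `df` has `PolId` and `Description` columns; the inline data only mirrors the sample below. The approach splits every `PolId` string on `;`, pools the pieces per `Description`, and keeps each ID once.

```r
# Inline copy of the sample's first two columns (for illustration only)
df <- data.frame(
  PolId = c("ABC123;ABC456;ABC789;", "ABC123;ABC456;ABC789;",
            "ABC123;ABC456;ABC789;", "AAA123;", "AAA123;",
            "ABB123;ABC123;", "ABB123;ABC123;"),
  Description = c("TEST1", "TEST1", "TEST1", "TEST1", "TEST2",
                  "TEST3", "TEST3"),
  stringsAsFactors = FALSE
)

# Split each PolId on ";" (strsplit() drops the trailing empty piece)
pieces <- strsplit(as.character(df$PolId), ";", fixed = TRUE)

# Group the split pieces by Description and keep each ID once per group
ids <- lapply(split(pieces, df$Description), function(p) unique(unlist(p)))

# Rebuild a two-column data frame: one row per (PolId, Description)
res <- data.frame(
  PolId = unlist(ids, use.names = FALSE),
  Description = rep(names(ids), vapply(ids, length, integer(1))),
  stringsAsFactors = FALSE
)
res
```

Unlike the regex, `unique()` compares whole IDs, so there is no risk of a pattern matching across several IDs at once.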

Sample data

PolId                   Description  Document.Type
ABC123;ABC456;ABC789;   TEST1        Pol1
ABC123;ABC456;ABC789;   TEST1        Pol1
ABC123;ABC456;ABC789;   TEST1        Pol1
AAA123;                 TEST1        End1
AAA123;                 TEST2        End2
ABB123;ABC123;          TEST3        End1
ABB123;ABC123;          TEST3        End1
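To make the sample reproducible, the whitespace-separated table can be read straight from text. The name `df` matches the asker's code. `stringsAsFactors = FALSE` is an added assumption: it ensures `strsplit()` receives character vectors, because before R 4.0 `read.table()` converted strings to factors by default.

```r
# Read the sample table above directly from a string
df <- read.table(text = "
PolId                   Description  Document.Type
ABC123;ABC456;ABC789;   TEST1        Pol1
ABC123;ABC456;ABC789;   TEST1        Pol1
ABC123;ABC456;ABC789;   TEST1        Pol1
AAA123;                 TEST1        End1
AAA123;                 TEST2        End2
ABB123;ABC123;          TEST3        End1
ABB123;ABC123;          TEST3        End1
", header = TRUE, stringsAsFactors = FALSE)

# Inspect the column types
str(df)
```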

I would like the output to look like this (with duplicate PolIds removed):

PolId   Description  Document.Type
ABC123  TEST1        Pol1
ABC456  TEST1        Pol1
ABC789  TEST1        Pol1
AAA123  TEST1        End1
AAA123  TEST2        End2
ABB123  TEST3        End1
ABC123  TEST3        End1

G. *_*eck 7

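Here is a base R solution. Split the PolId field using strsplit, and cbind each split field with its corresponding Description. This gives a list of matrices, which we rbind together. Finally, set the column names.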

# Split each PolId on ";", pair each piece with its row's Description, and stack the matrices
out <- do.call(rbind, Map(cbind, strsplit(DF$PolId, ";"), DF$Description))
colnames(out) <- colnames(DF)

giving:

> out
      PolId    Description
 [1,] "ABC123" "TEST1"    
 [2,] "ABC456" "TEST1"    
 [3,] "ABC789" "TEST1"    
 [4,] "ABC123" "TEST1"    
 [5,] "ABC456" "TEST1"    
 [6,] "ABC789" "TEST1"    
 [7,] "ABC123" "TEST1"    
 [8,] "ABC456" "TEST1"    
 [9,] "ABC789" "TEST1"    
[10,] "AAA123" "TEST1"    
[11,] "AAA123" "TEST2"    
[12,] "ABB123" "TEST3"    
[13,] "ABC123" "TEST3"    
[14,] "ABB123" "TEST3"    
[15,] "ABC123" "TEST3" 
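To see what `do.call(rbind, ...)` is stacking, the intermediate list can be inspected on its own. This small sketch uses only the first two rows of the answer's input; the two-row subset is for illustration only. `Map()` pairs the i-th split vector with the i-th Description, and `cbind()` recycles that single Description down the rows.

```r
# First two rows of the answer's input
DF <- data.frame(PolId = c("ABC123;ABC456;ABC789;", "AAA123;"),
                 Description = c("TEST1", "TEST1"),
                 stringsAsFactors = FALSE)

# One character matrix per original row: IDs in column 1, Description recycled in column 2
pieces <- Map(cbind, strsplit(DF$PolId, ";"), DF$Description)

dim(pieces[[1]])  # 3 rows (three IDs) by 2 columns
dim(pieces[[2]])  # 1 row by 2 columns
pieces[[1]]
```

`do.call(rbind, pieces)` then stacks these matrices. That is why duplicates in the input survive into `out`.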

Note: we used this as the input:

DF <-
structure(list(PolId = c("ABC123;ABC456;ABC789;", "ABC123;ABC456;ABC789;", 
"ABC123;ABC456;ABC789;", "AAA123;", "AAA123;", "ABB123;ABC123;", 
"ABB123;ABC123;"), Description = c("TEST1", "TEST1", "TEST1", 
"TEST1", "TEST2", "TEST3", "TEST3")), .Names = c("PolId", "Description"
), class = "data.frame", row.names = c(NA, -7L))
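The same base R idea extends to any number of columns by repeating whole rows instead of cbind-ing one column. Below is a minimal sketch, assuming the three-column layout from the question; the column names come from the sample.

- Each row is repeated once per ID it contains.
- `PolId` is overwritten with the individual IDs.
- `unique()` removes the duplicates the asker wants dropped.

`vapply(..., length, integer(1))` is used so the sketch also runs on old R versions; from R 3.2.0 on, `lengths(ids)` does the same.

```r
# Three-column input as in the question
df <- data.frame(
  PolId = c("ABC123;ABC456;ABC789;", "ABC123;ABC456;ABC789;",
            "ABC123;ABC456;ABC789;", "AAA123;", "AAA123;",
            "ABB123;ABC123;", "ABB123;ABC123;"),
  Description = c("TEST1", "TEST1", "TEST1", "TEST1", "TEST2",
                  "TEST3", "TEST3"),
  Document.Type = c("Pol1", "Pol1", "Pol1", "End1", "End2", "End1", "End1"),
  stringsAsFactors = FALSE
)

# Split PolId; the trailing ";" does not produce an empty string
ids <- strsplit(as.character(df$PolId), ";", fixed = TRUE)

# Repeat each original row once per ID it contains, keeping every column
n_ids <- vapply(ids, length, integer(1))
long <- df[rep(seq_len(nrow(df)), n_ids), ]
long$PolId <- unlist(ids)

# Drop repeated rows and renumber them
long <- unique(long)
rownames(long) <- NULL
long
```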


Dav*_*urg 5

Here is a possible quick data.table solution:

library(data.table)
# Split PolId on ";" within each Description, then drop duplicate rows
unique(setDT(df)[, .(PolId = unlist(strsplit(as.character(PolId), ";"))), by = Description])
#    Description  PolId
# 1:       TEST1 ABC123
# 2:       TEST1 ABC456
# 3:       TEST1 ABC789
# 4:       TEST1 AAA123
# 5:       TEST2 AAA123
# 6:       TEST3 ABB123
# 7:       TEST3 ABC123
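Grouping only `by = Description` drops `Document.Type`. A minimal sketch of one way to carry it along is to group by every column except `PolId`. It assumes the data.table package is installed and that a reasonably recent version is used, where `unique()` compares all columns by default.

```r
library(data.table)

# Question's three-column sample as a data.table
dt <- data.table(
  PolId = c("ABC123;ABC456;ABC789;", "ABC123;ABC456;ABC789;",
            "ABC123;ABC456;ABC789;", "AAA123;", "AAA123;",
            "ABB123;ABC123;", "ABB123;ABC123;"),
  Description = c("TEST1", "TEST1", "TEST1", "TEST1", "TEST2",
                  "TEST3", "TEST3"),
  Document.Type = c("Pol1", "Pol1", "Pol1", "End1", "End2", "End1", "End1")
)

# Split PolId within each (Description, Document.Type) group, then drop duplicates
res <- unique(dt[, .(PolId = unlist(strsplit(as.character(PolId), ";", fixed = TRUE))),
                 by = .(Description, Document.Type)])

# Put PolId first to match the desired layout
setcolorder(res, c("PolId", "Description", "Document.Type"))
res
```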

Based on your edit, here is another option (if you have more than two columns):

library(splitstackshape)
# direction = "long" puts each ";"-separated piece on its own row, keeping all other columns
unique(cSplit(df, "PolId", ";", "long"))
#     PolId Description Document.Type
# 1: ABC123       TEST1          Pol1
# 2: ABC456       TEST1          Pol1
# 3: ABC789       TEST1          Pol1
# 4: AAA123       TEST1          End1
# 5: AAA123       TEST2          End2
# 6: ABB123       TEST3          End1
# 7: ABC123       TEST3          End1