Split names and create a matrix in R

Luc*_*lho 5 r

I have this data:

names <- c("Baker, Chet", "Jarret, Keith", "Miles Davis")

I want to manipulate it so that the first name comes first, so I split it:

names <- strsplit(names, ", ")

[[1]]
[1] "Baker" "Chet"

[[2]]
[1] "Jarret" "Keith"

[[3]]
[1] "Miles Davis"

The problem is that when I try to put the pieces back together, the name "Miles Davis" goes wrong, because it is already a full name.

matrix(unlist(names), ncol=2, byrow = TRUE)

     [,1]          [,2]    
[1,] "Baker"       "Chet" 
[2,] "Jarret"      "Keith"
[3,] "Miles Davis" "Baker"

What can I do to create a new df that looks like this:

"Chet Baker"
"Keith Jarret"
"Miles Davis"

Here is the reference: http://rfunction.com/archives/1499

tal*_*lat 7

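You can easily adjust the pattern used in the regular expression so that it matches either a comma followed by zero or more whitespace characters, or one or more whitespace characters: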

names <- strsplit(names, ",\\s*|\\s+")
matrix(unlist(names), ncol=2, byrow = TRUE)
#     [,1]     [,2]   
#[1,] "Baker"  "Chet" 
#[2,] "Jarret" "Keith"
#[3,] "Miles"  "Davis"
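As a small sketch of what the two-column matrix gives you if you then paste the columns back together (the variable names `parts`, `m` and `rebuilt` are only illustrative): because "Miles Davis" is split on its space like any other name, swapping the columns also swaps it.

```r
# Original data from the question
names <- c("Baker, Chet", "Jarret, Keith", "Miles Davis")

# Split on "comma + optional whitespace" OR on whitespace alone
parts <- strsplit(names, ",\\s*|\\s+")

# Every element now has exactly two pieces, so the matrix fills cleanly
m <- matrix(unlist(parts), ncol = 2, byrow = TRUE)

# Swapping the columns treats every row as "Last, First",
# which is not true for the row that had no comma
rebuilt <- paste(m[, 2], m[, 1])
rebuilt
# [1] "Chet Baker"   "Keith Jarret" "Davis Miles"
```

So this pattern is fine if a two-column matrix is all that is needed, but it loses the information about which names were already in "First Last" order.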

Since the desired result differs from what was originally described, here's a different approach:

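# start again from the original character vector, not the list created above
names <- c("Baker, Chet", "Jarret, Keith", "Miles Davis")
names <- strsplit(names, ",\\s*")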
data.frame(name = sapply(names, function(x) paste(rev(x), collapse = " ")))
#          name
#1   Chet Baker
#2 Keith Jarret
#3  Miles Davis
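The same idea can be written out step by step, which may make it easier to see why the name without a comma survives. This is only a sketch: `parts` and `full` are illustrative names, and `vapply` is swapped in for `sapply` purely so that the result is guaranteed to be a character vector.

```r
names <- c("Baker, Chet", "Jarret, Keith", "Miles Davis")

# Split only on the comma (plus any whitespace after it)
parts <- strsplit(names, ",\\s*")

# Names with a comma yield two pieces, names without yield one
lengths(parts)
# [1] 2 2 1

# rev() flips "Last", "First" into "First", "Last";
# reversing a single piece leaves it unchanged
full <- vapply(parts, function(x) paste(rev(x), collapse = " "), character(1))

df <- data.frame(name = full, stringsAsFactors = FALSE)
df$name
# [1] "Chet Baker"   "Keith Jarret" "Miles Davis"
```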

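Another option is to use capture groups in the regular expression to swap everything before the comma with everything after it, replacing the comma with a space.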

names <- c("Baker, Chet", "Jarret, Keith", "Miles Davis")
sub("([^,]+),\\s*([^,]+)$", "\\2 \\1", names)
#[1] "Chet Baker"   "Keith Jarret" "Miles Davis" 
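A short sketch of why this leaves "Miles Davis" alone, assuming the same pattern as above (`pattern`, `has_comma` and `swapped` are illustrative names): `sub()` returns its input unchanged when the pattern does not match, and the pattern requires a comma.

```r
names <- c("Baker, Chet", "Jarret, Keith", "Miles Davis")

# Group 1: everything before the comma; group 2: everything after it
pattern <- "([^,]+),\\s*([^,]+)$"

# Only the strings that contain a comma match the pattern
has_comma <- grepl(pattern, names)
has_comma
# [1]  TRUE  TRUE FALSE

# Matching strings are rewritten as "group 2, space, group 1";
# non-matching strings pass through untouched
swapped <- sub(pattern, "\\2 \\1", names)

df <- data.frame(name = swapped, stringsAsFactors = FALSE)
df$name
# [1] "Chet Baker"   "Keith Jarret" "Miles Davis"
```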