R strsplit with multiple unordered split arguments?

Eti*_*rie 46 split r

Given the strings

test_1<-"abc def,ghi klm"
test_2<-"abc, def ghi klm"

I would like to obtain

"abc"
"def"
"ghi"

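However, with strsplit, it seems you have to know the order of the split values in the string, because strsplit uses the first value for the first split, the second value for the second split, and so on, and then recycles them.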

But this does not work:

strsplit(test_1, c(",", " "))
strsplit(test_2, c(" ", ","))

strsplit(test_2, split=c("[:punct:]","[:space:]"))[[1]]

I would like to split the string, in a single step, wherever any of the split values is found.
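A note on why the vector form does not behave as hoped: per the strsplit documentation, a `split` of length greater than one is recycled along the elements of `x`, not applied one after another to the same string. A minimal sketch of that behaviour (the vector `two_strings` is made up for illustration):

```r
test_1 <- "abc def,ghi klm"

# With a single input string, only the first split value is used.
res_single <- strsplit(test_1, c(",", " "))[[1]]
print(res_single)  # "abc def" "ghi klm"

# With two input strings, each string is paired with its own split value.
two_strings <- c("a,b c", "a,b c")  # hypothetical example input
res_pairs <- strsplit(two_strings, c(",", " "))
print(res_pairs)   # first element split on ",", second on " "
```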

42-*_*42- 59

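Actually, strsplit uses grep-style (regular expression) patterns as well. (Neither a comma nor a space is a regex metacharacter outside a `{n,m}` quantifier, so double-escaping the commas in the pattern argument is harmless rather than required; likewise, using "\\s" is more about readability than necessity):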

> strsplit(test_1, "\\, |\\,| ")
[[1]]
[1] "abc" "def" "ghi" "klm"

> strsplit(test_2, "\\, |\\,| ")
[[1]]
[1] "abc" "def" "ghi" "klm"

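Without using both `\\, ` (note the trailing space, which SO does not show clearly) and `\\,`, you would get some empty-string ("") values. It might have been clearer if I had written: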

> strsplit(test_2, "\\,\\s|\\,|\\s")
[[1]]
[1] "abc" "def" "ghi" "klm"
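To see what the comma-plus-space alternative buys, here is a small sketch comparing the pattern with and without it on `test_2` (same string as in the question; the variable names are only for illustration):

```r
test_2 <- "abc, def ghi klm"

# Without the comma-plus-space alternative, "," and " " are matched
# separately, which leaves an empty string between them.
without_pair <- strsplit(test_2, "\\,| ")[[1]]
print(without_pair)  # "abc" "" "def" "ghi" "klm"

# With it, ", " is consumed as a single delimiter.
with_pair <- strsplit(test_2, "\\, |\\,| ")[[1]]
print(with_pair)     # "abc" "def" "ghi" "klm"
```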

@Fojtasek is so right: using a character class often simplifies the task, because it creates an implicit logical OR:

> strsplit(test_2, "[, ]+")
[[1]]
[1] "abc" "def" "ghi" "klm"

> strsplit(test_1, "[, ]+")
[[1]]
[1] "abc" "def" "ghi" "klm"

  • And for the OP's updated request: `strsplit(test_2, "[[:punct:][:space:]]+")`. (5 upvotes)
  • How about `strsplit(test_2, "[, ]+")`? (4 upvotes)
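
The `+` in these patterns matters as much as the character class itself: it lets a run of delimiters count as one split point. A short sketch of the difference on `test_2` (variable names are illustrative):

```r
test_2 <- "abc, def ghi klm"

# Character class alone: each delimiter character is its own split point,
# so the adjacent "," and " " produce an empty string.
one_at_a_time <- strsplit(test_2, "[, ]")[[1]]
print(one_at_a_time)  # "abc" "" "def" "ghi" "klm"

# Character class with +: any run of commas/spaces is a single split point.
runs <- strsplit(test_2, "[, ]+")[[1]]
print(runs)           # "abc" "def" "ghi" "klm"
```

Note that a string which starts with a delimiter still yields a leading empty string even with `+`, so some filtering may be needed in that case.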

jth*_*zel 6

If you don't like regular expressions, you can call strsplit() multiple times:

strsplits <- function(x, splits, ...)
{
    for (split in splits)
    {
        x <- unlist(strsplit(x, split, ...))
    }
    return(x[x != ""]) # Remove empty strings
}

strsplits(test_1, c(" ", ","))
# "abc" "def" "ghi" "klm"
strsplits(test_2, c(" ", ","))
# "abc" "def" "ghi" "klm"

Updated with the added example:

strsplits(test_1, c("[[:punct:]]","[[:space:]]"))
# "abc" "def" "ghi" "klm"
strsplits(test_2, c("[[:punct:]]","[[:space:]]"))
# "abc" "def" "ghi" "klm"
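
One thing worth noting: each `split` is still interpreted as a regular expression unless told otherwise. Because the helper forwards `...` to strsplit(), passing `fixed = TRUE` should make the delimiters literal. A sketch under that assumption (the input `dotted` is hypothetical, not from the question):

```r
# Same helper as above, repeated so this snippet runs on its own.
strsplits <- function(x, splits, ...) {
    for (split in splits) {
        x <- unlist(strsplit(x, split, ...))
    }
    x[x != ""]  # drop empty strings
}

# "." and "|" are regex metacharacters; fixed = TRUE treats them literally.
dotted <- "abc.def|ghi"  # hypothetical input for illustration
literal_parts <- strsplits(dotted, c(".", "|"), fixed = TRUE)
print(literal_parts)  # "abc" "def" "ghi"
```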

But if you are going to use regular expressions anyway, you might as well go with @DWin's approach:

strsplit(test_1, "[[:punct:][:space:]]+")[[1]]
# "abc" "def" "ghi" "klm"
strsplit(test_2, "[[:punct:][:space:]]+")[[1]]
# "abc" "def" "ghi" "klm"


dan*_*kas 5

You could go with strsplit(test_1, "\\W").
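
A caveat, sketched below: `\\W` matches a single non-word character, so it works for `test_1` but leaves an empty string in `test_2`, where the comma and the space are adjacent. Using `\\W+` is one way around that (variable names are illustrative):

```r
test_1 <- "abc def,ghi klm"
test_2 <- "abc, def ghi klm"

# One non-word character per split: fine when delimiters never touch.
w_test_1 <- strsplit(test_1, "\\W")[[1]]
print(w_test_1)  # "abc" "def" "ghi" "klm"

# In test_2 the comma and space are adjacent, so "\\W" leaves an empty string.
w_single <- strsplit(test_2, "\\W")[[1]]
print(w_single)  # "abc" "" "def" "ghi" "klm"

# "\\W+" treats the whole run of non-word characters as one delimiter.
w_plus <- strsplit(test_2, "\\W+")[[1]]
print(w_plus)    # "abc" "def" "ghi" "klm"
```

Keep in mind that `\\W` counts the underscore as a word character, so it will not split on `_`.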