How can I split a string on a character without dropping the trailing empty field?

Sel*_*elk 19 regex string r

I have a string like the following:

my_string <- "apple,banana,orange,"

I want to split it on "," to produce this output:

list(c('apple', 'banana', 'orange', ""))

I thought strsplit would do this, but it treats the trailing ',' as if it were not there:

my_string <- "apple,banana,orange,"
strsplit(my_string, ",")
#> [[1]]
#> [1] "apple"  "banana" "orange"

Created on 2023-11-15 by the reprex package (v2.0.1)

What is the simplest way to get the desired output?

More test cases, with example strings and the desired output:

string1 = "apple,banana,orange,"
output1 = list(c('apple', 'banana', 'orange', ''))

string2 =  "apple,banana,orange,pear"
output2 = list(c('apple', 'banana', 'orange', 'pear'))

string3 =  ",apple,banana,orange"
output3 = list(c('', 'apple', 'banana', 'orange'))

## Examples of non-comma separated strings
# '|' separator
string4 =  "|apple|banana|orange|"
output4 = list(c('', 'apple', 'banana', 'orange', ''))

# 'x' separator
string5 =  "xapplexbananaxorangex"
output5 = list(c('', 'apple', 'banana', 'orange', ''))


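Edit:

The ideal solution should generalise to any split character.

A base R solution is also preferred (but please still link any packages that provide this functionality, since their source code may be useful to read!).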

Tho*_*ing 14

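Why doesn't strsplit give the desired output?

If you type ?strsplit, you will see the following statement:

Note that this means that if there is a match at the beginning of a (non-empty) string, the first element of the output is "", but if there is a match at the end of the string, the output is the same as with the match removed.

That is why you don't see the trailing "" when using strsplit.

Here are some demonstrations: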

> strsplit("apple,banana,orange,", ",")
[[1]]
[1] "apple"  "banana" "orange"


> strsplit(",apple,banana,orange,", ",")
[[1]]
[1] ""       "apple"  "banana" "orange"


> strsplit(",apple,banana,orange", ",")
[[1]]
[1] ""       "apple"  "banana" "orange"


> strsplit("apple,banana,orange", ",")
[[1]]
[1] "apple"  "banana" "orange"
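To put numbers on the documented behaviour, here is a small sketch that counts the separators in each string and compares the expected number of fields (separators plus one) with what strsplit returns. The variable names are illustrative only.

```r
x <- c("apple,banana,orange,", ",apple,banana,orange", "apple,banana,orange")

# Count the literal commas in each string.
n_sep <- lengths(regmatches(x, gregexpr(",", x, fixed = TRUE)))

# A string with n separators has n + 1 fields.
expected_fields <- n_sep + 1L

# strsplit() returns one field fewer when the string ends with a separator.
actual_fields <- lengths(strsplit(x, ",", fixed = TRUE))

data.frame(x, expected_fields, actual_fields)
```

Only the string that ends with a separator comes up one field short; a leading separator is handled as expected.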

Base R workaround

If you want it as a coding exercise, one base R option is to define a custom (recursive) function like this:

f <- function(x, sep = ",") {
  pat <- sprintf("^(.*?)%s.*", sep)
  s1 <- sub(pat, "\\1", x)
  s2 <- sub(paste0("^.*?", sep), "", x)
  if (s2 == x) {
    return(x)
  }
  c(s1, Recall(s2, sep))
}

or a variant with substr + regexpr:

f <- function(x, sep = ",") {
  idx <- regexpr(sep, x)
  s1 <- substr(x, 1, idx - 1)
  s2 <- substr(x, idx + 1, nchar(x))
  if (s2 == x) {
    return(x)
  }
  c(s1, Recall(s2, sep))
}

such that

> f("apple,banana,orange,")
[1] "apple"  "banana" "orange" ""

> f(",apple,banana,orange,")
[1] ""       "apple"  "banana" "orange" ""      

> f(",apple,banana,orange")
[1] ""       "apple"  "banana" "orange"

> f("apple,banana,orange")
[1] "apple"  "banana" "orange"
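For comparison, the same idea can probably be written without recursion by locating every separator at once and cutting the string between those positions. This is only a sketch: `split_keep_empty` is a hypothetical name, and it assumes a single input string and a single-character separator that is matched literally (`fixed = TRUE`), so "|" needs no escaping.

```r
split_keep_empty <- function(x, sep = ",") {
  stopifnot(length(x) == 1L, nchar(sep) == 1L)
  # Positions of every literal occurrence of `sep`; -1 means no match.
  pos <- gregexpr(sep, x, fixed = TRUE)[[1]]
  if (pos[1] == -1L) {
    return(x)
  }
  # Each field starts right after a separator and ends right before the next.
  starts <- c(1L, pos + 1L)
  ends <- c(pos - 1L, nchar(x))
  # substring() yields "" whenever a start lies past its end.
  substring(x, starts, ends)
}

split_keep_empty("apple,banana,orange,")
split_keep_empty("|apple|banana|orange|", sep = "|")
```

Because every field is cut by position, leading and trailing separators both yield an empty string.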


Gue*_*sBF 12

Use stringr

library(stringr)

str_split(my_string, ",")

[[1]]
[1] "apple"  "banana" "orange" ""  

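  • I think this answer could be simplified to just use `stringr::str_split()`, since it handles both leading and trailing separators: `stringr::str_split(",apple,banana,orange,", pattern = ",")` (3 upvotes)
  • This is a great solution and will probably be useful to future readers. The only reason it isn't marked as the answer is the preference for a base R solution (3 upvotes)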

the*_*ail 12

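Pasting another separator onto the end should make strsplit behave as intended.
Otherwise, you can fall back on the scan function, which underpins the read.csv/read.table functions: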

strsplit(paste0(string1, ","), ",")
##[[1]]
##[1] "apple"  "banana" "orange" ""

Generalised, taking regex escaping of the separator into account:

L <- list(string1, string2, string3, string4, string5)
mapply(
    function(x,s) strsplit(paste0(x, gsub("\\\\", "", s)), split=s),
    L,
    c(",", ",", ",", "\\|", "x")
)

##[[1]]
##[1] "apple"  "banana" "orange" ""      
##
##[[2]]
##[1] "apple"  "banana" "orange" "pear"  
##
##[[3]]
##[1] ""       "apple"  "banana" "orange"
##
##[[4]]
##[1] ""       "apple"  "banana" "orange" ""      
##
##[[5]]
##[1] ""       "apple"  "banana" "orange" "" 
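If the separator is always a literal string and never a regular expression, the escaping step can likely be avoided with `fixed = TRUE`. A minimal sketch, where `split_keep_trailing` is a hypothetical helper name and the test strings are the ones from the question:

```r
# Append one extra separator so strsplit() keeps the trailing empty field.
# fixed = TRUE treats `sep` literally, so "|" needs no escaping.
split_keep_trailing <- function(x, sep) {
  strsplit(paste0(x, sep), sep, fixed = TRUE)
}

strings <- c(
  "apple,banana,orange,",
  "apple,banana,orange,pear",
  ",apple,banana,orange",
  "|apple|banana|orange|",
  "xapplexbananaxorangex"
)
seps <- c(",", ",", ",", "|", "x")

# paste0() is vectorised, and strsplit() recycles `split` along `x`,
# so one call handles all five string/separator pairs.
result <- split_keep_trailing(strings, seps)
result
```

This relies on strsplit recycling its `split` argument along `x`, so each string is paired with its own separator.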

The scan option:

scan(text=string1, sep=",", what="")
##Read 4 items
##[1] "apple"  "banana" "orange" ""

Generalised:

mapply(
    function(x,s) scan(text=x, sep=s, what=""),
    L,
    c(",", ",", ",", "|", "x")
)
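A slightly more defensive sketch of the scan approach, under a few assumptions of my own: `split_scan` is a hypothetical wrapper name, `quiet = TRUE` suppresses the "Read N items" message, `quote = ""` stops quote characters inside the text from being interpreted, and `Map` is used so the result stays a list even if every string happens to split into the same number of fields (in which case `mapply` would simplify it to a matrix).

```r
# Split one string on a single-character separator with scan().
split_scan <- function(x, sep) {
  scan(text = x, sep = sep, what = "", quote = "", quiet = TRUE)
}

strings <- c(
  "apple,banana,orange,",
  "apple,banana,orange,pear",
  ",apple,banana,orange",
  "|apple|banana|orange|",
  "xapplexbananaxorangex"
)
seps <- c(",", ",", ",", "|", "x")

# Map() always returns a list; unname() drops the names it copies from `strings`.
result <- unname(Map(split_scan, strings, seps))
result
```

Note that scan takes the separator as a single literal character, so no regex escaping is involved.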

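  • Marking this as the answer because it meets all the criteria (base R implementation, output exactly as described in the question). For future reference, ThomasIsCoding's answer describes an alternative base R solution that is also very good. Anyone who doesn't need a base R implementation should see GuedesBF's answer for a simple solution using stringr (2 upvotes)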