Split a string with a regular expression

Jef*_*ler 8 regex r strsplit

I want to split a string of a general form in which square brackets mark the "parts" of the string. For example:

x <- "[a] + [bc] + 1"

and get back a character vector that looks like this:

"[a]"  " + "  "[bc]" " + 1"

EDIT: I ended up using this:

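x <- "[a] + [bc] + 1"
x <- gsub("\\[", ",[", x)       # put a comma before every "["
x <- gsub("\\]", "],", x)       # put a comma after every "]"
parts <- strsplit(x, ",")[[1]]  # split on the inserted commas
parts[parts != ""]              # drop the empty string produced by the leading comma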
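The comma trick above breaks if the input string itself already contains commas. Here is a minimal sketch of the same insert-then-split idea. It assumes a control character (here `\001`) never occurs in the input, and `split_keep_brackets` is a hypothetical helper name, not a base R function:

```r
# Hypothetical helper: wrap every "[...]" part in a sentinel character,
# then split on that sentinel. Assumes `sentinel` never occurs in `x`.
split_keep_brackets <- function(x, sentinel = "\001") {
  # "\\1" re-inserts the matched bracketed part between two sentinels
  marked <- gsub("(\\[[^\\]]*\\])", paste0(sentinel, "\\1", sentinel),
                 x, perl = TRUE)
  parts <- strsplit(marked, sentinel, fixed = TRUE)[[1]]
  # a sentinel at the start, or two adjacent ones ("][" in the input),
  # leave empty strings behind; drop them
  parts[parts != ""]
}

split_keep_brackets("[a] + [bc] + 1")
```

Using `fixed = TRUE` for the split means the sentinel is matched literally, so no regex escaping is needed for it.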

42-*_*42- 6

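I've looked at TylerRinker's code and suspect it may be clearer than this, but this can serve as a way to learn a different set of functions. (I liked his better until I noticed it splits on spaces.) I tried adapting this to work with strsplit, but that function always removes the separators. Perhaps this could be adapted into a newstrsplit that splits at the separators but keeps them? It would probably need to avoid splitting at the first or last position and to distinguish between opening and closing separators.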

scan(text=   # use scan to separate after insertion of commas
            gsub("\\]", "],",   # put commas in after "]"'s
            gsub("(.)\\[", "\\1,[",  x)) ,  # add commas before "[" unless at first position; "\\1" puts back the character matched by "."
        what="", sep=",")    # tell scan this is a character argument and the separator is ","
#Read 4 items
#[1] "[a]"  " + "  "[bc]" " + 1"
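The newstrsplit idea can be approximated without splitting at all. Instead of removing separators, extract the pieces directly with `gregexpr()` and `regmatches()`. This swaps splitting for extraction. The name `newstrsplit` is hypothetical, borrowed from the musing above, and the pattern assumes brackets are balanced and not nested:

```r
# Hypothetical "split but keep the separators" helper.
# Each match is either a bracketed part "[...]" or a maximal run of
# characters containing no "["; together they cover a well-formed string.
newstrsplit <- function(x) {
  pattern <- "\\[[^\\]]*\\]|[^\\[]+"
  regmatches(x, gregexpr(pattern, x, perl = TRUE))
}

newstrsplit("[a] + [bc] + 1")[[1]]
```

Like `strsplit()`, this is vectorised and returns a list with one character vector per input string. An unmatched `[` matches neither alternative, so it is silently dropped.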


Tyl*_*ker 5

Here's a lazy approach:

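FUN <- function(x) {
    # split on runs of whitespace: "[a]" "+" "[bc]" "+" "1"
    all <- unlist(strsplit(x, "\\s+"))
    # glue the last two pieces back together behind a leading space
    last <- paste(c(" ", tail(all, 2)), collapse="")
    c(head(all, -2), last)
}

x <- "[a] + [bc] + 1"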
FUN(x)

## > FUN(x)
## [1] "[a]"  "+"    "[bc]" " +1"


jub*_*uba 5

You can compute the split points manually and use substring:

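split.pos <- gregexpr('\\[.*?]',x)[[1]]                    # start of each "[...]" part
split.length <- attr(split.pos, "match.length")            # length of each "[...]" part
split.start <- sort(c(split.pos, split.pos+split.length))  # pieces start at a "[" or just after a "]"
split.end <- c(split.start[-1]-1, nchar(x))                # each piece ends just before the next one starts
substring(x,split.start,split.end)
#  [1] "[a]"  " + "  "[bc]" " + 1"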
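As written, the substring approach starts at the first `[`. Any text before the first bracket is lost, and a string ending in `]` yields a trailing empty string. A sketch that also treats position 1 as a boundary and discards boundaries past the end of the string (`split_at_brackets` is a hypothetical name):

```r
# Hypothetical generalisation of the substring() answer: boundaries are the
# start of the string, the start of every "[...]" part, and the position just
# after every "[...]" part.
split_at_brackets <- function(x) {
  m <- gregexpr("\\[[^\\]]*\\]", x, perl = TRUE)[[1]]
  if (m[1] == -1) return(x)             # no bracketed part: nothing to split
  len <- attr(m, "match.length")
  starts <- sort(unique(c(1L, m, m + len)))
  starts <- starts[starts <= nchar(x)]  # a "]" at the very end adds nchar(x) + 1
  ends <- c(starts[-1] - 1L, nchar(x))
  substring(x, starts, ends)
}

split_at_brackets("1 + [a] + [bc]")
```

This handles a single string, like the original answer. Wrap it in `lapply()` for a character vector.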


The*_*ras 5

Here's a version that splits at the brackets and keeps them in the result by using a positive lookahead and lookbehind:

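splitme <- function(x) {
  # with a zero-width lookahead, R's strsplit() splits each "[" off as its own element
  x <- unlist(strsplit(x, "(?=\\[)", perl=TRUE))
  # the lookbehind splits right after each "]"
  x <- unlist(strsplit(x, "(?<=\\])", perl=TRUE))
  # glue every lone "[" back onto the element that follows it
  for (i in which(x=="[")) {
    x[i+1] <- paste(x[i], x[i+1], sep="")
  }
  # drop the lone "["s (x[-which(...)] would return character(0) if there were none)
  x[x != "["]
}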
splitme(x)
#[1] "[a]"  " + "  "[bc]" " + 1"