How to trim and replace a string

jac*_*son 5 regex string r

string<-c("       this is a string  ")

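Is it possible to trim the whitespace on both sides of a string (or on just one side, as needed) and replace it with a desired character in R? The number of whitespace characters differs on each side of the string and must be preserved in the replacement.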

"~~~~~~~this is a string~~"

And*_*rie 6

Use gsub:

gsub(" ", "~", "    this is a string  ")
[1] "~~~~this~is~a~string~~"

This function uses regular expressions to replace (i.e. sub) all occurrences of a pattern within a string.
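As a quick illustration of "all occurrences", the sketch below contrasts gsub with its single-replacement sibling sub from base R. The input string is made up for the example.

```r
x <- "a b c"

# sub() replaces only the first match
first_only <- sub(" ", "~", x)

# gsub() replaces every match
all_matches <- gsub(" ", "~", x)

print(first_only)   # [1] "a~b c"
print(all_matches)  # [1] "a~b~c"
```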

In your case, you have to express the pattern in a special way:

gsub("(^ *)|( *$)", "~~~", "    this is a string  ")
[1] "~~~this is a string~~~"

The pattern means:

  • (^ *): find zero or more spaces at the start of the string
  • ( *$): find zero or more spaces at the end of the string
  • |: the OR operator
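To see what each half of the alternation matches, the two anchored patterns can be tried one at a time. This is only an illustrative sketch; the sample string and the "<" and ">" markers are arbitrary choices, not part of the answer.

```r
x <- "  a b   "

# "^ *" is anchored at the start, so only the leading run of spaces is replaced
lead_marked <- sub("^ *", "<", x)

# " *$" is anchored at the end, so only the trailing run of spaces is replaced
trail_marked <- sub(" *$", ">", x)

print(lead_marked)   # [1] "<a b   "
print(trail_marked)  # [1] "  a b>"
```

In both cases the whole run collapses into a single replacement, which is why the number of spaces is lost.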

Now you can use this approach to solve the problem of replacing each leading and trailing space with a new character:

txt <- "    this is a string  "
foo <- function(x, new="~"){
  lead <- gsub("(^ *).*", "\\1", x)
  last <- gsub(".*?( *$)", "\\1", x)
  mid  <- gsub("(^ *)|( *$)", "", x)
  paste0(
    gsub(" ", new, lead),
    mid,
    gsub(" ", new, last)
  )
}

foo("    this is a string  ")
# [1] "~~~~this is a string~~"

foo(" And another one        ")
# [1] "~And another one~~~~~~~~"

For more information, see ?gsub and ?regexp.
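A variant of the same idea counts the spaces instead of capturing them. The sketch below is not part of the answer: the name pad_ends is hypothetical, and it assumes that only literal space characters need replacing and that strrep (base R 3.3.0 or later) is available.

```r
# Hypothetical helper: replace leading/trailing spaces one-for-one with `new`
pad_ends <- function(x, new = "~") {
  no_lead <- sub("^ +", "", x)        # string without leading spaces
  core    <- sub(" +$", "", no_lead)  # string without leading or trailing spaces
  n_lead  <- nchar(x) - nchar(no_lead)     # number of leading spaces removed
  n_trail <- nchar(no_lead) - nchar(core)  # number of trailing spaces removed
  paste0(strrep(new, n_lead), core, strrep(new, n_trail))
}

pad_ends(c("    this is a string  ", " one  "))
# [1] "~~~~this is a string~~" "~one~~"
```

Because sub, nchar, strrep, and paste0 are all vectorized, this works on a character vector without a loop.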

  • This also accounts for the spaces between words. (2 upvotes)

A5C*_*2T1 6

This seems like an inefficient way to do it, but perhaps you should be looking in the direction of gregexpr and regmatches rather than gsub:

x <- "    this is a string  "
pattern <- "^ +?\\b|\\b? +$"
startstop <- gsub(" ", "~", regmatches(x, gregexpr(pattern, x))[[1]])
text <- paste(regmatches(x, gregexpr(pattern, x), invert=TRUE)[[1]], collapse="")
paste0(startstop[1], text, startstop[2])
# [1] "~~~~this is a string~~"
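To make the two regmatches calls easier to follow, here is a small sketch of what they return. It assumes a simplified pattern, "^ +| +$", and a made-up input rather than the pattern used in the answer.

```r
x <- "  ab c "
m <- gregexpr("^ +| +$", x)   # positions of the leading and trailing runs

# The matched pieces: the runs of spaces at either end
ends <- regmatches(x, m)[[1]]

# invert = TRUE returns everything that was NOT matched
rest <- regmatches(x, m, invert = TRUE)[[1]]

print(ends)                        # [1] "  " " "
print(paste(rest, collapse = ""))  # [1] "ab c"
```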

And, for fun, as a function, along with a "vectorized" version:

## The function
replaceEnds <- function(string) {
  pattern <- "^ +?\\b|\\b? +$"
  startstop <- gsub(" ", "~", regmatches(string, gregexpr(pattern, string))[[1]])
  text <- paste(regmatches(string, gregexpr(pattern, string), invert = TRUE)[[1]],
                collapse = "")
  paste0(startstop[1], text, startstop[2])
}

## use Vectorize here if you want to apply over a vector
vReplaceEnds <- Vectorize(replaceEnds)

Some sample data:

myStrings <- c("    Four at the start, 2 at the end  ", 
               "   three at the start, one at the end ")

vReplaceEnds(myStrings)
#        Four at the start, 2 at the end        three at the start, one at the end  
#  "~~~~Four at the start, 2 at the end~~" "~~~three at the start, one at the end~"
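As an alternative to Vectorize, the replacement form of regmatches can rewrite the matched ends in place across a whole vector. This is a sketch rather than the answer's method: it assumes the simpler pattern "^ +| +$", and the name replace_ends2 is hypothetical.

```r
# Hypothetical helper built on `regmatches<-`
replace_ends2 <- function(x, new = "~") {
  m <- gregexpr("^ +| +$", x)   # leading/trailing runs for each element
  ends <- regmatches(x, m)      # list with one character vector per element
  # swap every space inside the matched runs for `new`, then write them back
  regmatches(x, m) <- lapply(ends, function(s) gsub(" ", new, s, fixed = TRUE))
  x
}

replace_ends2(c("    Four at the start, 2 at the end  ",
                "   three at the start, one at the end "))
# [1] "~~~~Four at the start, 2 at the end~~"
# [2] "~~~three at the start, one at the end~"
```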


Sim*_*lon 6

Or use somewhat more complex pattern matching with gsub...

gsub("\\s(?!\\b)|(?<=\\s)\\s(?=\\b)", "~", "    this is a string  " , perl = TRUE )
#[1] "~~~~this is a string~~"

Or with @AnandaMahto's data:

gsub("\\s(?!\\b)|(?<=\\s)\\s(?=\\b)", "~", myStrings , perl = TRUE )
#[1] "~~~~Four at the start, 2 at the end~~" 
#[2] "~~~three at the start, one at the end~"

Explanation

This uses positive and negative lookahead assertions and a lookbehind assertion:

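  • \\s(?!\\b) - matches a whitespace character, \\s, that is not followed by a word boundary, (?!\\b). On its own this works for every space except the last one before the first word; for the string above it gives
    "~~~ this is a string~~". So we need another pattern...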

  • (?<=\\s)\\s(?=\\b) - matches a whitespace character, \\s, that is preceded by another whitespace character, (?<=\\s), and followed by a word boundary, (?=\\b).

Because this is gsub, it tries to make as many matches as it can.
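The two alternatives can be checked separately. The sketch below runs the first alternative alone and then the full pattern. The last call is an added observation, not something the answer claims: a single leading space is followed by a word boundary and has no whitespace before it, so neither alternative matches it.

```r
x <- "    this is a string  "
pat <- "\\s(?!\\b)|(?<=\\s)\\s(?=\\b)"

# First alternative only: the space right before "this" is left alone
first_alt <- gsub("\\s(?!\\b)", "~", x, perl = TRUE)

# Full pattern: the second alternative picks up that remaining space
both_alts <- gsub(pat, "~", x, perl = TRUE)

# Edge case: a single leading space is not replaced
single_lead <- gsub(pat, "~", " a b", perl = TRUE)

print(first_alt)    # [1] "~~~ this is a string~~"
print(both_alts)    # [1] "~~~~this is a string~~"
print(single_lead)  # [1] " a b"
```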