Problem with gsub and regex in R

Sox*_*man 5 regex r gsub

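I am using gsub in R to insert text into the middle of a string. It works fine, but for some reason it throws an error when the position is too large. The code is: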

gsub(paste0('^(.{', as.integer(loc[1])-1, '})(.+)$'), new_cols, sql)
Error in gsub(paste0("^(.{273})(.+)$"), new_cols, sql) :  invalid
  regular expression '^(.{273})(.+)$', reason 'Invalid contents of {}'

The code works when the number in the braces (273 in this case) is small, but not when it is large.

This reproduces the error:

sql <- "The cat with the bat went to town. He ate the fat mat and wouldn't stop til the sun came up. He was a fat cat that lived with a rat who owned many hats.The cat with the bat went to town. He ate the fat mat and wouldn't stop til the sun came up. He was a fat cat that lived with a rat who owned many hats."  
new_cols <- "happy" 
gsub('^(.{125})(.+)$', new_cols, sql)  #**Works
gsub('^(.{273})(.+)$', new_cols, sql) 
Error in gsub("^(.{273})(.+)$", new_cols, sql) :    invalid regular
  expression '^(.{273})(.+)$', reason 'Invalid contents of {}'

Wik*_*żew 13

Background

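By default, R's gsub uses the TRE regex library. The bounds in a limiting quantifier are valid from 0 up to RE_DUP_MAX, which is defined in the TRE code. See this TRE reference: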

"A bound is one of the following, where n and m are unsigned decimal integers between 0 and RE_DUP_MAX"

It seems that RE_DUP_MAX is set to 255 (see this TRE source file, which shows #define RE_DUP_MAX 255), so you cannot use a larger value in a {n,m} limiting quantifier.
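To see the limit in action, here is a minimal sketch that checks whether a pattern compiles under each engine. The helper name `compiles` is hypothetical (it is not part of base R); it wraps `grepl` in `tryCatch` and reports whether an error was raised.

```r
# Hypothetical helper (not part of base R): returns TRUE if the pattern
# compiles under the chosen engine, FALSE if the regex call raises an error.
compiles <- function(pattern, perl = FALSE) {
  tryCatch({
    # The subject string is irrelevant here; only compilation matters.
    suppressWarnings(grepl(pattern, "", perl = perl))
    TRUE
  }, error = function(e) FALSE)
}

compiles("^(.{125})(.+)$")               # bound below the TRE limit
compiles("^(.{273})(.+)$")               # bound above the TRE limit
compiles("^(.{273})(.+)$", perl = TRUE)  # same pattern under PCRE
```

With the behaviour reported in the question, the first and third calls should return TRUE and the second FALSE.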

Use the PCRE regex flavor instead: add perl = TRUE and it will work.

R demo:

> sql <- "The cat with the bat went to town. He ate the fat mat and wouldn't stop til the sun came up. He was a fat cat that lived with a rat who owned many hats.The cat with the bat went to town. He ate the fat mat and wouldn't stop til the sun came up. He was a fat cat that lived with a rat who owned many hats."
> new_cols <- "happy"
> gsub('^(.{273})(.+)$', new_cols, sql, perl=TRUE)
[1] "happy"
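Note that the demo replaces the whole string, because the replacement does not refer to the capture groups. If the goal is to insert text at a position, as the question describes, something along these lines may be closer to what is wanted. This is only a sketch: `insert_at` is a hypothetical helper, and the 300-character test string is made up for illustration.

```r
# Hypothetical helper: insert `text` so that it starts at character
# position `pos` of `x`, without using a regex at all.
insert_at <- function(x, pos, text) {
  paste0(substr(x, 1, pos - 1), text, substr(x, pos, nchar(x)))
}

x <- strrep("a", 300)   # made-up 300-character input
new_cols <- "happy"

# Regex version: keep both capture groups and put the new text between them.
via_regex <- sub("^(.{273})(.+)$", paste0("\\1", new_cols, "\\2"), x, perl = TRUE)

# Non-regex version: the new text starts at position 274.
via_substr <- insert_at(x, 274, new_cols)

identical(via_regex, via_substr)
```

The `substr` variant sidesteps the quantifier limit entirely, and it also avoids surprises if the inserted text contains backslashes, which the regex replacement would interpret.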

  • Unless there is a `T <- FALSE` somewhere in the code. (4 upvotes)
  • [You don't say...](http://www.r-bloggers.com/r-tip-avoid-using-t-and-f-as-synonyms-for-true-and-false/) (3 upvotes)
  • Thanks! Works great! (2 upvotes)