How do I use Ruby gsub with a Regexp that has many matches?

Mah*_*led 20 ruby regex csv string-substitution gsub

I have CSV file contents with double quotes inside quoted text:

test,first,line,"you are a "kind" man",thanks
again,second,li,"my "boss" is you",good

I need to replace every double quote that is not preceded or followed by a comma with "".

test,first,line,"you are a ""kind"" man",thanks
again,second,li,"my ""boss"" is you",good

So " is replaced by "".

I tried:

x.gsub(/([^,])"([^,])/, "#{$1}\"\"#{$2}")

but it didn't work.

Phr*_*ogz 45

Your regex needs to be a little bolder, in case the quotes occur at the start of the first value or at the end of the last value:

csv = <<ENDCSV
test,first,line,"you are a "kind" man",thanks
again,second,li,"my "boss" is you",good
more,""Someone" said that you're "cute"",yay
"watch out for this",and,also,"this test case"
ENDCSV

puts csv.gsub(/(?<!^|,)"(?!,|$)/,'""')
#=> test,first,line,"you are a ""kind"" man",thanks
#=> again,second,li,"my ""boss"" is you",good
#=> more,"""Someone"" said that you're ""cute""",yay
#=> "watch out for this",and,also,"this test case"

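The regex above uses negative lookbehind and negative lookahead assertions (anchors), available in Ruby 1.9 and later.

  • (?<!^|,) - immediately before this point, there must not be the start of a line (^) or a comma
  • " - find a double quote
  • (?!,|$) - immediately after this point, there must not be a comma or the end of a line ($)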
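
To see what each assertion contributes, here is a small sketch that applies the lookbehind alone, the lookahead alone, and then both together to a made-up sample line. The constant names (SAMPLE, LOOKBEHIND_ONLY, LOOKAHEAD_ONLY, INNER_QUOTE) are hypothetical and only there for illustration.

```ruby
# Made-up sample: a quote at the start of the line, two inside the field,
# and one right before a comma.
SAMPLE = %q{"a "b" c",d}

# Lookbehind only: skips a quote that opens a field (after line start or a comma).
LOOKBEHIND_ONLY = /(?<!^|,)"/
# Lookahead only: skips a quote that closes a field (before a comma or line end).
LOOKAHEAD_ONLY = /"(?!,|$)/
# Both together: only the quotes inside the field are matched.
INNER_QUOTE = /(?<!^|,)"(?!,|$)/

puts SAMPLE.gsub(LOOKBEHIND_ONLY, '""') #=> "a ""b"" c"",d
puts SAMPLE.gsub(LOOKAHEAD_ONLY, '""')  #=> ""a ""b"" c",d
puts SAMPLE.gsub(INNER_QUOTE, '""')     #=> "a ""b"" c",d
```

With only one of the two assertions, either the opening or the closing quote of the field gets doubled as well, which is why the pattern needs both.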

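As a bonus, since you are not actually capturing the characters on either side, you don't need to worry about using \1 correctly in the replacement string.

For more information, see the "Anchors" section in the official Ruby regex documentation.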


However, for cases where you do need to substitute the matches into the output, you can use any of the following:

"hello".gsub /([aeiou])/, '<\1>'            #=> "h<e>ll<o>"
"hello".gsub /([aeiou])/, "<\\1>"           #=> "h<e>ll<o>"
"hello".gsub(/([aeiou])/){ |m| "<#{$1}>" }  #=> "h<e>ll<o>"

You can't use string interpolation in the replacement string, as you did:

"hello".gsub /([aeiou])/, "<#{$1}>"
 #=> "h<previousmatch>ll<previousmatch>"

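...because the string interpolation happens once, before gsub has run. The block form of gsub re-invokes the block for each match, at which point the global $1 has been populated appropriately and is available for use.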
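
A minimal sketch of that timing difference, assuming an unrelated earlier match has left "x" in $1. The helper names wrap_with_interpolation and wrap_with_block are hypothetical, made up for this illustration.

```ruby
# Hypothetical helpers, for illustration only.
def wrap_with_interpolation(text)
  /(x)/ =~ "x"                         # an earlier, unrelated match sets $1 to "x"
  text.gsub(/([aeiou])/, "<#{$1}>")    # "<x>" is built once, before gsub is called
end

def wrap_with_block(text)
  text.gsub(/([aeiou])/) { "<#{$1}>" } # the block runs per match, so $1 is current
end

puts wrap_with_interpolation("hello") #=> h<x>ll<x>
puts wrap_with_block("hello")         #=> h<e>ll<o>
```

In the first helper, gsub only ever sees the finished string "<x>"; it has no way of knowing that $1 was involved.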


Edit: For Ruby 1.8 (why are you using that?) you can use:

puts csv.gsub(/([^,\n\r])"([^,\n\r])/,'\1""\2')


Dav*_*son 9

Assuming s is a string, this will work:

puts s.gsub(/([^,])"([^,])/, "\\1\"\"\\2")

  • When the content itself contains double quotes, it is better to quote it with single quotes, as in '\1""\2', or to use the third form %q[\1""\2] (2 upvotes)
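
As a quick check of the comment above, this sketch shows that the three quoting styles produce the same replacement string. The constant names are hypothetical, and the sample string is taken from the question's data.

```ruby
# Three ways to write the same replacement string: \1""\2
DOUBLE_QUOTED = "\\1\"\"\\2"
SINGLE_QUOTED = '\1""\2'
PERCENT_Q     = %q[\1""\2]

puts DOUBLE_QUOTED == SINGLE_QUOTED #=> true
puts SINGLE_QUOTED == PERCENT_Q     #=> true

s = 'my "boss" is you'
puts s.gsub(/([^,])"([^,])/, SINGLE_QUOTED) #=> my ""boss"" is you
```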