Here is my code:
stopwordlist = "a|an|all"
File.open('0_9.txt').each do |line|
  line.downcase!
  line.gsub!( /\b#{stopwordlist}\b/,'')
  File.open('0_9_2.txt', 'w') { |f| f.write(line) }
end
I want to remove the words a, an, and all. However, the regex also matches substrings inside other words and removes them.
Sample input:
Bromwell High is a cartoon comedy. It ran at the same time as some other programs about school life
The output I get:
bromwell high is  cartoon comedy. it r t the same time s some other programs bout school life
As you can see, it matches substrings.
How can I make it match whole words instead of substrings?
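The | operator has the lowest precedence in a regular expression, so each alternative extends as far as it can. Your original regex therefore matches \ba, or an, or all\b: the first \b applies only to a and the last \b applies only to all.
Change the whole regex to: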
/\b(?:#{stopwordlist})\b/
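To see the difference, here is a minimal sketch that runs both patterns over the sample sentence from the question. The variable names ungrouped, grouped, and sample are only illustrative and do not appear in the original code.

```ruby
# Stop words from the question, joined with the alternation operator.
stopwordlist = "a|an|all"
sample = "Bromwell High is a cartoon comedy. It ran at the same time as some other programs about school life"

# Original pattern: parsed as (\ba) | (an) | (all\b), so fragments inside words match.
ungrouped = /\b#{stopwordlist}\b/
# Grouped pattern: both \b anchors now apply to every alternative.
grouped = /\b(?:#{stopwordlist})\b/

ungrouped_result = sample.downcase.gsub(ungrouped, '')
grouped_result = sample.downcase.gsub(grouped, '')

puts ungrouped_result
puts grouped_result
```

With the grouped pattern, only the standalone word a is removed; ran, at, as, and about are left intact.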
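Alternatively, make stopwordlist a regex rather than a string; an interpolated Regexp is inserted as its own group, so the existing /\b#{stopwordlist}\b/ then works unchanged: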
stopwordlist = /a|an|all/
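A short sketch of why the Regexp form works: when a Regexp is interpolated into another regex literal, Ruby inserts its to_s form, which is wrapped in a non-capturing group. The names pattern and cleaned, and the test sentence, are made up for illustration.

```ruby
# stopwordlist is now a Regexp, not a String.
stopwordlist = /a|an|all/

# Interpolation inserts stopwordlist.to_s, i.e. "(?-mix:a|an|all)",
# so the alternation stays inside its own group.
pattern = /\b#{stopwordlist}\b/
puts pattern.source

# Only whole words are removed; "ran" and "at" survive.
cleaned = "it ran at a party".gsub(pattern, '')
puts cleaned
```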
Better still, you may want to use Regexp.union.
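A minimal sketch of the Regexp.union approach, assuming the stop words are kept in an array. The names stopwords, union, pattern, lines, and cleaned, and the second sample sentence, are illustrative only.

```ruby
# Stop words from the question, kept as a plain array of strings.
stopwords = %w[a an all]

# Regexp.union escapes each string and joins them with |, giving /a|an|all/.
union = Regexp.union(stopwords)

# Interpolating the Regexp keeps it grouped, so \b applies to every word.
pattern = /\b#{union}\b/

lines = [
  "Bromwell High is a cartoon comedy.",
  "It ran at the same time as all the others."
]

# Non-destructive version of the downcase!/gsub! steps from the question.
cleaned = lines.map { |line| line.downcase.gsub(pattern, '') }
puts cleaned
```

Separately from the regex issue, note that opening the output file with mode 'w' inside the loop truncates it on every iteration, so only the last line survives. Collecting the cleaned lines first and writing them once, as the cleaned array above would allow, avoids that.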