How to get possibly overlapping matches in a string

Question

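I'm looking for a way, in either Ruby or JavaScript, to get all matches of a regular expression in a string, including overlapping ones.

Say I have str = "abcadc" and I want to find every occurrence of a, followed by any number of characters, followed by c. The result I'm looking for is ["abc", "adc", "abcadc"]. Any ideas on how to achieve this?

str.scan(/a.*c/) gives me ["abcadc"], and str.scan(/(?=(a.*c))/).flatten gives me ["abcadc", "adc"].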
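To make the gap concrete, here is a small sketch of what each attempt returns and why. The lazy variant and the variable names are additions for illustration only; they are not part of the original question.

```ruby
str = "abcadc"

# Plain scan consumes what it matches, so only non-overlapping matches are found.
plain = str.scan(/a.*c/)

# A capture inside a zero-width lookahead is tried at every index,
# but it still yields just one match per start index: the greedy (longest) one.
greedy = str.scan(/(?=(a.*c))/).flatten

# With a lazy quantifier the single match per start index is the shortest one.
lazy = str.scan(/(?=(a.*?c))/).flatten

p plain   # => ["abcadc"]
p greedy  # => ["abcadc", "adc"]
p lazy    # => ["abc", "adc"]
```

For this particular input, combining the greedy and lazy results happens to cover all three matches, but only because no a has more than two possible closing c characters. In general a regex engine reports at most one match per start position, which is why the missing "abc" never shows up in the greedy scan.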

Answer 1

def matching_substrings(string, regex)
  # \A and \z anchor to the whole substring; ^ and $ would anchor to lines.
  anchored = /\A(?:#{regex})\z/
  string.size.times.each_with_object([]) do |start_index, matches|
    start_index.upto(string.size.pred) do |end_index|
      substring = string[start_index..end_index]
      matches.push(substring) if substring =~ anchored
    end
  end
end

matching_substrings('abcadc', /a.*c/) # => ["abc", "abcadc", "adc"]
matching_substrings('foobarfoo', /(\w+).*\1/) 
  # => ["foobarf",
  #     "foobarfo",
  #     "foobarfoo",
  #     "oo",
  #     "oobarfo",
  #     "oobarfoo",
  #     "obarfo",
  #     "obarfoo",
  #     "oo"]
matching_substrings('why is this downvoted?', /why.*/)
  # => ["why",
  #     "why ",
  #     "why i",
  #     "why is",
  #     "why is ",
  #     "why is t",
  #     "why is th",
  #     "why is thi",
  #     "why is this",
  #     "why is this ",
  #     "why is this d",
  #     "why is this do",
  #     "why is this dow",
  #     "why is this down",
  #     "why is this downv",
  #     "why is this downvo",
  #     "why is this downvot",
  #     "why is this downvote",
  #     "why is this downvoted",
  #     "why is this downvoted?"]

Answer 2

aef*_*aef 11

In Ruby, you can get the expected result like this:

str = "abcadc"
[/(a[^c]*c)/, /(a.*c)/].flat_map{ |pattern| str.scan(pattern) }.reduce(:+)
# => ["abc", "adc", "abcadc"]

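Whether this approach suits you depends heavily on what you are really trying to achieve.

I tried to put this into a single expression, but I couldn't make it work. I'd really like to know whether there is some theoretical reason why this can't be done with a regular expression, or whether I just don't know enough about Ruby's regex engine Oniguruma to do it.

Assuming the OP's string and regex are just an example, this doesn't give a general answer to the question. (4 upvotes)
The solution can easily be adapted to your first example. For the second one, you may be right; I don't know how to adapt it. That's why I wrote that it depends on what exactly the OP wants to achieve. (2 upvotes)
@WilliamFeng Shouldn't the [expected result](http://ideone.com/52UNhp) for `abcadcdc` include `abcadc` and `adcdc`? (2 upvotes)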
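The point raised in the comments can be checked directly. The sketch below runs this answer's two-pattern approach on `abcadcdc` and compares it with a brute-force reference; the brute-force loop and the variable names are assumptions added for illustration.

```ruby
str = "abcadcdc"

# The two-pattern approach from this answer.
two_patterns = [/(a[^c]*c)/, /(a.*c)/].flat_map { |pattern| str.scan(pattern) }.reduce(:+)

# Brute-force reference: every substring that /a.*c/ matches in full.
reference = []
(0...str.size).each do |i|
  (i...str.size).each do |j|
    sub = str[i..j]
    reference << sub if sub.match?(/\Aa.*c\z/)
  end
end

p two_patterns              # => ["abc", "adc", "abcadcdc"]
p reference                 # => ["abc", "abcadc", "abcadcdc", "adc", "adcdc"]
p reference - two_patterns  # => ["abcadc", "adcdc"]
```

The two patterns only capture the shortest and the longest candidate, so anything in between is lost once an a has three or more possible closing c characters.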

Answer 3

Wik*_*żew 8

In JS:

function doit(r, s) {
  var res = [], cur;
  // Anchor the pattern and drop the g/y flags, which make test() stateful via lastIndex.
  // (RegExp.prototype.global is read-only, so assigning false to it has no effect.)
  r = new RegExp('^(?:' + r.source + ')$', r.flags.replace(/[gy]/g, ''));
  for (var q = 0; q < s.length; ++q)
    for (var w = q; w <= s.length; ++w)
      if (r.test(cur = s.substring(q, w)))
        res.push(cur);
  return res;
}
document.body.innerHTML += "<pre>" + JSON.stringify(doit( /a.*c/g, 'abcadc' ), 0, 4) + "</pre>";

Answer 4

Mar*_*eed 8

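You want all possible matches, including overlapping ones. As you noted, the lookahead trick from "How to find overlapping matches with a regexp?" doesn't work for your case.

In the general case, the only thing I can think of is to generate all possible substrings of the string and check each one against an anchored version of the regex. It's brute force, but it works.

Ruby: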

def all_matches(str, regex)
  n = str.length
  # Collect every substring str[i, len], then keep those the anchored regex matches in full.
  subs = n.times.flat_map { |i| (i..n).map { |j| str[i, j - i] } }
  subs.uniq.grep(/\A(?:#{regex})\z/)
end

all_matches("abcadc", /a.*c/) 
#=> ["abc", "abcadc", "adc"]

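A note on the "anchored version" of the regex: in Ruby, `^` and `$` match at line boundaries, while `\A` and `\z` match only at the very start and end of the string. The difference only shows up when the input contains newlines. A minimal sketch, using a made-up two-line string and illustrative variable names:

```ruby
multiline = "abc\nxyz"

line_anchored   = /^(?:a.*c)$/    # anchors to the start and end of a line
string_anchored = /\A(?:a.*c)\z/  # anchors to the start and end of the string

p line_anchored.match?(multiline)    # => true, the first line "abc" satisfies it
p string_anchored.match?(multiline)  # => false, the whole string is not a.*c
p string_anchored.match?("abc")      # => true
```

For the full-substring check used by the brute-force approach, the string anchors are the safer choice if the input may span several lines.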

JavaScript:

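function allMatches(str, regex) {
  var i, j, len = str.length, subs = {};
  // Wrap the source in a group so alternations stay inside the anchors,
  // and keep the flags except g/y, which would make matching stateful.
  var anchored = new RegExp('^(?:' + regex.source + ')$', regex.flags.replace(/[gy]/g, ''));
  for (i = 0; i < len; ++i) {
    for (j = i; j <= len; ++j) {
      subs[str.slice(i, j)] = true;  // object keys deduplicate the substrings
    }
  }
  return Object.keys(subs).filter(function(s) { return anchored.test(s); });
}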

Answer 5

Ale*_*kin 5

str = "abcadc"
from = str.split(/(?=\p{L})/).map.with_index { |c, i| i if c == 'a' }.compact
to   = str.split(/(?=\p{L})/).map.with_index { |c, i| i if c == 'c' }.compact
from.product(to).select { |f,t| f < t }.map { |f,t| str[f..t] }
# => [
#  [0] "abc",
#  [1] "abcadc",
#  [2] "adc"
# ]

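I'm sure there is a fancy way to find all indices of a character in a string, but I couldn't find it :( Any ideas?

Splitting on the "unicode char boundary" makes it work with strings like 'a?bc?' or 'U?ve Østergaard'.

For a more general solution that accepts any "from" and "to" sequences, only a small modification is needed: find all indices of "from" and "to" in the string.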
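One way that generalization might look, as a rough sketch. The helper names `indices_of` and `between` are hypothetical, and the code works on plain character indices instead of the `\p{L}` split used above, so it is an assumption about the intended modification, not the author's code.

```ruby
# Hypothetical helper: all start indices of `needle` in `str`, overlaps included.
def indices_of(str, needle)
  (0..str.size - needle.size).select { |i| str[i, needle.size] == needle }
end

# Hypothetical helper: every substring that starts with `from` and ends with `to`,
# where the `to` occurrence begins at or after the end of the `from` occurrence.
def between(str, from, to)
  starts = indices_of(str, from)
  ends   = indices_of(str, to)
  starts.product(ends)
        .select { |s, e| s + from.size <= e }
        .map { |s, e| str[s...e + to.size] }
end

p indices_of("abcadc", "a")    # => [0, 3]
p between("abcadc", "a", "c")  # => ["abc", "abcadc", "adc"]
```

This only covers fixed "from" and "to" strings with arbitrary characters in between. Unlike the brute-force answers, it does not handle an arbitrary regex.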

Answer 6

Car*_*and 5

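Here is an approach similar to @ndn's and @Mark's that works with any string and regex. I've implemented it as a method of String because that is where I'd like to see it. Wouldn't it be a great complement to String#[] and String#scan?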

class String
  def all_matches(regex)
    return [] if empty?
    r = /\A(?:#{regex})\z/  # anchor to the whole substring, not to lines
    1.upto(size).with_object([]) { |i,a|
      a.concat(each_char.each_cons(i).map(&:join).select { |s| s =~ r }) }
  end
end

'abcadc'.all_matches /a.*c/
  # => ["abc", "adc", "abcadc"]
'aaabaaa'.all_matches(/a.*a/)
  #=> ["aa", "aa", "aa", "aa", "aaa", "aba", "aaa", "aaba", "abaa", "aaaba",
  #    "aabaa", "abaaa", "aaabaa", "aabaaa", "aaabaaa"]
