String count with overlapping occurrences

cal*_*pto 50 python string search

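What is the best way to count the number of occurrences of a given string in Python, including overlapping ones? This is the most obvious way: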

def function(string, str_to_search_for):
    count = 0
    # range works on both Python 2 and 3 (the original used xrange, which is Python 2 only)
    for x in range(len(string) - len(str_to_search_for) + 1):
        if string[x:x+len(str_to_search_for)] == str_to_search_for:
            count += 1
    return count


function('1011101111', '11')  # returns 5

Or is there a better way in Python?

Joc*_*zel 73

Well, this might be faster, since it does the comparison in C:

def occurrences(string, sub):
    count = start = 0
    while True:
        start = string.find(sub, start) + 1
        if start > 0:
            count+=1
        else:
            return count
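To make the role of the `+ 1` easier to see, here is a small sketch that yields the start index of every match instead of only counting them. The name `overlapping_positions` and the choice to treat an empty pattern as "no matches" are my own assumptions, not part of the answer above. Stepping forward by one character after each hit, rather than by `len(sub)`, is what lets overlapping matches be found.

```python
def overlapping_positions(string, sub):
    """Yield the start index of every (possibly overlapping) match of sub."""
    if not sub:
        # Assumption for this sketch: an empty pattern yields no matches.
        return
    start = string.find(sub)
    while start != -1:
        yield start
        # Advance by one character, not len(sub), so overlaps are not skipped.
        start = string.find(sub, start + 1)


positions = list(overlapping_positions('1011101111', '11'))
print(positions)       # [2, 3, 6, 7, 8]
print(len(positions))  # 5
```

Counting is then just `sum(1 for _ in overlapping_positions(string, sub))`.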


jam*_*lak 39

>>> import re
>>> text = '1011101111'
>>> len(re.findall('(?=11)', text))
5

Loading the whole list of matches into memory should rarely be a problem, but if you would rather avoid it, you can count lazily instead:

>>> sum(1 for _ in re.finditer('(?=11)', text))
5

As a function (re.escape makes sure the substring doesn't interfere with the regex):

>>> def occurrences(text, sub):
...     return len(re.findall('(?={0})'.format(re.escape(sub)), text))
...

>>> occurrences(text, '11')
5
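The lookahead works because `(?=...)` is zero-width: it matches at a position without consuming any characters, so the scanner can report a match at every index where the substring begins. A minimal sketch that shows those indices (the helper name `occurrence_starts` is hypothetical, introduced only for illustration):

```python
import re


def occurrence_starts(text, sub):
    """Return the start index of each overlapping match of the literal sub."""
    # re.escape keeps characters such as '.' from being read as regex syntax.
    pattern = re.compile('(?={0})'.format(re.escape(sub)))
    # Each match is empty (zero-width); only its position is of interest.
    return [m.start() for m in pattern.finditer(text)]


print(occurrence_starts('1011101111', '11'))  # [2, 3, 6, 7, 8]
print(occurrence_starts('a.b.a.b', '.'))      # [1, 3, 5]
```

Without `re.escape`, the second call would treat `.` as "any character" and report every index.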


Dav*_*d C 12

You can also try the newer Python regex module, which supports overlapping matches.

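import regex as re  # third-party module: pip install regex

def count_overlapping(text, search_for):
    # Escape the substring so it is matched literally, not as a pattern.
    return len(re.findall(re.escape(search_for), text, overlapped=True))

count_overlapping('1011101111', '11')  # 5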


Dim*_*nek 10

Python's str.count counts non-overlapping substrings:

In [3]: "ababa".count("aba")
Out[3]: 1

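Here are a few ways to count overlapping sequences; I'm sure there are many more :)

Lookahead regular expressions

How to find overlapping matches with a regexp?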

In [10]: re.findall("a(?=ba)", "ababa")
Out[10]: ['a', 'a']

Generate all substrings

In [11]: data = "ababa"
In [17]: sum(1 for i in range(len(data)) if data.startswith("aba", i))
Out[17]: 2

  • More concise: `sum(data.startswith("aba", i) for i, _ in enumerate(data))` :) (3 upvotes)
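As a sanity check, the approaches on this page can be run side by side. The sketch below restates three of them under names of my own (`count_find`, `count_lookahead`, `count_startswith`) and asserts that they agree on a few hand-picked inputs; it also prints what `str.count` returns for contrast. It says nothing about relative speed, which would need measuring with something like `timeit`.

```python
import re


def count_find(text, sub):
    # str.find loop, advancing one character past each match start.
    count = start = 0
    while True:
        start = text.find(sub, start) + 1
        if start > 0:
            count += 1
        else:
            return count


def count_lookahead(text, sub):
    # Zero-width lookahead, with the substring escaped to match literally.
    return len(re.findall('(?={0})'.format(re.escape(sub)), text))


def count_startswith(text, sub):
    # Test every index; True counts as 1 when summed.
    return sum(text.startswith(sub, i) for i in range(len(text)))


cases = [
    ('1011101111', '11', 5),
    ('ababa', 'aba', 2),
    ('aaaa', 'aa', 3),
    ('abc', 'x', 0),
]
for text, sub, expected in cases:
    for func in (count_find, count_lookahead, count_startswith):
        assert func(text, sub) == expected, (func.__name__, text, sub)
    print(text, sub, 'overlapping:', expected, 'str.count:', text.count(sub))
```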