cal*_*pto 50 python string search
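What is the best way to count the number of occurrences of a given string, including overlapping ones, in Python? This is the most obvious way: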
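def function(string, str_to_search_for):
    count = 0
    # Try every possible start index, so overlapping matches are counted too.
    # range() is used instead of xrange() so this also runs on Python 3.
    for x in range(len(string) - len(str_to_search_for) + 1):
        if string[x:x+len(str_to_search_for)] == str_to_search_for:
            count += 1
    return count

function('1011101111','11')  # returns 5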
Or is there a better way in Python?
Joc*_*zel 73
Well, this might be faster, since it does the comparing in C:
def occurrences(string, sub):
    count = start = 0
    while True:
        # find() returns -1 when there is no further match, so start becomes 0.
        start = string.find(sub, start) + 1
        if start > 0:
            count += 1
        else:
            return count
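
To see why this counts overlaps: each search resumes one character after the start of the previous hit, not after its end. Here is a minimal sketch of the same idea written as a generator that yields the match positions. The name `overlapping_positions` and the decision to reject an empty `sub` are my own assumptions, not part of the answer above.

```python
def overlapping_positions(string, sub):
    """Yield the start index of every (possibly overlapping) occurrence of sub."""
    if not sub:
        # Assumption: an empty pattern is rejected instead of being counted.
        raise ValueError("sub must be non-empty")
    start = string.find(sub)
    while start != -1:
        yield start
        # Resume one character past the start of the last hit,
        # so a match that overlaps the previous one is still found.
        start = string.find(sub, start + 1)


positions = list(overlapping_positions('1011101111', '11'))
print(positions)       # [2, 3, 6, 7, 8]
print(len(positions))  # 5
```

If you only need the count, `sum(1 for _ in overlapping_positions(string, sub))` avoids building the list.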
jam*_*lak 39
>>> import re
>>> text = '1011101111'
>>> len(re.findall('(?=11)', text))
5
Loading the whole list of matches into memory should never be a problem in practice, but if you really wanted to avoid it, you could do this:
>>> sum(1 for _ in re.finditer('(?=11)', text))
5
As a function (re.escape makes sure the substring doesn't interfere with the regex):
>>> def occurrences(text, sub):
...     return len(re.findall('(?={0})'.format(re.escape(sub)), text))
...
>>> occurrences(text, '11')
5
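
To illustrate what re.escape is protecting against, here is a small sketch that compares the function above with a variant that skips the escaping. `occurrences_unescaped` is a hypothetical helper added only for contrast; it is not something you would want to use.

```python
import re


def occurrences(text, sub):
    # The lookahead (?=...) is zero-width, so the scan advances one
    # character at a time and overlapping matches are all counted.
    return len(re.findall('(?={0})'.format(re.escape(sub)), text))


def occurrences_unescaped(text, sub):
    # Hypothetical variant without re.escape, for contrast only:
    # regex metacharacters in sub are interpreted as pattern syntax.
    return len(re.findall('(?={0})'.format(sub), text))


print(occurrences('a.a.a', '.'))            # 2 (the two literal dots)
print(occurrences_unescaped('a.a.a', '.'))  # 5 ('.' matches any character)
```

Without the escape, a substring such as `'.'` or `'a+a'` is treated as a pattern, so the count no longer refers to the literal text.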
Dav*_*d C 12
You can also try the newer third-party Python regex module, which supports overlapping matches.
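import regex as re  # third-party module, not the standard library re

def count_overlapping(text, search_for):
    # search_for is interpreted as a regex pattern here, not as a literal string.
    return len(re.findall(search_for, text, overlapped=True))

count_overlapping('1011101111','11')  # 5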
Dim*_*nek 10
Python's str.count counts non-overlapping substrings:
In [3]: "ababa".count("aba")
Out[3]: 1
Here are a few ways to count overlapping sequences; I'm sure there are many more :)
In [10]: re.findall("a(?=ba)", "ababa")
Out[10]: ['a', 'a']
In [11]: data = "ababa"
In [17]: sum(1 for i in range(len(data)) if data.startswith("aba", i))
Out[17]: 2
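
For reuse, the startswith approach can be wrapped in a function. This is just a sketch, and the name `count_overlapping` is my own choice; it is shown next to str.count to make the difference visible.

```python
def count_overlapping(data, sub):
    # startswith(sub, i) checks for sub at offset i without creating a slice.
    return sum(1 for i in range(len(data)) if data.startswith(sub, i))


print("ababa".count("aba"))                   # 1 (non-overlapping)
print(count_overlapping("ababa", "aba"))      # 2 (overlapping)
print(count_overlapping("1011101111", "11"))  # 5
```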