How do I find multiple occurrences of a string within a string in Python? Consider this:
>>> text = "Allowed Hello Hollow"
>>> text.find("ll")
1
>>>
So the first occurrence of ll is at index 1, as expected. How do I find the next occurrence?
The same question applies to a list. Consider:
>>> x = ['ll', 'ok', 'll']
How do I find the indices of all the ll entries?
pok*_*oke 103
Using regular expressions, you can use re.finditer to find all (non-overlapping) occurrences:
>>> import re
>>> text = 'Allowed Hello Hollow'
>>> for m in re.finditer('ll', text):
...     print('ll found', m.start(), m.end())
...
ll found 1 3
ll found 10 12
ll found 16 18
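If overlapping occurrences are wanted as well, one possible approach is to wrap the pattern in a zero-width lookahead, so that a match consumes no characters. The sketch below is an illustration rather than part of the original answer; the helper name find_overlapping is hypothetical, and re.escape is used so the substring is treated literally.

```python
import re

def find_overlapping(sub, text):
    """Return the start indices of all occurrences of sub, overlaps included."""
    # (?=...) is a zero-width lookahead: it matches without consuming
    # characters, so the scan moves forward one position at a time.
    # re.escape makes sure sub is matched literally, not as a pattern.
    pattern = re.compile('(?=' + re.escape(sub) + ')')
    return [m.start() for m in pattern.finditer(text)]

print(find_overlapping('ll', 'Allowed Hello Hollow'))  # [1, 10, 16]
print(find_overlapping('aa', 'aaaa'))                  # [0, 1, 2]
```

A plain re.finditer('aa', 'aaaa') would report only the matches at 0 and 2.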
Alternatively, if you don't want the overhead of regular expressions, you can call str.find repeatedly to get the next index:
>>> text = 'Allowed Hello Hollow'
>>> index = 0
>>> while index < len(text):
...     index = text.find('ll', index)
...     if index == -1:
...         break
...     print('ll found at', index)
...     index += 2  # +2 because len('ll') == 2
...
ll found at 1
ll found at 10
ll found at 16
The same idea carries over to lists and other sequences, with index in place of find.
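For the list from the question, a minimal sketch of that loop could look like the following. Lists have no find method, but list.index accepts a start argument and raises ValueError when nothing more is found. The function name find_all_in_list is hypothetical and only used for illustration.

```python
def find_all_in_list(items, value):
    """Yield every index at which value occurs in the list items."""
    index = -1
    while True:
        try:
            # list.index(value, start) searches from position start onward
            index = items.index(value, index + 1)
        except ValueError:
            return  # no further occurrence, end the generator
        yield index

x = ['ll', 'ok', 'll']
print(list(find_all_in_list(x, 'll')))  # [0, 2]
```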
ins*_*get 24
I think what you are looking for is str.count:
>>> "Allowed Hello Hollow".count('ll')
3
Hope this helps.
Note: this only counts non-overlapping occurrences, and it returns a count rather than the positions.
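To make the non-overlapping caveat concrete, here is a small sketch that counts overlapping occurrences by advancing one character at a time instead of skipping past each match. The name count_overlapping is hypothetical and not part of the answer above.

```python
def count_overlapping(text, sub):
    """Count occurrences of sub in text, allowing matches to overlap."""
    count = 0
    index = text.find(sub)
    while index != -1:
        count += 1
        # Step forward by one character rather than len(sub),
        # so a match starting inside the previous one is still seen.
        index = text.find(sub, index + 1)
    return count

print('aaaa'.count('aa'))               # 2 (non-overlapping)
print(count_overlapping('aaaa', 'aa'))  # 3
```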
bst*_*rre 21
For the list example, use a comprehension:
>>> l = ['ll', 'xx', 'll']
>>> print([n for (n, e) in enumerate(l) if e == 'll'])
[0, 2]
Similarly for strings:
>>> text = "Allowed Hello Hollow"
>>> print([n for n in range(len(text)) if text.find('ll', n) == n])
[1, 10, 16]
This also reports the overlapping matches inside longer runs of "l", which may or may not be what you want:
>>> text = 'Alllowed Hello Holllow'
>>> print([n for n in range(len(text)) if text.find('ll', n) == n])
[1, 2, 11, 17, 18]
int*_*ted 13
FWIW, here are a couple of non-RE alternatives that I think are neater than poke's solution.
The first uses str.index and checks for ValueError:
def findall(sub, string):
    """
    >>> text = "Allowed Hello Hollow"
    >>> tuple(findall('ll', text))
    (1, 10, 16)
    """
    index = 0 - len(sub)
    try:
        while True:
            index = string.index(sub, index + len(sub))
            yield index
    except ValueError:
        pass
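The second uses str.find and checks for the sentinel -1 using the two-argument form of iter:
def findall_iter(sub, string):
    """
    >>> text = "Allowed Hello Hollow"
    >>> tuple(findall_iter('ll', text))
    (1, 10, 16)
    """
    def next_index(length):
        index = 0 - length
        while True:
            index = string.find(sub, index + length)
            yield index
    # iter(callable, sentinel) stops as soon as the callable returns -1.
    # In Python 3 the generator method is __next__ (it was .next in Python 2).
    return iter(next_index(len(sub)).__next__, -1)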
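To apply either of these functions to a list, tuple or other iterable of strings, you can use a higher-order function (one that takes a function as one of its arguments), like this one: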
def findall_each(findall, sub, strings):
    """
    >>> texts = ("fail", "dolly the llama", "Hello", "Hollow", "not ok")
    >>> list(findall_each(findall, 'll', texts))
    [(), (2, 10), (2,), (2,), ()]
    >>> texts = ("parallellized", "illegally", "dillydallying", "hillbillies")
    >>> list(findall_each(findall_iter, 'll', texts))
    [(4, 7), (1, 6), (2, 7), (2, 6)]
    """
    return (tuple(findall(sub, string)) for string in strings)
For the first case, searching within a string:
def findall(text, sub):
    """Return all indices at which substring occurs in text"""
    return [
        index
        for index in range(len(text) - len(sub) + 1)
        if text[index:].startswith(sub)
    ]
print(findall('Allowed Hello Hollow', 'll'))
# [1, 10, 16]
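No need to import re. It loops over the string only once, stopping before the end as soon as there are not enough characters left to hold the substring. It is not strictly linear time, though: each text[index:] slice copies the rest of the string, so long inputs cost closer to quadratic time. I personally find it very readable as well.
Note that this finds overlapping occurrences: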
print(findall('aaa', 'aa'))
# [0, 1]
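A possible variant, sketched here as an assumption rather than as the answerer's code, avoids building a new string at every position by passing a start offset to str.startswith. The name findall_no_copy is hypothetical.

```python
def findall_no_copy(text, sub):
    """Return all indices at which sub occurs in text, overlaps included."""
    return [
        index
        for index in range(len(text) - len(sub) + 1)
        # str.startswith accepts a start offset, so no slice is created
        if text.startswith(sub, index)
    ]

print(findall_no_copy('Allowed Hello Hollow', 'll'))  # [1, 10, 16]
print(findall_no_copy('aaa', 'aa'))                   # [0, 1]
```

Each position still compares up to len(sub) characters, so the cost grows with both the text length and the substring length.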