Finding multiple occurrences of a string within a string in Python

Question

How do I find multiple occurrences of a string within a string in Python? Consider this:

>>> text = "Allowed Hello Hollow"
>>> text.find("ll")
1
>>>

So the first occurrence of ll is at index 1, as expected. How do I find the next occurrence of it?

The same question applies to a list. Consider:

>>> x = ['ll', 'ok', 'll']

How do I find all the indices of ll?

Answer 1

pok*_*oke 103

Using regular expressions, you can use re.finditer to find all (non-overlapping) occurrences:

>>> import re
>>> text = 'Allowed Hello Hollow'
>>> for m in re.finditer('ll', text):
...     print('ll found', m.start(), m.end())
...
ll found 1 3
ll found 10 12
ll found 16 18
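
Two caveats of the regex approach can be sketched briefly: `re.finditer` treats its first argument as a pattern, so a literal substring containing regex metacharacters should be passed through `re.escape`, and overlapping matches need a zero-width lookahead. The helper name `find_all_re` and its `overlapping` flag are illustrative assumptions, not part of the answer above.

```python
import re

def find_all_re(sub, text, overlapping=False):
    """Return the start indices of the literal substring `sub` in `text`."""
    pattern = re.escape(sub)  # treat `sub` literally, not as a regex
    if overlapping:
        # A zero-width lookahead matches without consuming characters,
        # so matches that overlap each other are all reported.
        pattern = '(?=' + pattern + ')'
    return [m.start() for m in re.finditer(pattern, text)]

print(find_all_re('ll', 'Allowed Hello Hollow'))     # [1, 10, 16]
print(find_all_re('ll', 'lllll'))                    # [0, 2]
print(find_all_re('ll', 'lllll', overlapping=True))  # [0, 1, 2, 3]
```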

Alternatively, if you don't want the overhead of regular expressions, you can also call str.find repeatedly to get the next index:

>>> text = 'Allowed Hello Hollow'
>>> index = 0
>>> while index < len(text):
...     index = text.find('ll', index)
...     if index == -1:
...         break
...     print('ll found at', index)
...     index += 2  # +2 because len('ll') == 2
...
ll found at 1
ll found at 10
ll found at 16

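The same approach also works for lists and other sequences, with `index` in place of `find`, since lists have no `find` method.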

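Since you mention the whole `index += 2` thing: if you apply this to the string 'lllll', it will miss two of the four occurrences of 'll'. For strings it is better to stick with `index += 1`. (4 upvotes)
Is there no way to do this without regular expressions? (2 upvotes)
Lists don't have `find`. But it works with `index`; you just need to `except ValueError` instead of testing for -1. (2 upvotes)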
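
The two points raised in these comments can be sketched as follows: stepping by 1 so that overlapping matches are not skipped, and using `list.index` with `except ValueError` for lists. The function names `find_all` and `list_indices` and the `overlapping` parameter are hypothetical, chosen only for illustration.

```python
def find_all(sub, text, overlapping=True):
    """Yield every index at which `sub` starts in `text`, using str.find."""
    # Step by 1 to catch overlapping matches, or by len(sub) to skip each match.
    step = 1 if overlapping else max(len(sub), 1)
    index = text.find(sub)
    while index != -1:
        yield index
        index = text.find(sub, index + step)


def list_indices(value, items):
    """Return every index of `value` in the list `items`, using list.index."""
    found = []
    start = 0
    while True:
        try:
            start = items.index(value, start)
        except ValueError:  # list.index raises instead of returning -1
            return found
        found.append(start)
        start += 1


print(list(find_all('ll', 'lllll')))                     # [0, 1, 2, 3]
print(list(find_all('ll', 'lllll', overlapping=False)))  # [0, 2]
print(list_indices('ll', ['ll', 'ok', 'll']))            # [0, 2]
```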

Answer 2

ins*_*get 24

I think what you're looking for is str.count:

>>> "Allowed Hello Hollow".count('ll')
3

Hope this helps.
Note: this only captures non-overlapping occurrences.

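This answers [Count the number of occurrences of a given substring in a string](/sf/ask/622993381/ -string), not the actual question here, which asks for the indices of the matches rather than their count... (5 upvotes)
I need the indices. (3 upvotes)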
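
To make the distinction in these comments concrete: `str.count` reports only how many non-overlapping matches there are, so it gives no positions, and it undercounts when matches overlap. A small sketch follows; the helper name `count_overlapping` is hypothetical.

```python
def count_overlapping(sub, text):
    """Count occurrences of `sub` in `text`, including overlapping ones."""
    if not sub:
        raise ValueError("sub must be a non-empty string")
    # True counts as 1 when summed, so this tallies every matching start index.
    return sum(text.startswith(sub, i) for i in range(len(text) - len(sub) + 1))


print("Allowed Hello Hollow".count('ll'))  # 3
print('lllll'.count('ll'))                 # 2 (non-overlapping only)
print(count_overlapping('ll', 'lllll'))    # 4
```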

Answer 3

bst*_*rre 21

For the list example, use a comprehension:

>>> l = ['ll', 'xx', 'll']
>>> print([n for (n, e) in enumerate(l) if e == 'll'])
[0, 2]

Similarly for strings:

>>> text = "Allowed Hello Hollow"
>>> print([n for n in range(len(text)) if text.find('ll', n) == n])
[1, 10, 16]

This will also list adjacent runs of "ll", which may or may not be what you want:

>>> text = 'Alllowed Hello Holllow'
>>> print([n for n in range(len(text)) if text.find('ll', n) == n])
[1, 2, 11, 17, 18]

This is very inefficient. (5 upvotes)
@Clément posted a more efficient example. (2 upvotes)

Answer 4

int*_*ted 13

FWIW, here are a couple of non-RE alternatives that I think are neater than poke's solution.

The first uses str.index and checks for ValueError:

def findall(sub, string):
    """
    >>> text = "Allowed Hello Hollow"
    >>> tuple(findall('ll', text))
    (1, 10, 16)
    """
    index = 0 - len(sub)
    try:
        while True:
            index = string.index(sub, index + len(sub))
            yield index
    except ValueError:
        pass

The second uses str.find and checks for the sentinel -1 by using iter:

def findall_iter(sub, string):
    """
    >>> text = "Allowed Hello Hollow"
    >>> tuple(findall_iter('ll', text))
    (1, 10, 16)
    """
    def next_index(length):
        index = 0 - length
        while True:
            index = string.find(sub, index + length)
            yield index
    # Python 2 spelling; on Python 3 use .__next__ in place of .next
    return iter(next_index(len(sub)).next, -1)

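To apply either of these functions to a list, tuple, or other iterable of strings, you can use a higher-order function (one that takes a function as one of its arguments) like this one: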

def findall_each(findall, sub, strings):
    """
    >>> texts = ("fail", "dolly the llama", "Hello", "Hollow", "not ok")
    >>> list(findall_each(findall, 'll', texts))
    [(), (2, 10), (2,), (2,), ()]
    >>> texts = ("parallellized", "illegally", "dillydallying", "hillbillies")
    >>> list(findall_each(findall_iter, 'll', texts))
    [(4, 7), (1, 6), (2, 7), (2, 6)]
    """
    return (tuple(findall(sub, string)) for string in strings)

Change `.next` to `.__next__` to make `findall_iter` work in Python 3.7. (2 upvotes)
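
The change suggested in this comment can be sketched as follows. It is the same `findall_iter` function with only the generator method renamed for Python 3, and it assumes `sub` is non-empty (with an empty `sub`, `str.find` never returns the sentinel -1, so the iterator would not terminate).

```python
def findall_iter(sub, string):
    """Return an iterator over the start index of each non-overlapping match."""
    def next_index(length):
        index = 0 - length
        while True:
            index = string.find(sub, index + length)
            yield index
    # iter(callable, sentinel) keeps calling the callable until it returns -1.
    return iter(next_index(len(sub)).__next__, -1)


print(tuple(findall_iter('ll', 'Allowed Hello Hollow')))  # (1, 10, 16)
```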

Answer 5

Cra*_*cky 5

For the first version, checking a string:

def findall(text, sub):
    """Return all indices at which substring occurs in text"""
    return [
        index
        for index in range(len(text) - len(sub) + 1)
        if text.startswith(sub, index)  # compare in place, no slice copy
    ]

print(findall('Allowed Hello Hollow', 'll'))
# [1, 10, 16]

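No need to import re. This loops over the string only once (stopping before the end, as soon as there are no longer enough characters left to hold the substring), though each position does its own comparison work, so it is not strictly linear in the worst case. Personally, I also find it very readable.

Note that this finds overlapping occurrences: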

print(findall('aaa', 'aa'))
# [0, 1]

Archived: 15 years, 3 months ago
Views: 149359
Last activity: 7 years, 6 months ago