Different behavior between re.finditer and re.findall

Question


I'm using the following code:

import re

CARRIS_REGEX=r'<th>(\d+)</th><th>([\s\w\.\-]+)</th><th>(\d+:\d+)</th><th>(\d+m)</th>'
pattern = re.compile(CARRIS_REGEX, re.UNICODE)
matches = pattern.finditer(mailbody)
findall = pattern.findall(mailbody)

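But finditer and findall are finding different things. findall does find all the matches in the given string, but finditer only finds the first one, returning an iterator with only one element.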

How can I make finditer and findall behave the same way?

Thanks

Answer 1

Tim*_*ker 29

I can't reproduce this here. I tried it with Python 2.7 and 3.1.

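One difference between finditer and findall is that the former returns regex match objects, whereas the latter returns a list of tuples of the matched capturing groups. If the pattern has exactly one group, you get a list of strings instead. If it has no groups, you get a list of the whole matches.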
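
To make the difference concrete, here is a minimal sketch. It uses a made-up two-row fragment in the same <th> layout as the question's mail body. The sample string is hypothetical and is not the original test.txt.

```python
import re

# Same pattern as in the question: four capturing groups.
CARRIS_REGEX = r'<th>(\d+)</th><th>([\s\w\.\-]+)</th><th>(\d+:\d+)</th><th>(\d+m)</th>'
pattern = re.compile(CARRIS_REGEX, re.UNICODE)

# Hypothetical input in the same layout as the question's mail body.
sample = ("<th>790</th><th>PR. REAL</th><th>21:06</th><th>04m</th>"
          "<th>758</th><th>PORTAS BENFICA</th><th>21:10</th><th>09m</th>")

# findall: a list with one tuple of group strings per match.
found = pattern.findall(sample)
print(found)

# finditer: an iterator of match objects; the tuple is available via .groups().
matches = list(pattern.finditer(sample))
for m in matches:
    print(m.span(), m.groups())
```

Both calls find the same two rows. They differ only in what they hand back for each row.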

So

import re
CARRIS_REGEX=r'<th>(\d+)</th><th>([\s\w\.\-]+)</th><th>(\d+:\d+)</th><th>(\d+m)</th>'
pattern = re.compile(CARRIS_REGEX, re.UNICODE)
mailbody = open("test.txt").read()
for match in pattern.finditer(mailbody):
    print(match)
print()
for match in pattern.findall(mailbody):
    print(match)

prints

<_sre.SRE_Match object at 0x00A63758>
<_sre.SRE_Match object at 0x00A63F98>
<_sre.SRE_Match object at 0x00A63758>
<_sre.SRE_Match object at 0x00A63F98>
<_sre.SRE_Match object at 0x00A63758>
<_sre.SRE_Match object at 0x00A63F98>
<_sre.SRE_Match object at 0x00A63758>
<_sre.SRE_Match object at 0x00A63F98>

('790', 'PR. REAL', '21:06', '04m')
('758', 'PORTAS BENFICA', '21:10', '09m')
('790', 'PR. REAL', '21:14', '13m')
('758', 'PORTAS BENFICA', '21:21', '19m')
('790', 'PR. REAL', '21:29', '28m')
('758', 'PORTAS BENFICA', '21:38', '36m')
('758', 'SETE RIOS', '21:49', '47m')
('758', 'SETE RIOS', '22:09', '68m')

If you want the same output from finditer as you get from findall, you need

for match in pattern.finditer(mailbody):
    print(match.groups())  # groups() already returns a tuple
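
match.groups() lines up with findall only when the pattern has two or more groups. A small helper can mirror all three findall cases. The name findall_via_finditer is hypothetical, and this is a sketch rather than how the re module implements findall. It also passes default="" because findall reports groups that did not take part in a match as empty strings.

```python
import re

def findall_via_finditer(pattern, string):
    """Hypothetical helper: rebuild findall()'s result from finditer()."""
    result = []
    for m in pattern.finditer(string):
        groups = m.groups(default="")  # non-participating groups -> ""
        if pattern.groups == 0:
            result.append(m.group(0))  # no groups: the whole match
        elif pattern.groups == 1:
            result.append(groups[0])   # one group: just that string
        else:
            result.append(groups)      # several groups: a tuple
    return result

text = "He was carefully disguised but captured quickly by police."
print(findall_via_finditer(re.compile(r"(\w+)(ly)"), text))
```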

Answer 2

Tim*_*ara 5

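You can't make them behave the same way, because they are different. If you really want to build a list of results from finditer, you can use a list comprehension: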

>>> [match for match in pattern.finditer(mailbody)]
[...]

Usually, you use a for loop to access the matches returned by re.finditer:

>>> for match in pattern.finditer(mailbody):
...     ...
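
Inside such a loop, each item is a match object, so both the text and the position of every hit are available. Here is a minimal sketch. The simplified pattern (just the "minutes" column) and the short mailbody string are hypothetical stand-ins for the question's data.

```python
import re

pattern = re.compile(r"(\d+)m")          # hypothetical: only the minutes column
mailbody = "<th>04m</th><th>09m</th>"    # hypothetical input

rows = []
for match in pattern.finditer(mailbody):
    # group(0) is the whole match and group(1) is the first capturing group;
    # start()/end() give the offsets of the whole match within mailbody.
    rows.append((match.group(0), match.group(1), match.start(), match.end()))
    print(rows[-1])
```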

`[match for match in pattern.finditer(mailbody)]` is just a slower and less readable way of writing `list(pattern.finditer(mailbody))` (7 upvotes)
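
finditer returns a one-shot iterator. Wrapping it in list() is the simplest way to get a reusable list, and the iterator itself is empty after one pass. A small sketch, using a sample sentence:

```python
import re

pattern = re.compile(r"\w+ly")
text = "He was carefully disguised but captured quickly by police."

it = pattern.finditer(text)
matches = list(it)        # consumes the iterator
leftover = list(it)       # a second pass over the same iterator yields nothing

print([m.group() for m in matches])
print(leftover)
```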

Answer 3

Ayu*_*ush 5

re.findall(pattern, string)

findall() returns all non-overlapping matches of the pattern in the string as a list of strings.

re.finditer(pattern, string)

finditer() returns an iterator that yields a match object for each match.

In both functions, the string is scanned from left to right, and matches are returned in the order they are found.
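
Both points, an iterator rather than a callable and non-overlapping left-to-right scanning, can be checked on a toy string. The input "aaaa" is purely illustrative.

```python
import re

text = "aaaa"

# findall: a list of the matched strings. Once "aa" matches at 0-2, scanning
# resumes at 2, so the overlapping candidate starting at 1 is never reported.
strings = re.findall(r"aa", text)

# finditer: an iterator (next() works on it); each item is a match object.
it = re.finditer(r"aa", text)
first = next(it)
spans = [first.span()] + [m.span() for m in it]

print(strings)
print(spans)
```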

Answer 4

Kus*_*era 5

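I took this example from the "Regular expression operations" page of the Python 2.* documentation and describe it here in detail, with a few modifications. To explain the whole example, let's take a string variable called text: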

text = "He was carefully disguised but captured quickly by police."

and a compiled regular expression pattern:

regEX = r"\w+ly"
pattern = re.compile(regEX)

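\w matches any word character (letters, digits, and underscore), and + matches one or more of the preceding token. The whole pattern therefore matches a run of word characters that ends in ly. Only two words in text satisfy this regular expression: "carefully" and "quickly".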
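
\w also covers digits and underscores, which is easy to check. The inputs below are made up purely for illustration.

```python
import re

pattern = re.compile(r"\w+ly")

# Underscores and digits count as word characters, so both tokens match.
print(pattern.findall("un_likely 3ly"))

# No run of word characters ending in "ly" here, so the result is empty.
print(pattern.findall("He was disguised"))
```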

Before getting to re.findall() or re.finditer(), let's see what re.search() means according to the Python 2.* documentation:

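Scan through string looking for the first location where the regular expression pattern produces a match, and return a corresponding MatchObject instance. Return None if no position in the string matches the pattern; note that this is different from finding a zero-length match at some point in the string.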
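
The distinction between None and a zero-length match can be seen directly. The sample string and the pattern x* (which can match the empty string) are illustrative choices.

```python
import re

text = "He was disguised"

no_match = re.search(r"ly", text)     # no position matches: None
empty_match = re.search(r"x*", text)  # zero-length match at position 0

print(no_match)
print(empty_match.span(), repr(empty_match.group()))
```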

The following lines give you a basic understanding of re.search():

search = pattern.search(text)
print(search)
print(type(search))

# output:
# <re.Match object; span=(7, 16), match='carefully'>
# <class 're.Match'>

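According to the Python 2.* documentation, this produces a re.MatchObject, an object with 13 supported methods and attributes. Its span() method returns the start and end positions of the matched word in the text variable (7 and 16 in the example above). re.search() only considers the first match, and it returns None if there is no match at all.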
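
A short sketch of how span() relates to the other match-object methods. Slicing text with the span gives back the matched word.

```python
import re

text = "He was carefully disguised but captured quickly by police."
m = re.compile(r"\w+ly").search(text)

if m is not None:                  # search() returns None when nothing matches
    start, end = m.span()          # same as (m.start(), m.end())
    print(start, end)
    print(text[start:end])         # the slice equals m.group()
    print(m.group())
```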

Now let's get to the actual question. First, here is what re.finditer() means according to the Python 2.* documentation:

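Return an iterator yielding MatchObject instances over all non-overlapping matches for the RE pattern in string. The string is scanned left-to-right, and matches are returned in the order found. Empty matches are included in the result.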
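
"Empty matches are included" is easiest to see with a pattern that can match the empty string. Here \d* on the illustrative string "a1" yields an empty match at offset 0 before it finds the digit. The exact number of trailing empty matches has varied between Python versions, so only those two facts are relied on.

```python
import re

# \d* can match the empty string, so finditer also yields empty matches.
results = [(m.span(), m.group()) for m in re.finditer(r"\d*", "a1")]
print(results)
```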

The next lines give you a basic understanding of re.finditer():

finditer = pattern.finditer(text)
print(finditer)
print(type(finditer))

# output:
# <callable_iterator object at 0x040BB690>
# <class 'callable_iterator'>

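The example above gives us an iterator object that we have to loop over; by itself it is clearly not the result we want. Let's loop over finditer to see what is inside this iterator object.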

for anObject in finditer:
    print(anObject)
    print(type(anObject))
    print()

# output:
# <re.Match object; span=(7, 16), match='carefully'>
# <class 're.Match'>
#
# <re.Match object; span=(40, 47), match='quickly'>
# <class 're.Match'>

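This result is very similar to the one we got from re.search() earlier. However, the output above also contains a new result, <re.Match object; span=(40, 47), match='quickly'>. As quoted from the Python 2.* documentation earlier, re.search() scans the string for the first location where the pattern produces a match. re.finditer(), by contrast, finds every location where the pattern produces a match, and it returns more detail than the re.findall() method.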
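
A quick comparison of the three calls on the same sentence. The first item from finditer is the same hit that search() returns, and the match objects keep the positions that findall() discards.

```python
import re

text = "He was carefully disguised but captured quickly by police."
pattern = re.compile(r"\w+ly")

first = pattern.search(text)              # only the first match
every = list(pattern.finditer(text))      # all matches, as match objects
words = pattern.findall(text)             # all matches, as plain strings

details = [(m.group(), m.span()) for m in every]
print(first.span(), every[0].span())
print(details)
print(words)
```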

Here is what re.findall() means according to the Python 2.* documentation:

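Return all non-overlapping matches of pattern in string, as a list of strings. The string is scanned left-to-right, and matches are returned in the order found. If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group. Empty matches are included in the result.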
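
The group rule in that quote changes the shape of the result. Here is a small sketch with zero, one, and two groups on the same sentence. The grouped variants of the pattern are illustrative additions.

```python
import re

text = "He was carefully disguised but captured quickly by police."

no_groups = re.findall(r"\w+ly", text)       # list of whole matches
one_group = re.findall(r"(\w+)ly", text)     # list of group-1 strings
two_groups = re.findall(r"(\w+)(ly)", text)  # list of tuples

print(no_groups)
print(one_group)
print(two_groups)
```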

Let's see what happens with re.findall():

findall = pattern.findall(text)
print(findall)
print(type(findall))

# output:
# ['carefully', 'quickly']
# <class 'list'>

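This output gives us only the words matched in the text variable, or an empty list if nothing matches. Each element of the list is the string shown as match='...' in the corresponding re.MatchObject, which is what that object's group() method returns.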

Here is the complete code, which I tried in Python 3.7:

import re

text = "He was carefully disguised but captured quickly by police."

regEX = r"\w+ly"
pattern = re.compile(regEX)

search = pattern.search(text)
print(search)
print(type(search))
print()

findall = pattern.findall(text)
print(findall)
print(type(findall))
print()

finditer = pattern.finditer(text)
print(finditer)
print(type(finditer))
print()
for anObject in finditer:
    print(anObject)
    print(type(anObject))
    print()

Archived:	15 years, 4 months ago
Views:	46478
Last activity:	6 years, 3 months ago