Python - Using regular expressions to find multiple matches and print them all

Question

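I need to extract the contents of forms from an HTML source file. After some searching I found a good way to do it, but it only prints the first match. How can I loop over the input and output the contents of every form, not just the first one?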

import re
line = 'bla bla bla<form>Form 1</form> some text...<form>Form 2</form> more text?'
matchObj = re.search('<form>(.*?)</form>', line, re.S)
print(matchObj.group(1))
# Output: Form 1
# I need it to output the content of every form it finds, not just the first one...


Answer 1

Pet*_*rin 55

Do not use regular expressions to parse HTML.

But if you ever need to find all regex matches in a string, use the findall function.

import re
line = 'bla bla bla<form>Form 1</form> some text...<form>Form 2</form> more text?'
matches = re.findall('<form>(.*?)</form>', line, re.DOTALL)
print(matches)

# Output: ['Form 1', 'Form 2']


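re.S (the same flag as re.DOTALL) makes the '.' special character match any character at all, including a newline; without this flag, '.' matches anything except a newline. (http://docs.python.org/2/library/re.html#re.S) (2 upvotes)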
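To make the effect of the flag concrete, here is a minimal sketch. The input string is a made-up example (not from the question) in which the first form's content spans two lines.

```python
import re

# Hypothetical input: the first form's content contains a newline.
html = '<form>Form 1,\nsecond line</form> text <form>Form 2</form>'

pattern = '<form>(.*?)</form>'

# Without re.DOTALL, '.' cannot cross '\n', so the multi-line form is missed.
without_flag = re.findall(pattern, html)

# With re.DOTALL (re.S is the same flag), '.' also matches '\n'.
with_flag = re.findall(pattern, html, re.DOTALL)

print(without_flag)
print(with_flag)
```

If the forms you are matching never contain line breaks, the flag makes no difference, but real HTML forms usually do.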

Answer 2

Aam*_*nan 21

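Instead of re.search, use re.findall, which returns all matches in a list. Alternatively, you can use re.finditer (the one I like best), which returns an iterator that you can loop over to visit every match.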

import re
line = 'bla bla bla<form>Form 1</form> some text...<form>Form 2</form> more text?'
for match in re.finditer('<form>(.*?)</form>', line, re.S):
    print(match.group(1))

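One practical reason to prefer re.finditer is that each item is a full match object, so you also get the position of every match, which re.findall does not report. A small sketch along those lines, reusing the string from the question (the `results` list is just an illustrative name):

```python
import re

line = 'bla bla bla<form>Form 1</form> some text...<form>Form 2</form> more text?'

results = []
for match in re.finditer('<form>(.*?)</form>', line, re.S):
    # start(1)/end(1) give the offsets of capture group 1 inside `line`.
    results.append((match.start(1), match.end(1), match.group(1)))

for start, end, content in results:
    # Slicing the original string with the offsets yields the same content.
    print(start, end, content, line[start:end] == content)
```

Because the iterator is lazy, you can also stop early (for example with `break`) without scanning the rest of a large input.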

Answer 3

Thi*_*ter 5

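Using regular expressions for this is the wrong approach. Since you are using Python, you have a really great library available for extracting parts of an HTML document: BeautifulSoup.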
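As a rough illustration of the parser-based approach, here is a sketch that uses the standard library's html.parser as a stand-in. It is not BeautifulSoup, just a dependency-free way to show the same idea. The `FormTextCollector` class and the sample input (with an attribute on the first `<form>` tag) are hypothetical and only for demonstration.

```python
from html.parser import HTMLParser


class FormTextCollector(HTMLParser):
    """Hypothetical helper: collects the text inside each <form> element."""

    def __init__(self):
        super().__init__()
        self.forms = []      # text of each completed form
        self._parts = None   # text chunks of the form being read, else None

    def handle_starttag(self, tag, attrs):
        if tag == 'form':
            self._parts = []

    def handle_data(self, data):
        # Only keep text that appears while we are inside a form.
        if self._parts is not None:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if tag == 'form' and self._parts is not None:
            self.forms.append(''.join(self._parts))
            self._parts = None


# Hypothetical input; the pattern '<form>(.*?)</form>' would miss the first form.
line = 'bla bla bla<form class="a">Form 1</form> some text...<form>Form 2</form> more text?'

collector = FormTextCollector()
collector.feed(line)
collector.close()
print(collector.forms)
```

Unlike the regex, this does not depend on the exact spelling of the opening tag. Note that it keeps only the text inside each form and drops any nested tags, which may or may not be what you want.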

Archived:	14 years, 8 months ago
Views:	91632
Last activity:	7 years ago