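I'm using a Python script to step through the lines of a text file. I want to search the text document for img tags and return each tag as text.
When I run the regular expression with re.match(line), it returns an _sre.SRE_Match object. How do I get it to return a string?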
import sys
import string
import re
f = open("sample.txt", 'r')
l = open('writetest.txt', 'w')
count = 1
for line in f:
    line = line.rstrip()
    imgtag = re.match(r'<img.*?>', line)
    print("yo it's a {}".format(imgtag))
When run, it prints:
yo it's a None
yo it's a None
yo it's a None
yo it's a <_sre.SRE_Match object at 0x7fd4ea90e578>
yo it's a None
yo it's a <_sre.SRE_Match object at 0x7fd4ea90e578>
yo it's a None
yo it's a <_sre.SRE_Match object at 0x7fd4ea90e578>
yo it's a <_sre.SRE_Match object at 0x7fd4ea90e5e0>
yo it's a None
yo it's a None
wfl*_*nny 72
You should use the match object's group(0) method (re.MatchObject.group(0)), like this:
imgtag = re.match(r'<img.*?>', line).group(0)
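Edit:
You would probably be better off checking the match first, since re.match returns None for lines that do not match:
imgtag = re.match(r'<img.*?>', line)
if imgtag:
    print("yo it's a {}".format(imgtag.group(0)))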
This eliminates all the Nones.
Exp*_*lls 10
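Use imgtag.group(0) or imgtag.group(). Either returns the entire match as a string. Your pattern has no capturing groups, so there is nothing else to retrieve.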
http://docs.python.org/release/2.5.2/lib/match-objects.html
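To illustrate the difference between the whole match and a captured group, here is a minimal sketch. The sample line and the extra parentheses around the src value are assumptions added for illustration; they are not part of the original question's pattern.

```python
import re

# Hypothetical sample line, not taken from the asker's file.
line = '<img src="logo.png" alt="Logo">'

# The parentheses add one capturing group for the src value (an assumption
# made for this example; the question's pattern has no groups).
m = re.match(r'<img.*?src="(.*?)".*?>', line)

if m:
    whole = m.group()    # entire match; identical to m.group(0)
    src = m.group(1)     # text captured by the first pair of parentheses
    print(whole)
    print(src)
```

Without parentheses in the pattern, only group(0) is available, and asking for group(1) raises IndexError.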
小智 9
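Note that re.match(pattern, string, flags=0) only returns a match at the beginning of the string. If you want to find a match anywhere in the string, use re.search(pattern, string, flags=0) instead (https://docs.python.org/3/library/re.html). This scans through the string and returns the first match object. You can then extract the matched string with match_object.group(0), as others have suggested.
Since a line may contain several img tags, I would probably recommend re.findall: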
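A small sketch of the match versus search difference, assuming a hypothetical line in which the tag does not appear at the very start:

```python
import re

# Hypothetical sample line: the img tag is preceded by other text.
line = 'some text <img src="a.png"> more text'
pattern = r'<img.*?>'

at_start = re.match(pattern, line)    # only tries to match at position 0
anywhere = re.search(pattern, line)   # scans the string for the first match

print(at_start)            # no match object, because the line starts with "some"
if anywhere:
    print(anywhere.group(0))
```

This is likely why many lines in the question print None: any line where the tag is indented or preceded by other markup fails with re.match even though it contains an img tag.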
import re
with open("sample.txt", 'r') as f_in, open('writetest.txt', 'w') as f_out:
    for line in f_in:
        # findall returns every non-overlapping match in the line as a string
        for img in re.findall(r'<img[^>]+>', line):
            f_out.write("yo it's a {}\n".format(img))