How do I extract the src of an img tag with a regular expression?

Question

I am trying to extract the image source URL from an HTML img tag.

The HTML data looks like this:

<div> My profile <img width='300' height='300' src='http://domain.com/profile.jpg'> </div>

or

<div> My profile <img width="300" height="300" src="http://domain.com/profile.jpg"> </div>

What would the regular expression look like in Python?

I tried the following:

i = re.compile('(?P<src>src=[["[^"]+"][\'[^\']+\']])')
i.search(htmldata)

But I get an error:

Traceback (most recent call last):
File "<input>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'group'

Answer 1

Avi*_*Raj 9

An HTML parser such as BeautifulSoup is the way to go.

>>> from bs4 import BeautifulSoup
>>> s = '''<div> My profile <img width='300' height='300' src='http://domain.com/profile.jpg'> </div>'''
>>> soup = BeautifulSoup(s, 'html.parser')
>>> img = soup.select('img')
>>> # .get() returns None for an <img> without src instead of raising KeyError
>>> [i['src'] for i in img if i.get('src')]
['http://domain.com/profile.jpg']
>>>
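
If a regular expression is still wanted, the following is a minimal sketch rather than a robust HTML parser. In the pattern from the question, square brackets define character classes, not groups or alternatives, so the pattern never matches the sample markup; re.search then returns None, and calling .group on None raises the AttributeError shown. The sketch captures the opening quote and requires the same quote to close the value, so both the single-quoted and double-quoted samples match. The names IMG_SRC_RE and extract_img_srcs are made up for illustration, and the sketch assumes the src value is always quoted.

```python
import re

# Hypothetical pattern name. Inside an <img ...> tag, find src=, capture the
# opening quote (' or "), then lazily capture everything up to the same quote.
IMG_SRC_RE = re.compile(
    r"""<img\b[^>]*?\ssrc\s*=\s*(?P<quote>["'])(?P<src>.*?)(?P=quote)""",
    re.IGNORECASE,
)


def extract_img_srcs(html):
    """Return the src value of every <img> tag that IMG_SRC_RE matches."""
    # finditer yields no matches for input without <img src=...>, so the
    # result is an empty list and there is never a None to call .group on.
    return [m.group("src") for m in IMG_SRC_RE.finditer(html)]


single = "<div> My profile <img width='300' height='300' src='http://domain.com/profile.jpg'> </div>"
double = '<div> My profile <img width="300" height="300" src="http://domain.com/profile.jpg"> </div>'

print(extract_img_srcs(single))  # ['http://domain.com/profile.jpg']
print(extract_img_srcs(double))  # ['http://domain.com/profile.jpg']
```

When using re.search directly, check the result for None before calling .group. Unquoted attribute values, tags inside comments, and other irregular markup are not handled here, which is why a parser remains the safer choice.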
