Regular expression to find all "href" values in <a> tags

Question


I have a regular expression that searches for the "href" attribute in <a> tags, but it currently doesn't give me what I want:

<a[^>]* href="([^"]*)"


From this input:

<a href="http://something" title="Development of the Python language and website">Core Development</a>


it matches this part:

<a href="http://something"


But I only need to find:

http://something


Answer 1

hwn*_*wnd 7

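Your pattern seems to work for me; you can check the working demo yourself. When the pattern contains a single capture group, re.findall returns only the contents of that group: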

import re
matches = re.findall(r'<a[^>]* href="([^"]*)"', html)
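
To make the difference between the whole match and the capture group concrete, here is a minimal sketch using the pattern and the sample line from the question. The variable names (`pattern`, `whole_match`, `href_only`, `all_hrefs`) are illustrative and not part of the original post.

```python
import re

# Sample input taken from the question.
html = '<a href="http://something" title="Development of the Python language and website">Core Development</a>'

# The pattern from the question, compiled once for reuse.
pattern = re.compile(r'<a[^>]* href="([^"]*)"')

match = pattern.search(html)
whole_match = match.group(0)  # everything the pattern consumed
href_only = match.group(1)    # only the text inside the parentheses

# With exactly one capture group, findall returns that group's contents.
all_hrefs = pattern.findall(html)

print(whole_match)  # <a href="http://something"
print(href_only)    # http://something
print(all_hrefs)    # ['http://something']
```

So the text seen in the question is most likely group 0 (the full match); reading group 1, or using `findall`, yields just the URL.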


That said, I would use Beautiful Soup for this instead:

from bs4 import BeautifulSoup

html = '''
<a href="http://something" title="Development of the Python language and website">Core Development</a>
<a href="http://something.com" title="Development of the Python language and website">Core Development</a>
'''

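# Name the parser explicitly so the result does not depend on which parsers are installed
soup = BeautifulSoup(html, 'html.parser')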

for a in soup.find_all('a', href=True):
    print(a['href'])


Note: if you are using an older version of Beautiful Soup (Beautiful Soup 3), the method is named findAll, so the loop becomes:

for a in soup.findAll('a', href=True):
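
If Beautiful Soup is not available, the same parser-based idea can be sketched with the standard library's `html.parser` module. This is a swapped-in technique, not the answer's own code: the `HrefCollector` class is a hypothetical helper, and the second sample link was changed to use single quotes (an assumption for illustration) to show a case the double-quote-only regex would miss.

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Hypothetical helper: collects href values from <a> start tags."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; the parser lowercases
        # tag and attribute names before calling this method.
        if tag == 'a':
            for name, value in attrs:
                if name == 'href' and value is not None:
                    self.hrefs.append(value)


html = '''
<a href="http://something" title="Development of the Python language and website">Core Development</a>
<a title='single quotes' href='http://something.com'>Core Development</a>
'''

collector = HrefCollector()
collector.feed(html)
collector.close()

print(collector.hrefs)  # ['http://something', 'http://something.com']
```

Because the parser handles quoting and attribute order itself, the code does not need a separate pattern for each way an `href` can be written.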

