Getting an href with BeautifulSoup

Question

I have the following soup:

<a href="some_url">next</a>
<span class="class">...</span>

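From this I want to extract the href, "some_url".

I can do it when I have only one tag, but here there are two tags. I can also get the text, 'next', but that is not what I want.

Also, is there a good description of the API somewhere, with examples? I am using the standard documentation, but I am looking for something a little more organized.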

Answer 1

Mar*_*air 276

You can use find_all in the following way to find every a element that has an href attribute, and print each one:

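from bs4 import BeautifulSoup  # BeautifulSoup 4 package; find_all is the version 4 method name

html = '''<a href="some_url">next</a>
<span class="class"><a href="another_url">later</a></span>'''

# Name the parser explicitly so the result does not depend on which parsers are installed
soup = BeautifulSoup(html, 'html.parser')

# href=True matches only <a> tags that actually carry an href attribute
for a in soup.find_all('a', href=True):
    print("Found the URL:", a['href'])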

The output would be:

Found the URL: some_url
Found the URL: another_url
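
To illustrate why the href=True filter is useful, here is a minimal sketch. The sample HTML (an anchor without an href next to one with an href) and the variable names are assumptions made for illustration, not part of the question. It compares a plain find_all('a') with the filtered call, and uses Tag.get to read the attribute without raising KeyError when it is missing.

```python
from bs4 import BeautifulSoup

# Hypothetical sample: the first <a> has no href attribute at all
html = '<a name="top">anchor</a><a href="some_url">next</a>'
soup = BeautifulSoup(html, "html.parser")

all_links = soup.find_all("a")             # every <a>, with or without href
with_href = soup.find_all("a", href=True)  # only <a> tags that have an href

# .get returns None for a missing attribute, whereas a['href'] would raise KeyError
hrefs_or_none = [a.get("href") for a in all_links]

print(len(all_links), len(with_href), hrefs_or_none)
```

Without the filter, a['href'] on the first tag would fail, so either filter with href=True or read the attribute with .get.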

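Note that if you are using an older version of BeautifulSoup (before version 4), this method is called findAll. In version 4, BeautifulSoup's method names were changed to comply with PEP 8, so you should use find_all instead.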

If you want all tags that have an href, you can omit the name parameter:

href_tags = soup.find_all(href=True)
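
As a rough sketch of what omitting the name changes, the snippet below runs the same call on a hypothetical document that also contains a link tag (the extra tag and the variable names are assumptions for illustration). Any tag carrying an href is returned, not just a elements.

```python
from bs4 import BeautifulSoup

# Hypothetical sample: a <link> tag added on top of the markup from the question
html = '''<link href="style.css" rel="stylesheet">
<a href="some_url">next</a>
<span class="class">...</span>'''

soup = BeautifulSoup(html, "html.parser")

# No tag name given, so every tag with an href attribute matches, in document order
href_values = [(tag.name, tag["href"]) for tag in soup.find_all(href=True)]

print(href_values)
```

If only links are wanted, keeping the 'a' name in the call avoids picking up stylesheet and similar references.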

@yoshiserry soup.find('a',{'class':'class'})['href'] (6 upvotes)
Can you get a single href using "class="class""? (3 upvotes)
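
The lookup suggested in the comment above can be sketched as follows. It assumes the class sits on the a tag itself, which is not the case in the answer's sample HTML (there the class is on the span), so the markup here is hypothetical. The sketch uses the class_ keyword of find and guards against find returning None when nothing matches.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the class is placed directly on the first <a> tag
html = '<a class="class" href="some_url">next</a><a href="other_url">later</a>'
soup = BeautifulSoup(html, "html.parser")

# find returns the first match, or None when no tag matches
link = soup.find("a", class_="class", href=True)
first_href = link["href"] if link is not None else None

print(first_href)
```

Indexing the result of find directly, as in the comment, raises a TypeError when no tag matches, which is why the sketch checks for None first.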

Asked:	14 years, 7 months ago
Viewed:	254097 times
Last active:	12 years, 3 months ago