BeautifulSoup - How to get all links inside a block with a specific class?

Ham*_*ama 5 python beautifulsoup python-2.7

I have the following HTML DOM:

    <div class="meta-info meta-info-wide"> <div class="title">Разработчик</div> <div class="content contains-text-link">
    <a class="dev-link" href="http://www.jourist.com&amp;sa=D&amp;usg=AFQjCNHiC-nLYHAJwNnvDyYhyoeB6n8YKg" rel="nofollow" target="_blank">Перейти на веб-сайт</a>
    <a class="dev-link" href="mailto:info@jourist.com" rel="nofollow" target="_blank">Написать: info@jourist.com</a>
    <div class="content physical-address">Diagonalstraße 41
        20537 Hamburg</div> </div> </div>

I need to get all links (URLs) with the class dev-link inside the div.meta-info-wide block.

I tried the obvious approach, but it doesn't work:

    divTag = soup.find_all("div", {"class":"meta-info-wide"})
    print(len(divTag))

    for tag in divTag:
        tdTags = tag.find_all("a", {"class":"dev-link"})
        for tag in tdTags:
            print tag.text

Mar*_*ans 4

Try the following:

    # -*- coding: utf-8 -*-
    import bs4

    html = """
    <div class="meta-info meta-info-wide"> <div class="title">Разработчик</div> <div class="content contains-text-link">
    <a class="dev-link" href="http://www.jourist.com&amp;sa=D&amp;usg=AFQjCNHiC-nLYHAJwNnvDyYhyoeB6n8YKg" rel="nofollow" target="_blank">Перейти на веб-сайт</a>
    <a class="dev-link" href="mailto:info@jourist.com" rel="nofollow" target="_blank">Написать: info@jourist.com</a>
    <div class="content physical-address">Diagonalstraße 41
    20537 Hamburg</div> </div> </div>"""

    soup = bs4.BeautifulSoup(html, "html.parser")

    # Find every div carrying the meta-info-wide class, then pick the dev-link anchors inside it
    for div in soup.find_all("div", {"class":"meta-info-wide"}):
        for link in div.select("a.dev-link"):
            # Read the href attribute, not the link text
            print(link['href'])

This gives you:

    http://www.jourist.com&sa=D&usg=AFQjCNHiC-nLYHAJwNnvDyYhyoeB6n8YKg
    mailto:info@jourist.com

select() is used here to return all a tags that have the class dev-link. This approach is recommended when two or more CSS classes are involved.
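
As a minimal sketch of what select() looks like once more than one class is in play, the snippet below uses a trimmed-down copy of the markup from the question. The shortened link text, the extra other-link anchor and its example.com URL are assumptions added only to show what gets filtered out; they are not part of the original page.

```python
from bs4 import BeautifulSoup

# Trimmed-down copy of the question's markup. The "other-link" anchor and
# its example.com URL are hypothetical, added to show what is excluded.
html = """
<div class="meta-info meta-info-wide">
  <div class="content contains-text-link">
    <a class="dev-link" href="http://www.jourist.com">site</a>
    <a class="dev-link" href="mailto:info@jourist.com">mail</a>
    <a class="other-link" href="http://example.com/ignored">ignored</a>
  </div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Descendant selector: every a.dev-link nested anywhere inside div.meta-info-wide.
hrefs = [a.get("href") for a in soup.select("div.meta-info-wide a.dev-link")]

# Chained class selector: matches only divs that carry BOTH classes.
both = soup.select("div.meta-info.meta-info-wide")

print(hrefs)
print(len(both))
```

A single descendant selector does the work of the nested find_all()/select() loops in one call, and a.get("href") returns None instead of raising KeyError if an anchor happens to have no href.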


Tested with BeautifulSoup 4.5.1 and Python 2.7.12.
