Extract links and text only if certain strings are found - BeautifulSoup

Pal*_*roe 1 python beautifulsoup web-scraping

I am trying to use BeautifulSoup to extract links and text from a website (I have permission to do so).


I run the following code to get the links and text:

import requests
from bs4 import BeautifulSoup

url = "http://implementconsultinggroup.com/career/#/6257"
r = requests.get(url)

soup = BeautifulSoup(r.content)

links = soup.find_all("a")

for link in links:
    if "career" in link.get("href"):
        print("<a href='%s'>%s</a>" % (link.get("href"), link.text))

This gives me the following output:

View Position

</a>
<a href='/career/business-analyst-within-human-capital-management/'>
Business analyst within human capital management
COPENHAGEN • We are looking for an ambitious student with an interest in HR
who is passionate about working in the cross-field of people management,
business and technology

View Position

</a>
<a href='/career/management-consultants-within-strategic-workforce-planning/'>
Management consultants within strategic workforce planning
COPENHAGEN • We are looking for consultants with profound experience from
other consultancies

View Position

</a>
<a href='/career/management-consultants-within-supply-chain-strategy-production-and-process-management/'>
Management consultants within supply chain strategy, production and process
management
MALMÖ • We are looking for talented graduates who want a career in management
consulting

This is almost right, but I only want a position returned when its text contains the name COPENHAGEN (i.e. the MALMÖ position above should not be returned).


The site's HTML looks like this:

<div class="small-12 medium-9 columns top-lined">
    <a href="/career/management-consultants-within-supply-chain-management/" class="box-link">
        <h2 class="article__title--tiny" data-searchable-text="">Management consultants within supply chain management</h2>
        <p class="article__longDescription" data-searchable-text="">COPENHAGEN • We are looking for bright graduates with a passion for supply chain management and supply chain planning for our planning and execution excellence team.</p>
        <div class="styled-link styled-icon">
            <span class="icon icon-icon">
                <i class="fa fa-chevron-right"></i>
            </span>
            <span class="icon-text">View Position</span>
        </div>
    </a>
</div>

chi*_*cio 5

It looks like you can simply add another condition, so a link is printed only when its href contains "career" and its text contains "COPENHAGEN":

(...)
for link in links:
    if "career" in link.get("href") and "COPENHAGEN" in link.text:
        print("<a href='%s'>%s</a>" % (link.get("href"), link.text))
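For anyone who wants to try the two-condition filter without hitting the live site, here is a minimal sketch that runs against an inline HTML string. The sample markup is modeled on the snippet and output in the question. The helper name `find_positions`, its parameters, and the last two `<a>` elements (one without an `href`, one outside `/career/`) are assumptions added for illustration and do not come from the original page. It also treats a missing `href` as an empty string, since `link.get("href")` returns `None` for anchors without that attribute and `"career" in None` would raise a `TypeError`.

```python
from bs4 import BeautifulSoup

# Sample markup modeled on the question; the last two anchors are hypothetical edge cases.
SAMPLE_HTML = """
<div class="small-12 medium-9 columns top-lined">
    <a href="/career/management-consultants-within-supply-chain-management/" class="box-link">
        <h2 class="article__title--tiny">Management consultants within supply chain management</h2>
        <p class="article__longDescription">COPENHAGEN • We are looking for bright graduates.</p>
        <span class="icon-text">View Position</span>
    </a>
    <a href="/career/management-consultants-within-supply-chain-strategy-production-and-process-management/" class="box-link">
        <h2 class="article__title--tiny">Management consultants within supply chain strategy</h2>
        <p class="article__longDescription">MALMÖ • We are looking for talented graduates.</p>
        <span class="icon-text">View Position</span>
    </a>
    <a class="box-link">Anchor without an href</a>
    <a href="/about/">COPENHAGEN office</a>
</div>
"""


def find_positions(html, city, path_fragment="career"):
    """Return (href, text) pairs for links whose href contains
    path_fragment and whose visible text contains city."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for link in soup.find_all("a"):
        # A missing href yields None; fall back to "" so the `in` test is safe.
        href = link.get("href") or ""
        # Join the nested text nodes with single spaces and trim whitespace.
        text = link.get_text(" ", strip=True)
        if path_fragment in href and city in text:
            results.append((href, text))
    return results


for href, text in find_positions(SAMPLE_HTML, "COPENHAGEN"):
    print("<a href='%s'>%s</a>" % (href, text))
```

Using `get_text(" ", strip=True)` instead of `link.text` is a design choice here: it drops the runs of blank lines that show up in the question's output. The check itself is still a plain, case-sensitive substring match.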