Pal*_*roe 1 python beautifulsoup web-scraping
I am trying to use BeautifulSoup to extract links and text from a website (I have permission to do so).

I run the following code to get the links and text:

    import requests
    from bs4 import BeautifulSoup

    url = "http://implementconsultinggroup.com/career/#/6257"
    r = requests.get(url)

    soup = BeautifulSoup(r.content)

    links = soup.find_all("a")

    for link in links:
        if "career" in link.get("href"):
            print "<a href='%s'>%s</a>" %(link.get("href"), link.text)

This gives me the following output:

    View Position

    </a>
    <a href='/career/business-analyst-within-human-capital-management/'>
    Business analyst within human capital management
    COPENHAGEN • We are looking for an ambitious student with an interest in HR
    who is passionate about working in the cross-field of people management,
    business and technology




    View Position

    </a>
    <a href='/career/management-consultants-within-strategic-workforce-planning/'>
    Management consultants within strategic workforce planning
    COPENHAGEN • We are looking for consultants with profound experience from
    other consultancies




    View Position

    </a>
    <a href='/career/management-consultants-within-supply-chain-strategy-
    production-and-process-management/'>
    Management consultants within supply chain strategy, production and process
    management
    MALMÖ • We are looking for talented graduates who want a career in management
    consulting

This is almost correct, but I only want a position returned when its text contains the name COPENHAGEN (i.e. the MALMO position above should not be returned).

The website's HTML looks like this:

    <div class="small-12 medium-9 columns top-lined">
        <a href="/career/management-consultants-within-supply-chain-management/" class="box-link">
            <h2 class="article__title--tiny" data-searchable-text="">Management consultants within supply chain management</h2>
            <p class="article__longDescription" data-searchable-text="">COPENHAGEN • We are looking for bright graduates with a passion for supply chain management and supply chain planning for our planning and execution excellence team.</p>
            <div class="styled-link styled-icon">
                <span class="icon icon-icon">
                    <i class="fa fa-chevron-right"></i>
                </span>
                <span class="icon-text">View Position</span>
            </div>
        </a>
    </div>

It looks like you can add another condition:

    (...)
    for link in links:
        if "career" in link.get("href") and 'COPENHAGEN' in link.text:
            print "<a href='%s'>%s</a>" %(link.get("href"), link.text)
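
For readers on Python 3, the same condition can be sketched as below. This is only an illustration under a few assumptions: `SAMPLE_HTML` is a trimmed stand-in modelled on the markup in the question (it is not fetched from the live site), and the names `career_links` and `city` are hypothetical helpers that do not appear in the original post. The sketch also skips `<a>` tags without an `href`, since `link.get("href")` returns `None` for those and `"career" in None` would raise a `TypeError`.

```python
from bs4 import BeautifulSoup

# Stand-in markup modelled on the HTML in the question (trimmed and assumed;
# the MALMÖ entry is assumed to share the structure of the COPENHAGEN one).
SAMPLE_HTML = """
<div class="small-12 medium-9 columns top-lined">
  <a href="/career/management-consultants-within-supply-chain-management/" class="box-link">
    <h2 class="article__title--tiny">Management consultants within supply chain management</h2>
    <p class="article__longDescription">COPENHAGEN • We are looking for bright graduates.</p>
    <span class="icon-text">View Position</span>
  </a>
</div>
<div class="small-12 medium-9 columns top-lined">
  <a href="/career/management-consultants-within-supply-chain-strategy-production-and-process-management/" class="box-link">
    <h2 class="article__title--tiny">Management consultants within supply chain strategy, production and process management</h2>
    <p class="article__longDescription">MALMÖ • We are looking for talented graduates.</p>
    <span class="icon-text">View Position</span>
  </a>
</div>
<a name="top">COPENHAGEN anchor without an href</a>
"""


def career_links(html, city="COPENHAGEN"):
    """Return (href, text) pairs for career links whose text mentions `city`.

    Hypothetical helper for illustration; not part of the original answer.
    """
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for link in soup.find_all("a"):
        href = link.get("href")  # None when the <a> tag has no href attribute
        if href and "career" in href and city in link.text:
            # Collapse the runs of whitespace left by the nested tags
            text = " ".join(link.text.split())
            results.append((href, text))
    return results


for href, text in career_links(SAMPLE_HTML):
    print("<a href='%s'>%s</a>" % (href, text))
```

Note that the match is a plain, case-sensitive substring test on the whole link text, so a posting that merely mentions COPENHAGEN in its description would also be returned. Checking only the `article__longDescription` paragraph would be stricter, if that class is stable on the real page.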