Tan*_*man 3 beautifulsoup web-scraping python-3.x
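I'm trying to do some web scraping with BeautifulSoup, and I need to extract the headlines from this web page, specifically the ones in the "More" headlines section. This is the code I've tried so far.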
import requests
from bs4 import BeautifulSoup
from csv import writer
response = requests.get('https://www.cnbc.com/finance/?page=1')
soup = BeautifulSoup(response.text,'html.parser')
posts = soup.find_all(id='pipeline')
for post in posts:
    data = post.find_all('li')
    for entry in data:
        title = entry.find(class_='headline')
        print(title)
Running this code gives me all the headlines on the page in the following output format:
<div class="headline">
<a class=" " data-nodeid="105372063" href="/2018/08/02/after-apple-rallies-to-1-trillion-even-the-uber-bullish-crowd-on-wal.html">
{{{*HEADLINE TEXT HERE*}}}
</a> </div>
However, if I call the get_text() method when fetching the title in the code above, I only get the first two headlines.
title = entry.find(class_='headline').get_text()
They are followed by this error:
Traceback (most recent call last):
  File "C:\Users\Tanay Roman\Documents\python projects\scrapper.py", line 16, in <module>
    title = entry.find(class_='headline').get_text()
AttributeError: 'NoneType' object has no attribute 'get_text'
Why does adding the get_text() method return only partial results, and how can I fix it?
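You are misreading the error message. It does not say that the .get_text() call returned a NoneType object; it says that objects of type NoneType do not have that method.
There is only ever one object of type NoneType: the value None. Here it was returned by entry.find(class_='headline'), because find() could not locate an element inside entry that matches the search criteria. In other words, that particular entry element has no descendant with the class headline.
There are two such <li> elements, one with the id nativedvriver3 and the other with the id nativedvriver9, and you would get the error for both. You need to check first whether there is a matching element: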
for entry in data:
    headline = entry.find(class_='headline')
    if headline is not None:
        title = headline.get_text()
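To try the guard in isolation, here is a minimal sketch that runs against a small inline HTML snippet instead of the live page. The markup, the `ad-slot` id, and the `extract_titles` helper are assumptions made up for illustration; only `find()`, `find_all()`, and `get_text()` are BeautifulSoup calls.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the second <li> has no element with class "headline",
# which is the situation that makes find() return None.
SAMPLE_HTML = """
<ul id="pipeline">
  <li><div class="headline"><a href="/a.html"> First headline </a></div></li>
  <li id="ad-slot"><div class="promo">Sponsored</div></li>
  <li><div class="headline"><a href="/b.html"> Second headline </a></div></li>
</ul>
"""


def extract_titles(html):
    """Return the headline text of every <li> that actually contains one."""
    soup = BeautifulSoup(html, "html.parser")
    titles = []
    for entry in soup.find(id="pipeline").find_all("li"):
        headline = entry.find(class_="headline")
        if headline is None:
            # No match in this <li>; skip it instead of calling a method on None.
            continue
        titles.append(headline.get_text(strip=True))
    return titles


titles = extract_titles(SAMPLE_HTML)
print(titles)
```

The `<li>` without a headline is skipped, so the loop finishes instead of raising AttributeError partway through.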
You'll have an easier time if you use a CSS selector:
headlines = soup.select('#pipeline li .headline')
for headline in headlines:
    headline_text = headline.get_text(strip=True)
    print(headline_text)
This produces:
>>> headlines = soup.select('#pipeline li .headline')
>>> for headline in headlines:
...     headline_text = headline.get_text(strip=True)
...     print(headline_text)
...
Hedge funds fight back against tech in the war for talent
Goldman Sachs sees more price pain ahead for bitcoin
Dish Network shares rise 15% after subscriber losses are less than expected
Bitcoin whale makes ‘enormous’ losing bet, so now other traders have to foot the bill
The 'Netflix of fitness' looks to become a publicly traded stock as soon as next year
Amazon slammed for ‘insult’ tax bill in the UK despite record profits
Nasdaq could plunge 15 percent or more as ‘rolling bear market’ grips stocks: Morgan Stanley
Take-Two shares surge 9% after gamemaker beats expectations due to 'Grand Theft Auto Online'
UK bank RBS announces first dividend in 10 years
Michael Cohen reportedly secured a $10 million deal with Trump donor to advance a nuclear project
After-hours buzz: GPRO, AIG & more
Bitcoin is still too 'unstable' to become mainstream money, UBS says
Apple just hit a trillion but its stock performance has been dwarfed by the other tech giants
The first company to ever reach $1 trillion in market value was in China and got crushed
Apple at a trillion-dollar valuation isn’t crazy like the dot-com bubble
After Apple rallies to $1 trillion, even the uber bullish crowd on Wall Street believes it may need to cool off
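The selector approach can also be checked offline. The sketch below is an illustration under assumptions: the inline markup is invented, and collecting the link target alongside the text is an extra step not in the code above. It shows why no None check is needed: `select()` returns only the elements that match, so a `<li>` without a headline contributes nothing.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, not the real CNBC page.
SAMPLE_HTML = """
<ul id="pipeline">
  <li><div class="headline"><a href="/a.html"> First headline </a></div></li>
  <li id="ad-slot"><div class="promo">Sponsored</div></li>
  <li><div class="headline"><a href="/b.html"> Second headline </a></div></li>
</ul>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Each match is an <a> inside a .headline inside a <li> under #pipeline.
# The ad-slot <li> has no such descendant, so it is silently left out.
rows = [
    (link.get_text(strip=True), link.get("href"))
    for link in soup.select("#pipeline li .headline a")
]

for text, href in rows:
    print(text, href)
```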