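I'm trying to scrape a web page with BeautifulSoup. I need to extract the headlines from this page, specifically the ones in the "More" headlines section. Here is the code I have tried so far: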
import requests
from bs4 import BeautifulSoup
from csv import writer
response = requests.get('https://www.cnbc.com/finance/?page=1')
soup = BeautifulSoup(response.text,'html.parser')
posts = soup.find_all(id='pipeline')
for post in posts:
    data = post.find_all('li')
    for entry in data:
        title = entry.find(class_='headline')
        print(title)
Running this code gives me every headline on the page, in this format:
<div class="headline">
<a class=" " data-nodeid="105372063" href="/2018/08/02/after-apple-rallies-to-1-trillion-even-the-uber-bullish-crowd-on-wal.html">
{{{*HEADLINE TEXT HERE*}}}
</a> </div>
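For reference, here is a minimal offline sketch of what get_text() returns for one element shaped like the output above: it collects all the text inside the div, including the text of the nested <a>. The HTML string is a hypothetical stand-in. The href and data-nodeid are copied from the printed output, but the headline text is invented in place of the placeholder.

```python
from bs4 import BeautifulSoup

# Hypothetical sample with the same structure as the printed output;
# the headline text is made up.
SAMPLE = """
<div class="headline">
<a class=" " data-nodeid="105372063" href="/2018/08/02/after-apple-rallies-to-1-trillion-even-the-uber-bullish-crowd-on-wal.html">
Example headline text
</a> </div>
"""

soup = BeautifulSoup(SAMPLE, "html.parser")
headline = soup.find(class_="headline")

# strip=True trims surrounding whitespace/newlines and drops empty strings
text = headline.get_text(strip=True)
# the article link lives on the nested <a> tag
link = headline.find("a")["href"]

print(text)
print(link)
```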
However, if I call get_text() when fetching the headline in the code above, I only get the first two headlines:
title = entry.find(class_='headline').get_text()
After that, I get this error:
Traceback (most recent call last):
  File "C:\Users\Tanay Roman\Documents\python projects\scrapper.py", line 16, in <module>
    title = entry.find(class_='headline').get_text()
AttributeError: 'NoneType' object has no …
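The traceback means that entry.find(class_='headline') returned None for at least one <li> inside the pipeline element. find() returns None when nothing matches, and None has no get_text() method. The first version of the loop does not crash because print(None) simply prints "None". Below is a sketch of a None check before calling get_text(). The HTML is hypothetical: it assumes one <li> (here given an invented class "ad-slot") contains no headline div. The real page's structure may differ, so the actual culprit is worth inspecting.

```python
from bs4 import BeautifulSoup

# Hypothetical sample: the third <li> has no element with class "headline",
# mimicking whatever entry triggered the AttributeError on the real page.
SAMPLE = """
<div id="pipeline">
  <ul>
    <li><div class="headline"><a href="/a.html">First headline</a></div></li>
    <li><div class="headline"><a href="/b.html">Second headline</a></div></li>
    <li><div class="ad-slot"></div></li>
    <li><div class="headline"><a href="/c.html">Third headline</a></div></li>
  </ul>
</div>
"""


def extract_headlines(html):
    """Return headline texts, skipping <li> entries that have no headline."""
    soup = BeautifulSoup(html, "html.parser")
    headlines = []
    for post in soup.find_all(id="pipeline"):
        for entry in post.find_all("li"):
            title = entry.find(class_="headline")
            # find() returns None when nothing matches; calling
            # .get_text() on None is what raised the AttributeError.
            if title is None:
                continue
            headlines.append(title.get_text(strip=True))
    return headlines


for headline in extract_headlines(SAMPLE):
    print(headline)
```

A CSS selector such as soup.select('#pipeline li .headline') has a similar effect, because it only returns elements that actually exist, so there is no None to guard against.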