Posted by Tan*_*man

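BeautifulSoup get_text returns a NoneType object

I am trying to scrape a web page with BeautifulSoup, and I need to extract the headlines from it, specifically the ones in the "More" headlines section. This is the code I have tried so far.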

import requests
from bs4 import BeautifulSoup
from csv import writer

response = requests.get('https://www.cnbc.com/finance/?page=1')

soup = BeautifulSoup(response.text,'html.parser')

posts = soup.find_all(id='pipeline')

for post in posts:
    data = post.find_all('li')
    for entry in data:
        title = entry.find(class_='headline')
        print(title)

Running this code prints every headline on the page in the following format:

<div class="headline">
<a class=" " data-nodeid="105372063" href="/2018/08/02/after-apple-rallies-to-1-trillion-even-the-uber-bullish-crowd-on-wal.html">
           {{{*HEADLINE TEXT HERE*}}}
</a> </div>

However, if I call get_text() on the result when fetching the headline in the code above, I only get the first two headlines.

title = entry.find(class_='headline').get_text()

They are followed by this error:

Traceback (most recent call last):
  File "C:\Users\Tanay Roman\Documents\python projects\scrapper.py", line 16, in <module>
    title = entry.find(class_='headline').get_text()
AttributeError: 'NoneType' object has no …
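
For context on the traceback: `find` returns `None` when nothing matches, so calling `get_text()` on that result raises this `AttributeError`. A minimal sketch of one way to guard against it, assuming (this is not confirmed by the post) that some `<li>` entries under `pipeline` have no `headline` element. The `SAMPLE_HTML` string and the `extract_headlines` helper are hypothetical stand-ins so the example runs without a network request.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the real page: the second <li> has no
# element with class "headline", which is the assumed cause of the error.
SAMPLE_HTML = """
<ul id="pipeline">
  <li><div class="headline"><a href="/a.html">
        First headline
  </a></div></li>
  <li><div class="ad">Sponsored</div></li>
  <li><div class="headline"><a href="/b.html">Second headline</a></div></li>
</ul>
"""


def extract_headlines(html):
    """Return the headline text of every <li> that actually has a headline."""
    soup = BeautifulSoup(html, "html.parser")
    titles = []
    for post in soup.find_all(id="pipeline"):
        for entry in post.find_all("li"):
            title = entry.find(class_="headline")
            if title is None:
                # No match in this <li>; skip it instead of calling
                # get_text() on None.
                continue
            # strip=True drops the surrounding whitespace and newlines.
            titles.append(title.get_text(strip=True))
    return titles


headlines = extract_headlines(SAMPLE_HTML)
print(headlines)
```

With the sample markup above, the entry without a headline is skipped and the other two are returned as plain strings. Whether the real page contains such entries would need to be checked by looking at what `print(title)` shows for each `<li>`.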

beautifulsoup web-scraping python-3.x

3 votes, 1 answer, 10k views
