Selenium is really slow for me. Is something wrong with my code?

use*_*374 5 python selenium web-scraping

I am new to web scraping and Python. I previously wrote a script that worked well, and I am doing essentially the same thing in this one, but it runs much more slowly. Here is my code:

import requests
from bs4 import BeautifulSoup
from selenium import webdriver
import selenium
from selenium.webdriver import Firefox
from selenium.webdriver.firefox.options import Options
import time

start = time.time()
opp = Options()
opp.add_argument('-headless')
browser = webdriver.Firefox(executable_path = "/Users/0581279/Desktop/L&S/Watchlist/geckodriver", options=opp)
browser.delete_all_cookies()
browser.get("https://www.bloomberg.com/quote/MSGFINA:LX")

c = browser.page_source
soup = BeautifulSoup(c, "html.parser")
all = soup.find_all("span", {"class": "fieldValue__2d582aa7"})
price = all[6].text
browser.quit()
print(price)
end = time.time()
print(end-start)

Sometimes loading a single page can take up to 2 minutes, and all I am scraping is a Bloomberg quote page. Any help would be appreciated :)
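
If Selenium has to stay in the picture, one thing worth trying before switching libraries is to stop blocking on the full page load and wait only for the element that is actually needed. The following is a minimal sketch, not code from the question: it assumes Selenium 4.6+ (so the geckodriver path is resolved automatically and Options exposes page_load_strategy), and the "eager" strategy and the 10-second timeout are illustrative choices rather than values from the original post.

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.firefox.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

opp = Options()
opp.add_argument('-headless')
# "eager" lets get() return once the DOM is ready instead of waiting
# for every image, script and ad to finish loading (Selenium 4 API).
opp.page_load_strategy = 'eager'

browser = webdriver.Firefox(options=opp)
try:
    browser.get("https://www.bloomberg.com/quote/MSGFINA:LX")
    # Wait only for the value spans used in the question, up to 10 seconds
    # (the timeout is an illustrative choice).
    values = WebDriverWait(browser, 10).until(
        EC.presence_of_all_elements_located(
            (By.CSS_SELECTOR, "span.fieldValue__2d582aa7")
        )
    )
    print(values[6].text)
finally:
    browser.quit()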

Ser*_*ers 3

With requests and BeautifulSoup you can scrape the information easily and quickly. The code below fetches the Key Statistics for MSGFINA:LX from Bloomberg:

import requests
from bs4 import BeautifulSoup

# Headers that mimic a regular desktop browser; without them Bloomberg
# is far more likely to reject the request.
headers = {
    'Upgrade-Insecure-Requests': '1',
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_2) AppleWebKit/537.36 (KHTML, like Gecko) '
                  'Chrome/72.0.3626.119 Safari/537.36',
    'DNT': '1'
}

response = requests.get('https://www.bloomberg.com/quote/MSGFINA:LX', headers=headers)
page = BeautifulSoup(response.text, "html.parser")

# Each row of the "Key Statistics" module holds a label span and a value span.
# The class-prefix selectors ([class^=...]) ignore the hashed suffixes such as
# fieldValue__2d582aa7, so they keep working if the hash changes.
key_statistics = page.select("div[class^='module keyStatistics'] div[class^='rowListItemWrap']")
for key_statistic in key_statistics:
    fieldLabel = key_statistic.select_one("span[class^='fieldLabel']")
    fieldValue = key_statistic.select_one("span[class^='fieldValue']")
    print("%s: %s" % (fieldLabel.text, fieldValue.text))
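
If only one of those values is needed (for example the price the question pulls out with all[6]), matching on the label text is more robust than a hard-coded index, because the order of the rows can change. Below is a minimal sketch built on the same selectors as above; get_statistic is a helper introduced here for illustration, and "Price to Book Ratio" is only a placeholder label, so substitute whatever label Bloomberg actually shows for the field you want.

import requests
from bs4 import BeautifulSoup

headers = {
    'Upgrade-Insecure-Requests': '1',
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_2) AppleWebKit/537.36 (KHTML, like Gecko) '
                  'Chrome/72.0.3626.119 Safari/537.36',
    'DNT': '1'
}

def get_statistic(symbol, label):
    # Return the key-statistic value whose label contains `label`, or None.
    response = requests.get("https://www.bloomberg.com/quote/%s" % symbol, headers=headers)
    page = BeautifulSoup(response.text, "html.parser")
    rows = page.select("div[class^='module keyStatistics'] div[class^='rowListItemWrap']")
    for row in rows:
        field_label = row.select_one("span[class^='fieldLabel']")
        field_value = row.select_one("span[class^='fieldValue']")
        if field_label and field_value and label.lower() in field_label.text.lower():
            return field_value.text
    return None

# "Price to Book Ratio" is a placeholder; use the exact label shown on the page.
print(get_statistic("MSGFINA:LX", "Price to Book Ratio"))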