python selenium web-scraping
I'm new to web scraping and Python. I previously wrote a script that worked well. This one does essentially the same thing, but it runs much more slowly. Here is my code:
import requests
from bs4 import BeautifulSoup
from selenium import webdriver
import selenium
from selenium.webdriver import Firefox
from selenium.webdriver.firefox.options import Options
import time
start = time.time()
opp = Options()
opp.add_argument('-headless')
browser = webdriver.Firefox(executable_path = "/Users/0581279/Desktop/L&S/Watchlist/geckodriver", options=opp)
browser.delete_all_cookies()
browser.get("https://www.bloomberg.com/quote/MSGFINA:LX")
c = browser.page_source
soup = BeautifulSoup(c, "html.parser")
all = soup.find_all("span", {"class": "fieldValue__2d582aa7"})
price = all[6].text
browser.quit()
print(price)
end = time.time()
print(end-start)
Sometimes loading a single page can take up to 2 minutes, and I'm only scraping Bloomberg pages. Any help would be appreciated :)
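As an aside on the timing code in the question: `time.perf_counter()` is the idiomatic clock for measuring elapsed time of a code section, since it is monotonic and has higher resolution than `time.time()`. A minimal sketch (the summed range stands in for the scraping work being timed):

```python
import time

start = time.perf_counter()
total = sum(range(1_000_000))  # stand-in for the work being measured
elapsed = time.perf_counter() - start
print(f"took {elapsed:.3f}s")
```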
With requests and BeautifulSoup you can scrape this information easily and quickly. The following code fetches the key statistics for MSGFINA:LX from Bloomberg:
import requests
from bs4 import BeautifulSoup

# Browser-like headers so the request is not served a bot-detection page
headers = {
    'Upgrade-Insecure-Requests': '1',
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_2) AppleWebKit/537.36 (KHTML, like Gecko) '
                  'Chrome/72.0.3626.119 Safari/537.36',
    'DNT': '1'
}

response = requests.get('https://www.bloomberg.com/quote/MSGFINA:LX', headers=headers)
page = BeautifulSoup(response.text, "html.parser")

key_statistics = page.select("div[class^='module keyStatistics'] div[class^='rowListItemWrap']")
for key_statistic in key_statistics:
    fieldLabel = key_statistic.select_one("span[class^='fieldLabel']")
    fieldValue = key_statistic.select_one("span[class^='fieldValue']")
    print("%s: %s" % (fieldLabel.text, fieldValue.text))
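The `[class^='…']` attribute selectors above match elements whose class attribute *starts with* the given prefix. That matters here because Bloomberg appends hashed suffixes to its class names (e.g. `fieldValue__2d582aa7`) that can change between deployments, so matching on the stable prefix is more robust than hard-coding the full class. A minimal offline sketch of the same selector pattern (the HTML snippet is invented for illustration):

```python
from bs4 import BeautifulSoup

# Invented HTML mimicking Bloomberg's hashed class-name style
html = """
<div class="module keyStatistics abc123">
  <div class="rowListItemWrap__x1">
    <span class="fieldLabel__9f">Price</span>
    <span class="fieldValue__2d">20.50</span>
  </div>
</div>
"""
page = BeautifulSoup(html, "html.parser")

# Prefix match survives changes to the hashed suffix
rows = page.select("div[class^='module keyStatistics'] div[class^='rowListItemWrap']")
for row in rows:
    label = row.select_one("span[class^='fieldLabel']").text
    value = row.select_one("span[class^='fieldValue']").text
    print("%s: %s" % (label, value))  # Price: 20.50
```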