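I'm learning to use Selenium with Python, and I want to scrape the news headlines and dates from this site,
but I've run into a problem I don't know how to solve.
Here is my code: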
from selenium import webdriver
from bs4 import BeautifulSoup
import pandas as pd
import time
import json
driver = webdriver.Chrome("./chromedriver")
driver.implicitly_wait(10)
driver.get("https://www.thestandnews.com/search/?q=%E6%96%B0%E5%86%A0%E8%82%BA%E7%82%8E")
soup = BeautifulSoup(driver.page_source, "lxml")
pages_remaining = True
page_num = 1
My_array = []
while pages_remaining:
    print("Page Number:", page_num)
    soup = BeautifulSoup(driver.page_source, "lxml")
    """ #undoned
    tags_lis = soup.find_all("li")
    for tag in tags_lis:
        tag_a = tag.find("a")
        tag_span = tag.find("span")
        title = tag_a.text
        date = tag_span.text
        temp = {"title": title , "date": date}
        print(temp)
        My_array.append(temp)
    """
    try:
        #Press button of next page
        #next_link =driver.find_element_by_xpath()
        nextPg = '//*[@id="___gcse_1"]/div/div/div/div[5]/div[2]/div/div/div[2]/div/div[%d]' % (page_num + 1)
        print(nextPg)
        next_link = driver.find_element_by_xpath(nextPg)
        next_link.click()
        time.sleep(5)
        if page_num < 10:
            page_num = page_num + 1
        else:
            pages_remaining = False
    except Exception:
        pages_remaining = False
driver.close()
Here is the error output. Can anyone give me a hint? Thanks!
DevTools listening on ws://127.0.0.1:49952/devtools/browser/749fcb19-d13a-4f38-9d7c-3da58726e10a
[13744:13732:0517/214816.873:ERROR:browser_switcher_service.cc(238)] XXX Init()
Page Number: 1
//*[@id="___gcse_1"]/div/div/div/div[5]/div[2]/div/div/div[2]/div/div[2]
[13744:13732:0517/214824.321:ERROR:device_event_log_impl.cc(162)] [21:48:24.321] Bluetooth: bluetooth_adapter_winrt.cc:1055 Getting Default Adapter failed.
Page Number: 2
//*[@id="___gcse_1"]/div/div/div/div[5]/div[2]/div/div/div[2]/div/div[3]
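A note on reading this output: it ends right after the XPath for the third pager cell is printed, and no traceback follows. That is consistent with the `except Exception:` branch setting `pages_remaining = False` and hiding whatever went wrong on that click. The sketch below is not a fix for the site itself, only one way to make the failure visible. `next_page_xpath`, `walk_pages` and the `click` callable are hypothetical names introduced here; `click` stands in for the `driver.find_element_by_xpath(...).click()` call from the question so that the loop logic can be checked without a browser. Unlike the original loop, it stops once `max_pages` is reached without attempting one more click.

```python
import traceback


def next_page_xpath(page_num):
    """Build the XPath of the pager cell that follows page_num (pattern taken from the question)."""
    return '//*[@id="___gcse_1"]/div/div/div/div[5]/div[2]/div/div/div[2]/div/div[%d]' % (page_num + 1)


def walk_pages(click, max_pages=10):
    """Click through the pager until max_pages is reached or a click fails.

    `click` is a hypothetical callable taking an XPath string; with a real
    driver it would wrap the find-and-click call used in the question.
    Returns (last_page_reached, error_or_None).
    """
    page_num = 1
    while page_num < max_pages:
        xpath = next_page_xpath(page_num)
        try:
            click(xpath)
        except Exception as exc:
            # Show the real error instead of ending the loop silently.
            traceback.print_exc()
            return page_num, exc
        page_num += 1
    return page_num, None
```

With the driver from the question, the call might look like `walk_pages(lambda xp: driver.find_element_by_xpath(xp).click())`; the printed traceback should then name the actual exception raised on the failing page.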