JJJ*_*ohn 5 python beautifulsoup web-scraping
I am trying to scrape the Bing dict page https://cn.bing.com/dict/search?q=avengers
Here is the code:
import requests
from bs4 import BeautifulSoup
url = "https://cn.bing.com/dict/search?q=avengers"
headers = {
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3"
}
response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.content, "html.parser")
examples = soup.find_all("div", class_="sen_en b_regtxt")
for example in examples:
    print(example.text.strip())
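Specifically, I am trying to scrape all the example sentences on that page, which are contained in div elements with the class sen_en b_regtxt.
However, response.content does not contain a single example sentence. What am I missing?
PS: the page can be accessed without logging in.
With help from @Artur Chukhrai, I also tried Selenium, but got "No results found for avengers".
However, if I first visit "cn.bing.com/dict" and then type the keyword into the search box, I do get the results page.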
小智 3
A small modification to Artur Chukhrai's answer is enough: load https://cn.bing.com/dict, then type the text into the search box:
from selenium import webdriver
from selenium.webdriver.common.by import By
from bs4 import BeautifulSoup
import time
url = "https://cn.bing.com/dict/"
# Start a new Selenium web driver instance
driver = webdriver.Chrome()
driver.get(url)
# Wait for the page to load
time.sleep(5)
# Write text in search box
search_box = driver.find_element(By.CLASS_NAME, value="b_searchbox")
search_box.send_keys("avengers\n")
# Wait for the page to load
time.sleep(5)
# Get the page source after it has fully loaded
html = driver.page_source
soup = BeautifulSoup(html, "html.parser")
# Find and print the examples of the word
examples = soup.select(".sen_en")
for example in examples:
    print(example.text.strip())
# Quit the web driver instance
driver.quit()
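The fixed time.sleep(5) calls above can probably be replaced with explicit waits, so the script continues as soon as the elements appear. The following is only a sketch under assumptions: the function names extract_examples and fetch_examples, the timeout parameter, and the sample HTML string are made up for illustration, and it assumes the page still uses the b_searchbox and sen_en class names mentioned in this thread.

```python
from bs4 import BeautifulSoup


def extract_examples(html):
    """Return the stripped text of every element that has the sen_en class."""
    soup = BeautifulSoup(html, "html.parser")
    return [node.get_text().strip() for node in soup.select(".sen_en")]


def fetch_examples(word, timeout=10):
    """Type `word` into the Bing dict search box and return the example sentences."""
    # Selenium is imported here so extract_examples works without a browser installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.common.keys import Keys
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()
    try:
        driver.get("https://cn.bing.com/dict/")
        wait = WebDriverWait(driver, timeout)
        # Wait for the search box to become usable instead of sleeping.
        search_box = wait.until(
            EC.element_to_be_clickable((By.CLASS_NAME, "b_searchbox"))
        )
        search_box.send_keys(word, Keys.ENTER)
        # Wait until at least one example sentence is present in the DOM.
        wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, ".sen_en")))
        return extract_examples(driver.page_source)
    finally:
        # Always close the browser, even if a wait times out.
        driver.quit()


# Made-up HTML fragment, used only to show the parsing step offline.
sample = '<div class="sen_en b_regtxt">The <b>Avengers</b> assembled.</div>'
print(extract_examples(sample))
```

Calling fetch_examples("avengers") should then follow the same route as the answer above (open the dict home page, type the keyword, read the results). If the page layout has changed, the waits raise a TimeoutException rather than silently returning an empty list.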