I'm trying to scrape a Bing Dictionary page with BeautifulSoup, but response.content doesn't contain the actual data. What should I do?

Question


JJJ*_*ohn 5 python beautifulsoup web-scraping

I'm trying to scrape the Bing dict page https://cn.bing.com/dict/search?q=avengers

Here is the code:

import requests
from bs4 import BeautifulSoup
    
url = "https://cn.bing.com/dict/search?q=avengers"

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3"
}

response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.content, "html.parser")

examples = soup.find_all("div", class_="sen_en b_regtxt")

for example in examples:
    print(example.text.strip())


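Specifically, I'm trying to scrape all the example sentences on that page, which are contained in div elements with the class sen_en b_regtxt.

However, response.content doesn't contain a single example sentence. What am I missing?

PS: the page can be accessed without logging in.

With help from @Artur Chukhrai, I also tried Selenium, but got "No results found for avengers".

However, if I first visit "cn.bing.com/dict" and then type the keyword into the search box, I get the results page.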
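
One way to narrow this down is to check whether the example sentences are missing from the server's response altogether, or whether only the selector fails to match. The sketch below uses just the standard library. `count_elements_with_classes` is a hypothetical helper name, not part of requests or BeautifulSoup. It counts elements of any tag that carry every listed class, regardless of the order in which the classes appear.

```python
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Counts start tags whose class attribute contains all wanted classes."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = set(wanted)
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare attributes
        classes = set((dict(attrs).get("class") or "").split())
        if self.wanted <= classes:
            self.count += 1


def count_elements_with_classes(html, *classes):
    """Return how many elements in `html` carry every class in `classes`."""
    counter = ClassCounter(classes)
    counter.feed(html)
    return counter.count


if __name__ == "__main__":
    sample = '<div class="b_regtxt sen_en">x</div><div class="sen_en">y</div>'
    print(count_elements_with_classes(sample, "sen_en", "b_regtxt"))  # prints 1
```

Calling `count_elements_with_classes(response.text, "sen_en", "b_regtxt")` on the response from the code above shows which case applies. A result of 0 means no such elements are in the HTML the server returned, so no parser setting will help. A result above 0 points at the selector: in BeautifulSoup, `class_="sen_en b_regtxt"` matches only that exact attribute string, while `soup.select("div.sen_en.b_regtxt")` ignores class order.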

Answer 1

小智 3

A small modification to Artur Chukhrai's answer is enough: load https://cn.bing.com/dict, then type the text into the search box:

from selenium import webdriver
from selenium.webdriver.common.by import By
from bs4 import BeautifulSoup
import time

url = "https://cn.bing.com/dict/"

# Start a new Selenium web driver instance
driver = webdriver.Chrome()
driver.get(url)

# Wait for the page to load
time.sleep(5)

# Write text in search box
search_box = driver.find_element(By.CLASS_NAME, value="b_searchbox")
search_box.send_keys("avengers\n")

# Wait for the page to load
time.sleep(5)

# Get the page source after it has fully loaded
html = driver.page_source
soup = BeautifulSoup(html, "html.parser")

# Find and print the examples of the word
examples = soup.select(".sen_en")
for example in examples:
    print(example.text.strip())

# Quit the web driver instance
driver.quit()
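
The fixed `time.sleep(5)` calls can be replaced with Selenium's explicit waits, which return as soon as the element appears and raise `TimeoutException` if it never does. This is only a sketch under the same assumptions as the code above: the `b_searchbox` and `sen_en` class names are taken from this answer and may change. `fetch_examples` and `clean_sentences` are hypothetical helper names, and the 10-second timeout is arbitrary.

```python
def clean_sentences(raw_texts):
    """Normalize whitespace, then drop empty strings and duplicates (order kept)."""
    seen = set()
    cleaned = []
    for text in raw_texts:
        text = " ".join(text.split())
        if text and text not in seen:
            seen.add(text)
            cleaned.append(text)
    return cleaned


def fetch_examples(word, timeout=10):
    """Open the Bing dict start page, search for `word`, return example sentences."""
    # Imported here so clean_sentences stays usable without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.common.keys import Keys
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()
    try:
        driver.get("https://cn.bing.com/dict/")
        wait = WebDriverWait(driver, timeout)

        # Wait until the search box is ready instead of sleeping a fixed time.
        search_box = wait.until(
            EC.element_to_be_clickable((By.CLASS_NAME, "b_searchbox"))
        )
        search_box.send_keys(word, Keys.ENTER)

        # Wait until at least one example sentence is present on the result page.
        elements = wait.until(
            EC.presence_of_all_elements_located((By.CSS_SELECTOR, ".sen_en"))
        )
        return clean_sentences(element.text for element in elements)
    finally:
        # Always close the browser, even if a wait times out.
        driver.quit()


if __name__ == "__main__":
    for sentence in fetch_examples("avengers"):
        print(sentence)
```

Putting `driver.quit()` in a `finally` block closes the browser even when a wait times out, which the sleep-based version does not guarantee.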

