I'm trying to scrape the Bing Dictionary page with BeautifulSoup, but response.content doesn't contain the actual data. What should I do?

JJJ*_*ohn 5 python beautifulsoup web-scraping

I'm trying to scrape the Bing Dictionary page https://cn.bing.com/dict/search?q=avengers

Here is the code:

import requests
from bs4 import BeautifulSoup
    
url = "https://cn.bing.com/dict/search?q=avengers"

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3"
}

response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.content, "html.parser")

examples = soup.find_all("div", class_="sen_en b_regtxt")

for example in examples:
    print(example.text.strip())

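Specifically, I'm trying to scrape all the example sentences on that page, which are contained in div elements with the class sen_en b_regtxt.

However, response.content doesn't contain a single example sentence. What am I missing?

P.S. The page can be accessed without logging in.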
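
One way to tell a selector problem apart from a response that simply lacks the data is to run the same extraction on a known HTML string first. The following is a minimal sketch; SAMPLE_HTML is hypothetical markup written for illustration (not copied from the Bing page), and extract_examples is an illustrative helper name. It uses a CSS selector, which matches the two classes regardless of their order in the class attribute.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only; the real page may differ.
SAMPLE_HTML = """
<div class="sen_en b_regtxt">The Avengers assemble.</div>
<div class="b_regtxt sen_en">Class order is reversed here.</div>
<div class="sen_cn b_regtxt">Not an English sentence.</div>
"""


def extract_examples(html):
    """Return the text of every div that carries both sen_en and b_regtxt."""
    soup = BeautifulSoup(html, "html.parser")
    # "div.sen_en.b_regtxt" requires both classes but ignores their order.
    return [div.get_text(strip=True) for div in soup.select("div.sen_en.b_regtxt")]


if __name__ == "__main__":
    for sentence in extract_examples(SAMPLE_HTML):
        print(sentence)
```

If the helper finds sentences in the sample but returns an empty list for response.text, the selector is probably fine and the sentences are not present in the HTML that requests received.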

With help from @Artur Chukhrai, I also tried Selenium, but got "No results found for avengers".

However, if I first visit the URL "cn.bing.com/dict" and then type the keyword into the search box, I do get the results page.

小智 3

A small modification to Artur Chukhrai's answer does the trick: load https://cn.bing.com/dict, then type the text into the search box:

from selenium import webdriver
from selenium.webdriver.common.by import By
from bs4 import BeautifulSoup
import time

url = "https://cn.bing.com/dict/"

# Start a new Selenium web driver instance
driver = webdriver.Chrome()
driver.get(url)

# Wait for the page to load
time.sleep(5)

# Write text in search box
search_box = driver.find_element(By.CLASS_NAME, value="b_searchbox")
search_box.send_keys("avengers\n")

# Wait for the page to load
time.sleep(5)

# Get the page source after it has fully loaded
html = driver.page_source
soup = BeautifulSoup(html, "html.parser")

# Find and print the examples of the word
examples = soup.select(".sen_en")
for example in examples:
    print(example.text.strip())

# Quit the web driver instance
driver.quit()
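
The fixed time.sleep(5) calls above can be swapped for explicit waits, which return as soon as the element appears and raise a TimeoutException otherwise. This is only a sketch of that variation, not part of the original answer: fetch_example_sentences, clean_sentences, and the 10-second timeout are illustrative choices, and the b_searchbox and .sen_en selectors are taken from the code above on the assumption that they still match the page.

```python
def clean_sentences(texts):
    """Strip surrounding whitespace and drop empty strings."""
    return [t.strip() for t in texts if t.strip()]


def fetch_example_sentences(word, timeout=10):
    """Search for `word` on the Bing Dictionary home page and return the example sentences."""
    # Imported here so that clean_sentences stays usable without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.common.keys import Keys
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()
    try:
        driver.get("https://cn.bing.com/dict/")
        wait = WebDriverWait(driver, timeout)

        # Wait until the search box exists instead of sleeping a fixed time.
        box = wait.until(
            EC.presence_of_element_located((By.CLASS_NAME, "b_searchbox"))
        )
        box.send_keys(word, Keys.ENTER)

        # Wait until at least one example sentence is present on the results page.
        elements = wait.until(
            EC.presence_of_all_elements_located((By.CSS_SELECTOR, ".sen_en"))
        )
        return clean_sentences(el.text for el in elements)
    finally:
        # Always close the browser, even if a wait times out.
        driver.quit()
```

Calling fetch_example_sentences("avengers") would then return a list of strings. The try/finally block makes sure the browser is closed even when the page never shows any results.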
