Post by MCU*_*vil

Scraping a web page with requests does not return all the data

I am using the Python requests package to scrape a web page. Here is the code:

import requests
from bs4 import BeautifulSoup

# Configure Settings
url = "https://mangaabyss.com/read/"
comic = "the-god-of-pro-wrestling"

# Run Scraper
page = requests.get(url + comic + "/")

soup = BeautifulSoup(page.content, 'html.parser')

print(soup.prettify())

The URL it requests is "https://mangaabyss.com/read/the-god-of-pro-wrestling/", but in the soup output I only get the first div, without any of the child elements inside it. This is the output I get:

<!DOCTYPE html>
<html lang="en">
 <head>
  <meta charset="utf-8"/>
  <link href="/favicon.ico" rel="icon"/>
  <meta content="width=device-width,initial-scale=1,minimum-scale=1,maximum-scale=1,viewport-fit=cover" name="viewport"/>
  <meta content="#250339" name="theme-color"/>
  <title>
   MANGA ABYSS
  </title>
  <script crossorigin="" src="/assets/index.f4dc01fb.js" type="module">
  </script>
  <link href="/assets/index.9b4eb8b4.css" rel="stylesheet"/>
 </head>
 <body>
  <div id="manga-mobile-app">
  </div>
 </body>
</html>
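
A quick way to check what the static response actually contains, as a minimal sketch: the snippet below parses a trimmed copy of the output above (inlined as a string so it runs without a network call) and reports whether the `manga-mobile-app` div has any child elements. The names `static_html`, `mount`, `child_tags`, and `module_scripts` are illustrative, not part of the original code. An empty mount div next to a `type="module"` script would suggest that the content is filled in by JavaScript in the browser rather than sent in the HTML that requests receives.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the HTML that requests.get() returned (see the output above).
static_html = """<!DOCTYPE html>
<html lang="en">
 <head>
  <title>MANGA ABYSS</title>
  <script crossorigin="" src="/assets/index.f4dc01fb.js" type="module"></script>
 </head>
 <body>
  <div id="manga-mobile-app">
  </div>
 </body>
</html>"""

soup = BeautifulSoup(static_html, "html.parser")
mount = soup.find("div", id="manga-mobile-app")

# Direct child elements only; whitespace text nodes are not counted.
child_tags = mount.find_all(True, recursive=False)

# Scripts loaded as ES modules, which may build the page on the client side.
module_scripts = [
    s["src"] for s in soup.find_all("script", type="module") if s.get("src")
]

print("child elements in mount div:", len(child_tags))
print("module scripts:", module_scripts)
print("selector match:", soup.select_one("#manga-mobile-app > div"))
```

If the count is zero and the selector finds nothing, no change to the BeautifulSoup query will help, because the elements are simply not present in the downloaded HTML.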

The content I want to scrape is nested deep inside that div; I want to extract the number of chapters. This is its selector:

#manga-mobile-app > div > …

html python beautifulsoup web-scraping python-requests

3 votes, 1 answer, 606 views