Python requests module does not return the full page for a GET request

Question

And*_*ton 7 python web-scraping python-requests

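When I make a GET request to this URL in a browser: http://www.waterwaysguide.org.au/waterwaysguide/access-point/4980/partial a complete HTML page is returned. However, when I make the GET request with the Python requests module, only part of the HTML is returned, and the core content is missing.

How can I change my code so that I get the missing data?

This is the code I am using: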

import requests
def get_data(point_num):
    base_url = 'http://www.waterwaysguide.org.au/waterwaysguide/access-point/{}/partial'
    r = requests.get(base_url)
    html_content = r.text
    print(html_content)
get_data(4980)


Running the code returns only part of the HTML: the content inside div class="view view-waterway-access-point-page... is missing.

Answer 1

Ash*_*man 5

The following approach returns the div class="view view-waterway-access-point-page... element:

>>> from urllib.request import Request, urlopen
>>> from bs4 import BeautifulSoup
>>> url = 'http://www.waterwaysguide.org.au/waterwaysguide/access-point/4980/partial'
>>> req = Request(url,headers={'User-Agent': 'Mozilla/5.0'})
>>> webpage = urlopen(req).read()
>>> print(webpage)

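The same idea can be sketched with the requests module that the question uses. Two things differ from the question's code: the {} placeholder in the URL template is actually filled in with point_num (the original passes base_url to requests.get unformatted), and a browser-like User-Agent header is sent, as in the urllib example above. Whether the header alone is what makes the server return the full page is an assumption based on this answer. The names BASE_URL, HEADERS and build_url, the timeout value, and the raise_for_status call are illustrative additions, not part of the original code.

```python
import requests

BASE_URL = 'http://www.waterwaysguide.org.au/waterwaysguide/access-point/{}/partial'
# Same User-Agent string as in the urllib example above.
HEADERS = {'User-Agent': 'Mozilla/5.0'}


def build_url(point_num):
    """Substitute the access point number into the URL template."""
    return BASE_URL.format(point_num)


def get_data(point_num, timeout=10):
    """Fetch the page for one access point and return its HTML as text."""
    r = requests.get(build_url(point_num), headers=HEADERS, timeout=timeout)
    r.raise_for_status()  # raise on 4xx/5xx instead of returning an error page
    return r.text
```

Calling print(get_data(4980)) would then request the same URL as in the question, this time with the access point number substituted and the header attached.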

Answer 2

Anu*_*tam 0

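It may be that the elements are rendered with JavaScript after the page loads. In that case you only get the initial page, not the parts rendered by JavaScript.
You may want to look at: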

https://medium.com/@hoppy/how-to-test-or-scrape-javascript-rendered-websites-with-python-selenium-a-beginner-step-by-c137892216aa

Web-scraping JavaScript page with Python
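One way to tell the two explanations apart is to check whether the target div is already present in the raw HTML the server returns. If it shows up once a User-Agent header is sent, no JavaScript rendering is involved; if it never appears in the raw response, a browser-driven tool such as Selenium is the likelier route. The sketch below uses only the standard library. Because the class name is truncated in the question, it matches on a class prefix instead of guessing the full name; the helper names ClassPrefixFinder and has_class_prefix are hypothetical.

```python
from html.parser import HTMLParser


class ClassPrefixFinder(HTMLParser):
    """Record whether any tag has a class token starting with a given prefix."""

    def __init__(self, prefix):
        super().__init__()
        self.prefix = prefix
        self.found = False

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            # A class attribute may hold several space-separated tokens.
            if name == 'class' and value:
                if any(tok.startswith(self.prefix) for tok in value.split()):
                    self.found = True


def has_class_prefix(html, prefix):
    """Return True if the HTML text contains a tag whose class matches the prefix."""
    finder = ClassPrefixFinder(prefix)
    finder.feed(html)
    return finder.found
```

Passing the response text and the prefix 'view-waterway-access-point-page' to has_class_prefix gives a quick yes/no answer before reaching for a heavier tool.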

Archived:	8 years ago
Views:	16309
Last activity:	4 years, 11 months ago