Python 3.5 urllib.request 403 Forbidden error

Question


Kam*_*ish 2 urllib beautifulsoup http-status-code-403 python-3.x

import urllib.request
import urllib
from bs4 import BeautifulSoup


url = "https://www.brightscope.com/ratings"
page = urllib.request.urlopen(url)
soup = BeautifulSoup(page, "html.parser")

print(soup.title)


I'm trying to access the site above, but the code keeps raising a 403 Forbidden error.

Any ideas?

C:\Users\jerem\AppData\Local\Programs\Python\Python35-32\python.exe "C:/Users/jerem/PycharmProjects/webscraper/url scraper.py"
Traceback (most recent call last):
  File "C:/Users/jerem/PycharmProjects/webscraper/url scraper.py", line 7, in <module>
    page = urllib.request.urlopen(url)
  File "C:\Users\jerem\AppData\Local\Programs\Python\Python35-32\lib\urllib\request.py", line 163, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Users\jerem\AppData\Local\Programs\Python\Python35-32\lib\urllib\request.py", line 472, in open
    response = meth(req, response)
  File "C:\Users\jerem\AppData\Local\Programs\Python\Python35-32\lib\urllib\request.py", line 582, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Users\jerem\AppData\Local\Programs\Python\Python35-32\lib\urllib\request.py", line 510, in error
    return self._call_chain(*args)
  File "C:\Users\jerem\AppData\Local\Programs\Python\Python35-32\lib\urllib\request.py", line 444, in _call_chain
    result = func(*args)
  File "C:\Users\jerem\AppData\Local\Programs\Python\Python35-32\lib\urllib\request.py", line 590, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden
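The traceback means the server answered with status 403 and urllib turned that into urllib.error.HTTPError. As a hedged, self-contained illustration (it does not contact brightscope.com, whose real filtering rules are not known here), the sketch below starts a hypothetical local server, RejectDefaultAgentHandler, that refuses any User-Agent beginning with Python-urllib. It reads urllib's default User-Agent, catches HTTPError to inspect its code and reason, then repeats the request with a browser-like header. The names OPENER and fetch_status are illustrative only.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class RejectDefaultAgentHandler(BaseHTTPRequestHandler):
    """Hypothetical local stand-in: answers 403 to urllib's default User-Agent."""

    def do_GET(self):
        agent = self.headers.get("User-Agent", "")
        status = 403 if agent.startswith("Python-urllib") else 200
        body = b"Forbidden" if status == 403 else b"<title>ok</title>"
        self.send_response(status)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence the default per-request logging


# Empty ProxyHandler so this local demo ignores any http_proxy environment variable.
OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def fetch_status(url, headers=None):
    """Return the HTTP status for url; HTTPError is caught instead of crashing."""
    request = urllib.request.Request(url, headers=headers or {})
    try:
        with OPENER.open(request, timeout=5) as response:
            return response.getcode()
    except urllib.error.HTTPError as err:
        # err.code and err.reason carry the server's answer, e.g. 403 Forbidden
        print("HTTPError:", err.code, err.reason)
        return err.code


server = HTTPServer(("127.0.0.1", 0), RejectDefaultAgentHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = "http://127.0.0.1:%d/" % server.server_port

# The User-Agent urllib adds when none is given, e.g. "Python-urllib/3.5"
default_agent = dict(OPENER.addheaders)["User-agent"]
blocked = fetch_status(url)                                 # default agent
allowed = fetch_status(url, {"User-Agent": "Mozilla/5.0"})  # browser-like agent
print(default_agent, blocked, allowed)

server.shutdown()
server.server_close()
```

On a real site the same except clause lets the script report the status instead of crashing; whether a given User-Agent is accepted is decided entirely by the server.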

Answer 1

宏杰李*_*宏杰李 5

import requests
from bs4 import BeautifulSoup


url = "https://www.brightscope.com/ratings"
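# Send a browser-like User-Agent instead of the library's default client identifier
headers = {'User-Agent': 'Mozilla/5.0'}
page = requests.get(url, headers=headers)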
soup = BeautifulSoup(page.text, "html.parser")

print(soup.title)


Output:

<title>BrightScope Ratings</title>


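First, use requests instead of urllib.

Then pass headers to requests.get. Without a browser-like User-Agent the site will refuse you: by default the client library identifies itself as a script (urllib sends Python-urllib/3.5 on your Python, requests sends python-requests/<version>), which the site treats as a crawler and blocks.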
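If you would rather stay with urllib from the question, the same header can be sent by wrapping the URL in a urllib.request.Request. This is a minimal sketch, assuming that Mozilla/5.0 is enough for the site (the answer's claim, not re-verified here); build_request and fetch_html are hypothetical helper names.

```python
import urllib.request


def build_request(url, user_agent="Mozilla/5.0"):
    """Hypothetical helper: a Request that overrides urllib's default User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_html(url, user_agent="Mozilla/5.0", timeout=10):
    """Hypothetical helper: download url with the custom User-Agent, return text."""
    with urllib.request.urlopen(build_request(url, user_agent), timeout=timeout) as page:
        # Use the charset from Content-Type if the server sent one, else assume UTF-8
        charset = page.headers.get_content_charset() or "utf-8"
        return page.read().decode(charset, errors="replace")


req = build_request("https://www.brightscope.com/ratings")
# urllib normalizes header names with str.capitalize(), so the key is "User-agent"
print(req.get_header("User-agent"))  # Mozilla/5.0
```

The result of fetch_html(url) can be passed to BeautifulSoup(..., "html.parser") in place of the bare urlopen(url) call from the question.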

Archived:	8 years, 11 months ago
Views:	7062
Last activity:	6 years, 9 months ago