Kam*_*ish 2 urllib beautifulsoup http-status-code-403 python-3.x
import urllib.request
import urllib
from bs4 import BeautifulSoup
url = "https://www.brightscope.com/ratings"
page = urllib.request.urlopen(url)
soup = BeautifulSoup(page, "html.parser")
print(soup.title)
I am trying to access the website above, but the code keeps throwing a 403 Forbidden error.
Any ideas?
C:\Users\jerem\AppData\Local\Programs\Python\Python35-32\python.exe "C:/Users/jerem/PycharmProjects/webscraper/url scraper.py"
Traceback (most recent call last):
  File "C:/Users/jerem/PycharmProjects/webscraper/url scraper.py", line 7, in <module>
    page = urllib.request.urlopen(url)
  File "C:\Users\jerem\AppData\Local\Programs\Python\Python35-32\lib\urllib\request.py", line 163, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Users\jerem\AppData\Local\Programs\Python\Python35-32\lib\urllib\request.py", line 472, in open
    response = meth(req, response)
  File "C:\Users\jerem\AppData\Local\Programs\Python\Python35-32\lib\urllib\request.py", line 582, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Users\jerem\AppData\Local\Programs\Python\Python35-32\lib\urllib\request.py", line 510, in error
    return self._call_chain(*args)
  File "C:\Users\jerem\AppData\Local\Programs\Python\Python35-32\lib\urllib\request.py", line 444, in _call_chain
    result = func(*args)
  File "C:\Users\jerem\AppData\Local\Programs\Python\Python35-32\lib\urllib\request.py", line 590, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden
import requests
from bs4 import BeautifulSoup
url = "https://www.brightscope.com/ratings"
headers = {'User-Agent':'Mozilla/5.0'}
page = requests.get(url, headers=headers)  # pass the headers, otherwise the default User-Agent is sent
soup = BeautifulSoup(page.text, "html.parser")
print(soup.title)
Out:
<title>BrightScope Ratings</title>
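First, use requests instead of urllib.
Then, pass headers to the request. Without them the website will reject you, because the default User-Agent value identifies the client as a script or crawler, and the site does not like that.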
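If you would rather stay with urllib, as in the question, the same idea should carry over: wrap the URL in a urllib.request.Request that carries a browser-like User-Agent. The sketch below is only an illustration. The helper names default_user_agent and build_request are made up for this example, and whether the site still accepts 'Mozilla/5.0' is an assumption, not something that can be guaranteed.

```python
import urllib.request


def default_user_agent():
    """Return the User-Agent that urllib sends when none is set (hypothetical helper)."""
    # A fresh opener carries the default header, e.g. 'Python-urllib/3.5'.
    return dict(urllib.request.build_opener().addheaders)["User-agent"]


def build_request(url, user_agent="Mozilla/5.0"):
    """Build a Request with an explicit User-Agent header (hypothetical helper)."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


if __name__ == "__main__":
    from bs4 import BeautifulSoup

    print(default_user_agent())  # the value that many sites reject

    req = build_request("https://www.brightscope.com/ratings")
    # urlopen accepts a Request object in place of a plain URL string.
    with urllib.request.urlopen(req) as page:
        soup = BeautifulSoup(page, "html.parser")
    print(soup.title)
```

Note that urllib stores header names capitalized, so the header is read back with req.get_header("User-agent").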