I am trying to fetch Glassdoor data through its API from Python:
import urllib2
id1 = 'x'
key = 'y'
action = 'employers'
company = 'company'
basepath = 'http://api.glassdoor.com/api/api.htm?v=1&format=json&t.p='
url = basepath + id1 + '&t.k=' + key + '&action=' + action + '&q=' + company + '&userip=192.168.43.42&useragent=Mozilla/5.0'
response = urllib2.urlopen(url)
html = response.read()
I get the following error:
>>> response = urllib2.urlopen(url)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "//anaconda/lib/python2.7/urllib2.py", line 154, in urlopen
    return opener.open(url, data, timeout)
  File "//anaconda/lib/python2.7/urllib2.py", line 437, in open
    response = meth(req, response)
  File "//anaconda/lib/python2.7/urllib2.py", line 550, in http_response
    'http', request, response, code, msg, hdrs)
  File "//anaconda/lib/python2.7/urllib2.py", line 475, in error
    return self._call_chain(*args)
  File "//anaconda/lib/python2.7/urllib2.py", line 409, in _call_chain
    result = func(*args)
  File "//anaconda/lib/python2.7/urllib2.py", line 558, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 403: Forbidden
Can anyone help?
Thanks.
Below is working code, slightly improved by adding the BeautifulSoup module and setting the User-Agent in the variable hdr.
import urllib2
from BeautifulSoup import BeautifulSoup

# Fill in your own partner ID (t.p), key (t.k), and search term (q).
url = "http://api.glassdoor.com/api/api.htm?t.p=yourID&t.k=yourkey&userip=8.28.178.133&useragent=Mozilla&format=json&v=1&action=employers&q="
# Send a browser-like User-Agent header along with the request.
hdr = {'User-Agent': 'Mozilla/5.0'}
req = urllib2.Request(url, headers=hdr)
response = urllib2.urlopen(req)
soup = BeautifulSoup(response)
Hope that helps, thanks.
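For anyone on Python 3, where urllib2 has been split into urllib.request and urllib.error, the same idea might look roughly like the sketch below. It assumes the 403 comes from the server rejecting urllib's default User-Agent, which is what the fix above suggests but which I have not confirmed. The function names build_request and fetch_employers are made up for illustration, and the parameters are the ones from the question. Because the request asks for format=json, the sketch decodes the body with json rather than BeautifulSoup.

```python
import json
from urllib.error import HTTPError
from urllib.parse import urlencode
from urllib.request import Request, urlopen

BASE_URL = "http://api.glassdoor.com/api/api.htm"  # endpoint from the question


def build_request(partner_id, key, query, user_agent="Mozilla/5.0"):
    """Hypothetical helper: build a Request with an explicit User-Agent header."""
    params = {
        "v": "1",
        "format": "json",
        "t.p": partner_id,
        "t.k": key,
        "action": "employers",
        "q": query,
        "userip": "192.168.43.42",  # placeholder IP copied from the question
        "useragent": user_agent,
    }
    # urlencode escapes spaces, slashes, etc. that plain string
    # concatenation would leave raw in the query string.
    url = BASE_URL + "?" + urlencode(params)
    return Request(url, headers={"User-Agent": user_agent})


def fetch_employers(partner_id, key, query):
    """Hypothetical helper: return the decoded JSON, or None on an HTTP error."""
    try:
        with urlopen(build_request(partner_id, key, query), timeout=10) as resp:
            return json.loads(resp.read().decode("utf-8"))
    except HTTPError as err:
        # A 403 ends up here; the status code and reason help with diagnosis.
        print("HTTP", err.code, err.reason)
        return None
```

Calling fetch_employers('yourID', 'yourkey', 'company') would then return the parsed response, or print the status and return None if the server still refuses the request.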