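Previously I used the httplib module to add a header to a request. Now I am trying to do the same thing with the requests module.
This is the Python requests module I am using: http://pypi.python.org/pypi/requests
How do I add a header to requests.post and requests.get? Say I have to add a foobar key to the headers of every request.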
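For reference, a minimal sketch of the two usual approaches: passing a `headers` dict on each call, or setting it once on a `requests.Session` so every request made through that session carries it. The header value `"raboof"` and the helper names `fetch` and `make_session` are made up for illustration; only the `foobar` key comes from the question.

```python
# Hypothetical header: the "foobar" key is from the question, the value is invented.
DEFAULT_HEADERS = {"foobar": "raboof"}


def fetch(url, payload=None):
    """Send a GET (or a POST when payload is given) with the custom header."""
    import requests  # third-party; imported here so the module loads without it

    if payload is None:
        return requests.get(url, headers=DEFAULT_HEADERS)
    return requests.post(url, data=payload, headers=DEFAULT_HEADERS)


def make_session():
    """Return a Session that adds the custom header to every request it sends."""
    import requests

    session = requests.Session()
    session.headers.update(DEFAULT_HEADERS)
    return session
```

With the session variant, `make_session().get(url)` and `.post(url, data=...)` send the header without repeating it on each call.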
I am trying to get the HTML source of a web page with the following code:
import requests
url = "https://dictionary.cambridge.org/us/dictionary/english-arabic/hi"
r = requests.get(url)
However, I get the following error:
Traceback (most recent call last):
  File "/home/username/ak_env/lib/python3.8/site-packages/urllib3/connectionpool.py", line 699, in urlopen
    httplib_response = self._make_request(
  File "/home/username/ak_env/lib/python3.8/site-packages/urllib3/connectionpool.py", line 445, in _make_request
    six.raise_from(e, None)
  File "<string>", line 3, in raise_from
  File "/home/username/ak_env/lib/python3.8/site-packages/urllib3/connectionpool.py", line 440, in _make_request
    httplib_response = conn.getresponse()
  File "/usr/lib/python3.8/http/client.py", line 1347, in getresponse
    response.begin()
  File "/usr/lib/python3.8/http/client.py", line 307, in begin
    version, status, reason = self._read_status()
  File "/usr/lib/python3.8/http/client.py", line 276, in _read_status
    raise RemoteDisconnected("Remote end closed connection without"
http.client.RemoteDisconnected: …
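One possible cause, and this is only an assumption about this particular site, is that the server drops connections from clients that do not look like a browser. A minimal sketch that sends a browser-like `User-Agent` header; the header string and the helper name `fetch_html` are illustrative, not taken from the question.

```python
# Illustrative browser-like User-Agent; any current browser string could be used.
BROWSER_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
    )
}


def fetch_html(url, timeout=10):
    """Return the page body, sending a browser-like User-Agent header."""
    import requests  # third-party; imported here so the module loads without it

    response = requests.get(url, headers=BROWSER_HEADERS, timeout=timeout)
    response.raise_for_status()  # raise on 4xx/5xx instead of returning an error page
    return response.text
```

If the error persists with this header, the cause is likely something else (for example rate limiting or a proxy), so treat this as a first thing to try rather than a guaranteed fix.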
Is there a more elegant way to add a header to a request? The following:
import requests
requests.get(url,headers={'Authorization', 'GoogleLogin auth=%s' % authorization_token})
does not work, whereas urllib2 does:
import urllib2
request = urllib2.Request('http://maps.google.com/maps/feeds/maps/default/full')
request.add_header('Authorization', 'GoogleLogin auth=%s' % authorization_token)
urllib2.urlopen(request).read()
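A likely reason the requests call fails: `{'Authorization', '...'}` uses a comma, so Python builds a set, while the `headers` argument expects a mapping of header names to values. A minimal sketch of the dict form; the helper name `auth_headers` is made up for illustration.

```python
def auth_headers(authorization_token):
    """Build the Authorization header as a dict (name: value), not a set."""
    return {"Authorization": "GoogleLogin auth=%s" % authorization_token}


# A comma between the two strings creates a set, which carries no name/value pairing.
wrong = {"Authorization", "GoogleLogin auth=abc"}
right = auth_headers("abc")

# Intended usage (needs the third-party requests package and a real url):
#   requests.get(url, headers=auth_headers(authorization_token))
```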
I am trying to extract links from Google search results. Inspect Element tells me that the part I am interested in has class="r". The first result looks like this:
<h3 class="r" original_target="https://en.wikipedia.org/wiki/chocolate" style="display: inline-block;">
<a href="https://en.wikipedia.org/wiki/Chocolate"
ping="/url?sa=t&source=web&rct=j&url=https://en.wikipedia.org/wiki/Chocolate&ved=0ahUKEwjW6tTC8LXZAhXDjpQKHSXSClIQFgheMAM"
saprocessedanchor="true">
Chocolate - Wikipedia
</a>
</h3>
To extract the "href" I do:
import bs4, requests
res = requests.get('https://www.google.com/search?q=chocolate')
googleSoup = bs4.BeautifulSoup(res.text, "html.parser")
elements = googleSoup.select(".r a")
elements[0].get("href")
But I unexpectedly get:
'/url?q=https://en.wikipedia.org/wiki/Chocolate&sa=U&ved=0ahUKEwjHjrmc_7XZAhUME5QKHSOCAW8QFggWMAA&usg=AOvVaw03f1l4EU9fYd'
whereas what I want is:
"https://en.wikipedia.org/wiki/Chocolate"
The "ping" attribute seems to be confusing things. Any ideas?
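The returned `href` looks like a Google redirect link whose real target sits in the `q` query parameter. The HTML served to a script can differ from what the browser inspector shows, so the "ping" attribute is probably not involved. A minimal sketch that unwraps such a link with the standard library; the helper name `unwrap_google_href` is hypothetical.

```python
from urllib.parse import parse_qs, urlparse


def unwrap_google_href(href):
    """Return the target of a '/url?q=...' redirect link, or href unchanged."""
    parsed = urlparse(href)
    if parsed.path == "/url":
        # parse_qs maps each parameter name to a list of its values.
        targets = parse_qs(parsed.query).get("q")
        if targets:
            return targets[0]
    return href


# The value printed in the question:
raw = ("/url?q=https://en.wikipedia.org/wiki/Chocolate&sa=U"
       "&ved=0ahUKEwjHjrmc_7XZAhUME5QKHSOCAW8QFggWMAA&usg=AOvVaw03f1l4EU9fYd")
link = unwrap_google_href(raw)
```

Applied to the question's code, this would be `unwrap_google_href(elements[0].get("href"))`. Links that are not wrapped are returned as-is.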