trb*_*bck 128 python asynchronous httprequest python-requests
I tried the example provided in the documentation of Python's requests library:
http://docs.python-requests.org/en/latest/user/advanced/#asynchronous-requests
With async.map(rs) I get the response codes, but I want to get the content of each page requested.
out = async.map(rs)
print out[0].content
for example, simply does not work.
Jef*_*eff 143
The answer below does not apply to requests v0.13.0+. The asynchronous functionality was moved to grequests after this question was written. However, you could just replace requests with grequests below and it should work.

I've left this answer as-is to reflect the original question, which was about using requests < v0.13.0.

To do multiple tasks with async.map asynchronously you have to:

1. Define a function for what you want to do with each object (your task)
2. Add that function as an event hook in your request
3. Call async.map on a list of all the requests / actions

Example:
from requests import async
# If using requests > v0.13.0, use
# from grequests import async

urls = [
    'http://python-requests.org',
    'http://httpbin.org',
    'http://python-guide.org',
    'http://kennethreitz.com'
]

# A simple task to do to each response object
def do_something(response):
    print response.url

# A list to hold our things to do via async
async_list = []

for u in urls:
    # The "hooks = {..." part is where you define what you want to do
    #
    # Note the lack of parentheses following do_something, this is
    # because the response will be used as the first argument automatically
    action_item = async.get(u, hooks = {'response' : do_something})

    # Add the task to our list of things to do via async
    async_list.append(action_item)

# Do our list of things to do via async
async.map(async_list)
out*_*ile 75
async is now an independent module: grequests.
See: https://github.com/kennethreitz/grequests
$ pip install grequests
Build a stack:
import grequests

urls = [
    'http://www.heroku.com',
    'http://tablib.org',
    'http://httpbin.org',
    'http://python-requests.org',
    'http://kennethreitz.com'
]

rs = (grequests.get(u) for u in urls)
Send the stack:
grequests.map(rs)
The result looks like:
[<Response [200]>, <Response [200]>, <Response [200]>, <Response [200]>, <Response [200]>]
grequests doesn't seem to set a limit for concurrent requests, i.e. when multiple requests are sent to the same server.
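If you do need to bound concurrency, one option (a minimal sketch, assuming the size keyword of grequests.map, which caps the underlying gevent pool; the URL and the value of 5 are only illustrative):

import grequests

urls = ['http://httpbin.org/delay/1'] * 10

# build the request stack lazily
rs = (grequests.get(u) for u in urls)

# size=5 keeps at most 5 requests in flight at any moment
responses = grequests.map(rs, size=5)
print(responses)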
Ant*_*lin 44
I tested both requests-futures and grequests. grequests is faster, but brings monkey patching and additional problems with dependencies. requests-futures is several times slower than grequests. I decided to write my own simple wrapper of requests around ThreadPoolExecutor; it is almost as fast as grequests, but without external dependencies.
import requests
import concurrent.futures

def get_urls():
    return ["url1", "url2"]

def load_url(url, timeout):
    return requests.get(url, timeout=timeout)

# counters for failed and successful responses
resp_err = 0
resp_ok = 0

with concurrent.futures.ThreadPoolExecutor(max_workers=20) as executor:
    future_to_url = {executor.submit(load_url, url, 10): url for url in get_urls()}
    for future in concurrent.futures.as_completed(future_to_url):
        url = future_to_url[future]
        try:
            data = future.result()
        except Exception as exc:
            resp_err = resp_err + 1
        else:
            resp_ok = resp_ok + 1
Dre*_*puf 28
Maybe requests-futures is another choice.
from requests_futures.sessions import FuturesSession
session = FuturesSession()
# first request is started in background
future_one = session.get('http://httpbin.org/get')
# second request is started immediately
future_two = session.get('http://httpbin.org/get?foo=bar')
# wait for the first request to complete, if it hasn't already
response_one = future_one.result()
print('response one status: {0}'.format(response_one.status_code))
print(response_one.content)
# wait for the second request to complete, if it hasn't already
response_two = future_two.result()
print('response two status: {0}'.format(response_two.status_code))
print(response_two.content)
It is also recommended in the official documentation. If you don't want to get involved with gevent, it's a good option.
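For the many-URLs case from the original question, the same FuturesSession can be fed a whole list and drained as the responses arrive; a minimal sketch, with max_workers and the URLs chosen only for illustration:

from concurrent.futures import as_completed
from requests_futures.sessions import FuturesSession

urls = ['http://httpbin.org/get', 'http://python-requests.org']

session = FuturesSession(max_workers=10)
# each get() returns a concurrent.futures.Future immediately
futures = [session.get(u) for u in urls]

for future in as_completed(futures):
    response = future.result()
    print(response.url, response.status_code)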
Har*_*oli 14
I have a lot of issues with most of the answers posted here - they either use deprecated libraries that have been ported over with limited features, or provide a solution with too much magic on the execution of the request, making error handling difficult. If they do not fall into one of the above categories, they are third-party libraries or deprecated.

Some of the solutions work fine purely for HTTP requests, but fall short for any other kind of request, which is ludicrous. A highly customized solution is not necessary here.

Simply using the Python built-in library asyncio is sufficient to perform asynchronous requests of any type, and it provides enough fluidity for complex and use-case-specific error handling.
import asyncio
import requests

loop = asyncio.get_event_loop()

def do_thing(params):

    async def get_rpc_info_and_do_chores(id):
        # do things
        response = perform_grpc_call(id)
        do_chores(response)

    async def get_httpapi_info_and_do_chores(id):
        # do things
        response = requests.get(URL)
        do_chores(response)

    async_tasks = []
    for element in list(params.list_of_things):
        # element is assumed to carry the ids the two coroutines need
        async_tasks.append(loop.create_task(get_rpc_info_and_do_chores(element.id)))
        async_tasks.append(loop.create_task(get_httpapi_info_and_do_chores(element.ch_id)))

    loop.run_until_complete(asyncio.gather(*async_tasks))
The way it works is simple: you create a series of tasks you wish to have executed asynchronously, then ask the loop to run those tasks and exit upon completion. No extra libraries subject to lack of maintenance, no lack of functionality.
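Note that requests.get and perform_grpc_call above are blocking calls, so tasks built this way will still execute one after another on the loop. A minimal sketch of one way to keep the built-in asyncio approach while letting the blocking calls overlap is to push them onto the default thread pool with loop.run_in_executor (the URLs here are only illustrative):

import asyncio
import requests

async def fetch_all(urls):
    loop = asyncio.get_event_loop()
    # run_in_executor(None, ...) hands the blocking requests.get calls
    # to the default thread pool, so they run concurrently
    futures = [loop.run_in_executor(None, requests.get, url) for url in urls]
    return await asyncio.gather(*futures)

responses = asyncio.get_event_loop().run_until_complete(
    fetch_all(['http://httpbin.org/get', 'http://python-requests.org']))
print([r.status_code for r in responses])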
Dra*_*obZ 13
Unfortunately, as far as I know, the requests library is not equipped for performing asynchronous requests. You can wrap async/await syntax around requests, but that will make the underlying requests no less synchronous. If you want true asynchronous requests, you must use other tooling that provides it. One such solution is aiohttp (Python 3.5.3+). It works well in my experience using it with the Python 3.7 async/await syntax. Below I write three implementations of performing n web requests:

1. purely synchronous requests (sync_requests_get_all) using the Python requests library
2. synchronous requests (async_requests_get_all) using the Python requests library wrapped in Python 3.7 async/await syntax and asyncio
3. a truly asynchronous implementation (async_aiohttp_get_all) with the Python aiohttp library wrapped in Python 3.7 async/await syntax and asyncio
import time
import asyncio
import requests
import aiohttp

from types import SimpleNamespace

durations = []


def timed(func):
    """
    records approximate durations of function calls
    """
    def wrapper(*args, **kwargs):
        start = time.time()
        print(f'{func.__name__:<30} started')
        result = func(*args, **kwargs)
        duration = f'{func.__name__:<30} finished in {time.time() - start:.2f} seconds'
        print(duration)
        durations.append(duration)
        return result
    return wrapper


async def fetch(url, session):
    """
    asynchronous get request
    """
    async with session.get(url) as response:
        response_json = await response.json()
        return SimpleNamespace(**response_json)


async def fetch_many(loop, urls):
    """
    many asynchronous get requests, gathered
    """
    async with aiohttp.ClientSession() as session:
        tasks = [loop.create_task(fetch(url, session)) for url in urls]
        return await asyncio.gather(*tasks)


@timed
def sync_requests_get_all(urls):
    """
    performs synchronous get requests
    """
    # use session to reduce network overhead
    session = requests.Session()
    return [SimpleNamespace(**session.get(url).json()) for url in urls]


@timed
def async_requests_get_all(urls):
    """
    asynchronous wrapper around synchronous requests
    """
    loop = asyncio.get_event_loop()
    # use session to reduce network overhead
    session = requests.Session()

    async def async_get(url):
        return session.get(url)

    async_tasks = [loop.create_task(async_get(url)) for url in urls]
    return loop.run_until_complete(asyncio.gather(*async_tasks))


@timed
def async_aiohttp_get_all(urls):
    """
    performs asynchronous get requests
    """
    loop = asyncio.get_event_loop()
    return loop.run_until_complete(fetch_many(loop, urls))


if __name__ == '__main__':
    # this endpoint takes ~3 seconds to respond,
    # so a purely synchronous implementation should take
    # little more than 30 seconds and a purely asynchronous
    # implementation should take little more than 3 seconds.
    urls = ['https://postman-echo.com/delay/3'] * 10

    sync_requests_get_all(urls)
    async_requests_get_all(urls)
    async_aiohttp_get_all(urls)
    print('----------------------')
    [print(duration) for duration in durations]
On my machine, this is the output:
sync_requests_get_all started
sync_requests_get_all finished in 30.92 seconds
async_requests_get_all started
async_requests_get_all finished in 30.87 seconds
async_aiohttp_get_all started
async_aiohttp_get_all finished in 3.22 seconds
----------------------
sync_requests_get_all finished in 30.92 seconds
async_requests_get_all finished in 30.87 seconds
async_aiohttp_get_all finished in 3.22 seconds
Uri*_*Uri 10
You can use httpx for it.
import asyncio
import httpx

async def get_async(url):
    async with httpx.AsyncClient() as client:
        return await client.get(url)

urls = ["http://google.com", "http://wikipedia.org"]

# Note that you need an async context to use `await`.
await asyncio.gather(*map(get_async, urls))
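If you are calling this from plain synchronous code rather than from an async context, a minimal sketch of one way to drive it (reusing the get_async and urls defined above) is:

import asyncio

async def main():
    # gather runs all the get_async coroutines concurrently
    return await asyncio.gather(*map(get_async, urls))

responses = asyncio.run(main())  # Python 3.7+
print([r.status_code for r in responses])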
If you want functional syntax, the gamla library wraps this into get_async.
Then you can do:
await gamla.map(gamla.get_async(10))(["http://google.com", "http://wikipedia.org"])
The 10 is the timeout in seconds.
(Disclaimer: I am its author.)
I know this has been closed for a while, but I thought it might be useful to promote another async solution built on the requests library.
list_of_requests = ['http://moop.com', 'http://doop.com', ...]
from simple_requests import Requests
for response in Requests().swarm(list_of_requests):
    print response.content
The documentation is here: http://pythonhosted.org/simple-requests/
If you want to use asyncio, then requests-async provides async/await functionality for requests - https://github.com/encode/requests-async
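A minimal sketch of the drop-in style it offers; treat the exact call shapes as an assumption based on the project's README:

import asyncio
import requests_async as requests

async def main():
    # same call shape as requests, but awaitable
    response = await requests.get('http://httpbin.org/get')
    print(response.status_code)
    print(response.text)

asyncio.run(main())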
小智 5
from threading import Thread
import urllib2

# 'requests' here is a list of request URIs, not the requests library
threads = list()

for requestURI in requests:
    t = Thread(target=self.openURL, args=(requestURI,))
    t.start()
    threads.append(t)

for thread in threads:
    thread.join()

...

def openURL(self, requestURI):
    o = urllib2.urlopen(requestURI, timeout=600)
    o...