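Tag: urllib3

How to perform a time-limited response download with python requests?

When downloading a large file with python, I want to put a time limit not only on the connection process but also on the download itself.

I am trying with the following python code: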

import requests

r = requests.get('http://ipv4.download.thinkbroadband.com/1GB.zip', timeout = 0.5, prefetch = False)
print r.headers['content-length']
print len(r.raw.read())


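This does not work (the download has no time limit), as the documentation correctly notes: https://requests.readthedocs.org/en/latest/user/quickstart/#timeouts

It would be great if something like this were possible: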

r.raw.read(timeout = 10)


The question is: how do I put a time limit on the download?
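
One way to approach this, sketched under assumptions rather than taken from the requests API: stream the body in chunks and check a wall-clock deadline between chunks. `read_with_deadline`, `DownloadTimeout`, and the `clock` parameter are hypothetical names introduced for illustration; the helper accepts any iterable of byte chunks so it can be tried without a network.

```python
import time


class DownloadTimeout(Exception):
    """Raised when the whole download exceeds the allowed wall-clock time."""


def read_with_deadline(chunks, max_seconds, clock=time.monotonic):
    """Collect byte chunks, giving up once max_seconds have elapsed.

    chunks is any iterable of bytes objects (hypothetical input; with
    requests it could be the iterator returned by r.iter_content()).
    """
    deadline = clock() + max_seconds
    received = bytearray()
    for chunk in chunks:
        # The deadline is checked once per chunk, not inside a single read.
        if clock() > deadline:
            raise DownloadTimeout("download exceeded %s seconds" % max_seconds)
        received.extend(chunk)
    return bytes(received)
```

With requests, the chunks would typically come from `requests.get(url, stream=True, timeout=...)` followed by `r.iter_content(chunk_size=8192)`. Because the deadline is only checked between chunks, the per-read `timeout` is still what bounds a single stalled read.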

python urllib3 python-requests

Hri*_*tov

lucky-day

11
votes

1
answer

6656
views

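Script suddenly stops crawling without any error or exception

I'm not sure why, but my script stops crawling as soon as it reaches page 9. There are no errors, exceptions, or warnings, so I'm a bit at a loss.

Can someone help me?

P.S. Here is the full script, in case anyone wants to test it themselves!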

def initiate_crawl():
    def refresh_page(url):
        ff = create_webdriver_instance()
        ff.get(url)
        ff.find_element(By.XPATH, '//*[@id="FilterItemView_sortOrder_dropdown"]/div/span[2]/span/span/span/span').click()
        ff.find_element(By.XPATH, '//a[contains(text(), "Discount - High to Low")]').click()
        items = WebDriverWait(ff, 15).until(
            EC.visibility_of_all_elements_located((By.XPATH, '//div[contains(@id, "100_dealView_")]'))
        )
        print(len(items))
        for count, item in enumerate(items):
            slashed_price = item.find_elements(By.XPATH, './/span[contains(@class, "a-text-strike")]')
            active_deals = item.find_elements(By.XPATH, './/*[contains(text(), "Add to Cart")]')
            if len(slashed_price) > 0 and len(active_deals) > 0:
                product_title = item.find_element(By.ID, 'dealTitle').text
                if product_title not in already_scraped_product_titles:
                    already_scraped_product_titles.append(product_title)
                    url = ff.current_url
                    ff.quit()
                    refresh_page(url)
                    break
            if count+1 is len(items):
                try:
                    next_button = …


python selenium urllib3 python-requests geckodriver

Pri*_*Nom

2018 10-12

11
votes

1
answer

997
views

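Python urllib3 and how to handle cookie support?

So I'm looking into urllib3 because it has connection pooling and is thread-safe (so performance is better, especially for crawling), but the documentation is... minimal, to say the least. urllib2 has build_opener, so something like: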

#!/usr/bin/python
import cookielib, urllib2
cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
r = opener.open("http://example.com/")


But urllib3 has no build_opener method, so the only way I have figured out so far is to put the cookie in the headers manually:

#!/usr/bin/python
import urllib3
http_pool = urllib3.connection_from_url("http://example.com")
myheaders = {'Cookie':'some cookie data'}
r = http_pool.get_url("http://example.org/", headers=myheaders)


But I'm hoping there is a better way and that one of you can tell me what it is. Also, could someone tag this with "urllib3"?
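
A minimal sketch of the manual-header approach described above, assuming Python 3 and only the standard library: parse the `Set-Cookie` values a server sent and build the `Cookie` header for the next request. `cookie_header_from_set_cookie` is a hypothetical helper name and the cookie values are made up; it ignores the domain, path, and expiry rules that a real cookie jar would enforce.

```python
from http.cookies import SimpleCookie


def cookie_header_from_set_cookie(set_cookie_values):
    """Build a Cookie request header from Set-Cookie response values.

    Hypothetical helper: keeps only name=value pairs and drops
    attributes such as Path or HttpOnly.
    """
    jar = SimpleCookie()
    for value in set_cookie_values:
        jar.load(value)
    return "; ".join(
        "%s=%s" % (name, morsel.value) for name, morsel in jar.items()
    )


# Made-up Set-Cookie values, as a server might send them.
header = cookie_header_from_set_cookie(
    ["sid=abc123; Path=/; HttpOnly", "theme=dark; Path=/"]
)
print(header)
```

The resulting string could then be passed as the `Cookie` entry of the headers dictionary, as in the snippet above.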

python urllib3

big*_*bob

2010 03-11

9
votes

1
answer

10k
views

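Which urllib should I choose?

As we know, python has two built-in url libs:

urllib
urllib2

and a third-party library:

urllib3

Suppose my requirement is just to request an API via the GET method, and that it returns a JSON string.
Which lib should I use? Do they have overlapping functionality?
If urllib can meet my requirement now, but my requirements get more complex and urllib can no longer do what I need, I would have to import another lib at that point. I really want to import only one lib, because I think importing all of them would confuse me, and I think their methods are completely different.

So now I'm confused about which library I should use. I prefer urllib3, and I think it can meet my requirements all the time. What do you think?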
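
For the simple case described here (a GET that returns JSON), a minimal standard-library sketch might look like this, assuming Python 3, where `urllib2` was folded into `urllib.request`. `get_json` is a hypothetical helper name, and the response is assumed to be UTF-8 encoded JSON.

```python
import json
import urllib.request


def get_json(url, timeout=10):
    """GET url and decode the body as JSON.

    Assumes the server answers with UTF-8 encoded JSON; no retries,
    pooling, or error handling beyond what urlopen raises itself.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Since `urllib.request` also understands `data:` URLs, the helper can be tried offline, e.g. `get_json("data:application/json,%7B%22ok%22%3A%20true%7D")`.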

python urllib urllib2 urllib3

Mat*_*nes

2018 03-06

9
votes

1
answer

20k
views

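pip, proxy authentication and "Not supported proxy scheme"

Trying to install pip on a new python installation. I'm stuck with proxy errors. Looks like a bug in get-pip or urllib3??

The question is: do I have to go through the pain of setting up CNTLM as described here, or is there a shortcut?

The get-pip.py documentation says to use the --proxy="[user:passwd@]proxy.server:port" option to specify the proxy and the relevant authentication. But it seems pip passes the whole thing as-is to urllib3, which interprets "myusr" as the URL scheme because of the ':', I guess (?).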

C:\ProgFiles\Python27>get-pip.py --proxy myusr:mypswd@111.222.333.444:80
Downloading/unpacking pip
Cleaning up...
Exception:
Traceback (most recent call last):
  File "c:\users\sg0219~1\appdata\local\temp\tmpxwg_en\pip.zip\pip\basecommand.py", line 122, in main
    status = self.run(options, args)
  File "c:\users\sg0219~1\appdata\local\temp\tmpxwg_en\pip.zip\pip\commands\install.py", line 278, in run
    requirement_set.prepare_files(finder, force_root_egg_info=self.bundle, bundle=self.bundle)
  File "c:\users\sg0219~1\appdata\local\temp\tmpxwg_en\pip.zip\pip\req.py", line 1177, in prepare_files
    url = finder.find_requirement(req_to_install, upgrade=self.upgrade)
  File "c:\users\sg0219~1\appdata\local\temp\tmpxwg_en\pip.zip\pip\index.py", line 194, in find_requirement
    page = self._get_page(main_index_url, req)
  File "c:\users\sg0219~1\appdata\local\temp\tmpxwg_en\pip.zip\pip\index.py", line 568, in _get_page
    session=self.session,
  File "c:\users\sg0219~1\appdata\local\temp\tmpxwg_en\pip.zip\pip\index.py", line 670, in get_page
    resp …
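
The guess about `myusr` being read as a URL scheme can be illustrated with the standard library's `urllib.parse.urlsplit` (Python 3). This is only an analogy: it is not pip's or urllib3's own parsing code, and `describe_proxy` is a hypothetical helper name. It shows how the same string parses with and without an explicit `http://` prefix.

```python
from urllib.parse import urlsplit


def describe_proxy(proxy):
    """Return the parts that urlsplit sees in a proxy string."""
    parts = urlsplit(proxy)
    return {
        "scheme": parts.scheme,
        "username": parts.username,
        "password": parts.password,
        "hostname": parts.hostname,
        "port": parts.port,
    }


# Without a scheme prefix, everything before the first ':' is taken as the scheme.
no_scheme = describe_proxy("myusr:mypswd@111.222.333.444:80")
# With an explicit scheme, the credentials, host and port are separated.
with_scheme = describe_proxy("http://myusr:mypswd@111.222.333.444:80")

print(no_scheme)
print(with_scheme)
```

Whether prefixing the scheme is enough for a given pip version is not something this sketch can confirm.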


python pip urllib pypi urllib3

Kas*_*yap

2017 05-23

9
votes

4
answers

30k
views

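How often does python-requests perform DNS queries

We are using Locust to load test a REST API service behind Elastic Load Balancing. I came across this article on load balancing and auto scaling, which is what we are testing.

Locust uses python-requests, which uses urllib3, so my question is whether python-requests performs a DNS query for every connection, and if not, whether it can be configured to.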
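
One way to answer this empirically is to count name resolutions while the requests are made. The sketch below temporarily wraps `socket.getaddrinfo`, under the assumption that the HTTP stack in use resolves host names through that function; `count_dns_lookups` is a hypothetical helper name.

```python
import socket
from contextlib import contextmanager


@contextmanager
def count_dns_lookups():
    """Record every host passed to socket.getaddrinfo inside the block."""
    calls = []
    original = socket.getaddrinfo

    def counting_getaddrinfo(host, *args, **kwargs):
        # Note the host, then delegate to the real resolver.
        calls.append(host)
        return original(host, *args, **kwargs)

    socket.getaddrinfo = counting_getaddrinfo
    try:
        yield calls
    finally:
        # Always restore the real function, even if the block raises.
        socket.getaddrinfo = original
```

Wrapping a few `session.get(...)` calls in `with count_dns_lookups() as calls:` and inspecting `len(calls)` would show how many lookups were issued for new versus reused connections.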

python dns urllib3 python-requests locust

djo*_*son

lucky-day

9
votes

1
answer

5375
views

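Python requests with HTTPAdapter pauses for hours

I have a particular URL on which my code pauses for hours (more than 3 hours). I can't seem to understand why it does this.

The URL is http://www.etudes.ccip.fr/maintenance_site.php.

A direct requests.get() works instantly, but whenever I use an HTTPAdapter, the code seems to sleep almost indefinitely: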

import requests
from requests.adapters import HTTPAdapter

url = 'http://www.etudes.ccip.fr/maintenance_site.php'
session = requests.Session()
session.mount('http://', HTTPAdapter(max_retries=2))
session.get(url, timeout=2)

python urllib3 python-requests

fas*_*cen

2017 11-23

9
votes

2
answers

1702
views

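Python - HTTP module cannot parse the response if the server answers before the PUT is complete

I'm using the requests library (which uses urllib3 and the Python http module under the hood) to upload a file from a Python script. My backend first inspects the request headers, and if they do not meet the required preconditions, it stops the request right away and responds with a valid 400 response.

This behavior works fine in Postman or Curl; i.e. the client is able to parse the 400 response even though it has not finished uploading and the server responded prematurely. However, when doing this in Python with requests/urllib3, the library is unable to process the backend response: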

Traceback (most recent call last):
  File "C:\Users\Neumann\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\urllib3\connectionpool.py", line 670, in urlopen
    httplib_response = self._make_request(
  File "C:\Users\Neumann\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\urllib3\connectionpool.py", line 392, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.8_3.8.1776.0_x64__qbz5n2kfra8p0\lib\http\client.py", line 1255, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.8_3.8.1776.0_x64__qbz5n2kfra8p0\lib\http\client.py", line 1301, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.8_3.8.1776.0_x64__qbz5n2kfra8p0\lib\http\client.py", line 1250, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "C:\Program …

python http urllib3 python-3.x python-requests

Neu*_*ann

2021 01-04

9
votes

1
answer

451
views

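Does urllib3 support HTTP/2 requests? Will it?

Here is what I know about the various python HTTP libraries:

Requests does not support HTTP/2 requests.

Hyper does support HTTP/2 requests, but it was archived in early 2021 and is not a good choice for new projects.

HTTPX does support HTTP/2, but that support is optional, requires installing extra dependencies, and comes with some caveats about rough edges.

AIOHTTP does not support HTTP/2 yet (as of mid-April 2022).

That project's focus is also not solely on being a client; the package includes a server as well.

The other major HTTP request library I know of is urllib3. It is what OpenAPI Generator uses by default when generating Python client libraries.

My questions are:

Can urllib3 be configured to make HTTP/2 requests?

I can't find any information about http2 support in the documentation, and in my testing of a generated OpenAPI client, all requests were HTTP/1.1. If the answer is currently no, are the maintainers planning HTTP/2 support? I could not find any evidence of this in the project's open issues.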

python urllib3 http2 openapi-generator

kas*_*hev

2022 04-14

9
votes

1
answer

3520
views

MaxRetryError: HTTPConnectionPool: Max retries exceeded (Caused by ProtocolError('Connection aborted.', error(111, 'Connection refused')))

I have a problem: I want to test "select" and "input". I can write it like the code below. Original code:

12 class Sinaselecttest(unittest.TestCase):
13
14     def setUp(self):
15         binary = FirefoxBinary('/usr/local/firefox/firefox')
16         self.driver = webdriver.Firefox(firefox_binary=binary)
17
18     def test_select_in_sina(self):
19         driver = self.driver
20         driver.get("https://www.sina.com.cn/")
21         try:
22             WebDriverWait(driver,30).until(
23                 ec.visibility_of_element_located((By.XPATH,"/html/body/div[9]/div/div[1]/form/div[3]/input"))
24             )
25         finally:
26             driver.quit() # #??select??
27         select=Select(driver.find_element_by_xpath("//*[@id='slt_01']")).select_by_value("??")
28         element=driver.find_element_by_xpath("/html/body/div[9]/div/div[1]/form/div[3]/input")
29         element.send_keys("??")
30         driver.find_element_by_xpath("/html/body/div[9]/div/div[1]/form/input").click()
31         driver.implicitly_wait(5)
32     def tearDown(self):
33         self.driver.close()
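I want to test Selenium's "select" functionality, so I chose the sina website to select an option and enter text in the textarea, then search for it. But when I run this test, it fails with an error: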

Traceback (most recent call last):
  File "test_sina_select.py", line 32, in tearDown
    self.driver.close()
  File "/usr/lib/python2.7/site-packages/selenium/webdriver/remote/webdriver.py", line 688, in close
    self.execute(Command.CLOSE)
  File "/usr/lib/python2.7/site-packages/selenium/webdriver/remote/webdriver.py", line 319, in …

python selenium urllib3 python-requests selenium-webdriver

郑佳妮*_*郑佳妮

2020 11-09

8
votes

1
answer

30k
views

Tag statistics

python ×10

urllib3 ×10

python-requests ×6

selenium ×2

urllib ×2

dns ×1

geckodriver ×1

http ×1

http2 ×1

locust ×1

openapi-generator ×1

pip ×1

pypi ×1

python-3.x ×1

selenium-webdriver ×1

urllib2 ×1
