xba*_*laj 6 python selenium multithreading urllib selenium-webdriver
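I was using Selenium to process many pages sequentially, but to improve performance I decided to parallelize the work by splitting the pages across several threads (this is possible because the pages are independent of each other).
Here is the simplified code: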
def process_page(driver, page, lock):
    driver.get("page.url()")
    driver.find_element_by_css_selector("some selector")
    wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, "some selector")))
    ...
    with lock:
        for i in range(result_tuple.__len__()):
            logger.info(result_tuple[i])
    return result_tuple

def process_all_pages():
    def pages_processing(id, lock):
        result = []
        with MyWebDriver(webdriver_options) as driver:
            for i in range(50):
                result.append(process_page(driver, pages[id * 50 + i], lock))
        return result

    lock = threading.Lock()
    with ThreadPoolExecutor(4) as executor:
        futures = []
        for i in range(4):
            futures.append(executor.submit(pages_processing, i, lock))

    result = []
    for i in range(futures.__len__()):
        result.append(futures[i].result())
    return result
MyWebDriver is just a simple context manager for the Chrome driver: on entering the context it spawns a new Chrome driver instance, and on exiting the context it quits that Chrome instance.
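For readers who want to reproduce the setup, a context manager of the kind described could look roughly like the sketch below. This is a guess at its shape, not the asker's actual class: the `driver_factory` parameter and the `_default_factory` helper are hypothetical additions so the class can be exercised without a real browser, and the default path assumes a Selenium version in which `webdriver.Chrome` accepts an `options` keyword.

```python
from contextlib import AbstractContextManager


class MyWebDriver(AbstractContextManager):
    """Hypothetical sketch: start a driver on enter, quit it on exit."""

    def __init__(self, options=None, driver_factory=None):
        self._options = options
        # driver_factory is an assumption made for illustration; it lets the
        # class be tested with a stand-in object instead of a real browser.
        self._driver_factory = driver_factory or self._default_factory
        self._driver = None

    @staticmethod
    def _default_factory(options):
        # Imported lazily so the module loads even without selenium installed.
        from selenium import webdriver
        return webdriver.Chrome(options=options)

    def __enter__(self):
        # One fresh driver instance per context, hence one per worker thread.
        self._driver = self._driver_factory(self._options)
        return self._driver

    def __exit__(self, exc_type, exc, tb):
        try:
            self._driver.quit()
        finally:
            self._driver = None
        return False  # do not swallow exceptions raised inside the block
```

Because `quit()` runs in `__exit__`, the browser is shut down even when page processing raises, which keeps stray Chrome processes from piling up across threads.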
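This code spawns 4 Chrome drivers, one per thread, and each thread does its Selenium work in its own driver.
For the first few seconds it works like a charm, but after a while warnings start to appear in the logger and Selenium seems to stop communicating with the Chrome drivers.
I can also provide debug logs if needed, but I am not sure whether they contain anything relevant.
Warnings in the logger: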
...
# With these first warnings Selenium stops communicating with some of the Chrome drivers - nothing happens in them anymore.
WARNING - urllib3.connectionpool - Connection pool is full, discarding connection: 127.0.0.1
WARNING - urllib3.connectionpool - Connection pool is full, discarding connection: 127.0.0.1
...
# These warnings come a bit later
WARNING - urllib3.connectionpool - Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x0000018343AB24A8>: Failed to establish a new connection: [WinError 10061] No connection could be made because the target machine actively refused it')': /session/9c9fc148f278aaa360a26d95eac0966e/url
WARNING - urllib3.connectionpool - Retrying (Retry(total=1, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x0000018348854E10>: Failed to establish a new connection: [WinError 10061] No connection could be made because the target machine actively refused it')': /session/9c9fc148f278aaa360a26d95eac0966e/url
WARNING - urllib3.connectionpool - Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x0000018348869710>: Failed to establish a new connection: [WinError 10061] No connection could be made because the target machine actively refused it')': /session/9c9fc148f278aaa360a26d95eac0966e/url
...
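I tried these patches to set a higher maxsize (HTTPConnectionPool, HTTPSConnectionPool): /sf/answers/1557755951/. This did not solve the problem, even though I confirmed that the patches were executed.
Next, I tried setting a higher num_pools in the PoolManager class. I changed this directly in the library source, and also changed maxsize in HTTPConnectionPool and HTTPSConnectionPool. This did solve one problem: the warnings no longer appear in the log. However, Selenium's communication with the drivers still freezes.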