Parallel execution with Python and Selenium

Question

Ke.*_*Ke. 5 python parallel-processing selenium concurrent.futures

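I'm confused about parallel execution in Python with Selenium. There seem to be several ways to go about it, but some of them appear to be outdated.

What is the current, up-to-date way to run Selenium in parallel?

There is a Python module called python-wd-parallel that seems to offer some of this functionality, but it dates from 2013. Is it still useful?

For example: https://saucelabs.com/blog/parallel-testing-with-python-and-selenium-on-sauce-online-workshop-recap

We also have concurrent.futures, which seems much newer but is not as easy to implement. Does anyone have a working example of parallel execution with Selenium?

There is also the option of just using threads and executors to get the work done, but I feel this would be slower, because it does not use all the cores and still runs serially.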

Answer 1

blu*_*ers 9

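Use the Parallel class from joblib to do this; it is a good library for parallel execution.

Suppose we have a list of URLs named urls and we want to take a screenshot of each URL in parallel.

First, let's import the necessary libraries.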

from selenium import webdriver
from joblib import Parallel, delayed

Now let's define a function that takes a screenshot as base64.

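def take_screenshot(url):
    # Note: PhantomJS support was removed in Selenium 4; with a current
    # Selenium, use a headless Chrome or Firefox driver here instead.
    phantom = webdriver.PhantomJS('/path/to/phantomjs')
    phantom.get(url)
    screenshot = phantom.get_screenshot_as_base64()
    phantom.quit()  # quit() also stops the driver process; close() only closes the window

    return screenshot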

Now, to run it in parallel, all you need to do is:

screenshots = Parallel(n_jobs=-1)(delayed(take_screenshot)(url) for url in urls)

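When this line finishes executing, screenshots will hold the data from all of the processes that ran.

Notes on Parallel

Parallel(n_jobs=-1) means use all the CPU cores available to you
delayed(function)(input) is joblib's way of creating the input for the function you are trying to run in parallel

More information can be found in the joblib documentation.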

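Thank you very much. My reasoning was to have one driver instance per process (rather than one driver instance shared by several processes), because in the "how to speed up Selenium" list, the item "reuse driver instances" is almost at the top (2 upvotes)
To avoid recreating instances, I would chop the `urls` list into evenly sized sublists and send those to the processes, so that spawning a process (and creating a webdriver instance) happens only once per process (2 upvotes)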
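
The chunking idea from the comment above could be sketched roughly as follows. This is only an illustration, not the commenter's code: the helper names (chunk, make_chrome, screenshots_for_chunk, screenshots_in_chunks) are hypothetical, and a standard-library thread pool is used in place of joblib to keep the sketch dependency-free. The point it shows is that each worker creates one driver and reuses it for its whole sublist.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk(items, n_chunks):
    """Split items into at most n_chunks contiguous sublists of near-equal size."""
    if not items:
        return []
    n_chunks = max(1, min(n_chunks, len(items)))
    size, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        end = start + size + (1 if i < extra else 0)  # first `extra` chunks get one more
        chunks.append(items[start:end])
        start = end
    return chunks


def make_chrome():
    """Hypothetical driver factory; Selenium is imported lazily, only when called."""
    from selenium import webdriver
    return webdriver.Chrome()


def screenshots_for_chunk(urls, make_driver=make_chrome):
    """Create ONE driver, reuse it for every URL in the chunk, then quit it."""
    driver = make_driver()
    try:
        shots = []
        for url in urls:
            driver.get(url)
            shots.append(driver.get_screenshot_as_base64())
        return shots
    finally:
        driver.quit()  # always release the browser, even if a page fails


def screenshots_in_chunks(urls, n_workers=4, make_driver=make_chrome):
    """One chunk per worker; results come back in the original URL order."""
    chunks = chunk(urls, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as executor:
        per_chunk = executor.map(lambda c: screenshots_for_chunk(c, make_driver), chunks)
        return [shot for shots in per_chunk for shot in shots]
```

With joblib, the same per-chunk worker could presumably be dispatched as Parallel(n_jobs=n)(delayed(screenshots_for_chunk)(c) for c in chunk(urls, n)), which keeps the one-driver-per-process property the comment asks for.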

Answer 2

eus*_*iro 7

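Python Parallel Wd appears to be dead, judging by its GitHub repository (the last commit was 9 years ago). It also implements an obsolete protocol for Selenium. Finally, the code is proprietary to Saucelabs.

In general, it is best to use SeleniumBase, a Python testing framework based on Selenium and pytest. It is very complete and supports everything from performance improvements to parallel threads and more. If that does not fit your case, keep reading.

Selenium performance boost (concurrent.futures)

Short answer

Both threads and processes will give your Selenium code a considerable speedup.

Short examples are given below. The Selenium work is done by the selenium_title function, which returns the page title. The examples do not handle exceptions raised while each thread or process is running; for that, see Long answer - Handling exceptions.

Thread pool: concurrent.futures.ThreadPoolExecutor.
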
from selenium import webdriver
from concurrent import futures

def selenium_title(url):
    wdriver = webdriver.Chrome()  # chrome webdriver
    wdriver.get(url)
    title = wdriver.title
    wdriver.quit()
    return title

links = ["https://www.amazon.com", "https://www.google.com"]

with futures.ThreadPoolExecutor() as executor:  # default/optimized number of threads
    titles = list(executor.map(selenium_title, links))

Process pool: concurrent.futures.ProcessPoolExecutor. Just replace ThreadPoolExecutor with ProcessPoolExecutor in the code above. Both derive from the Executor base class. You must also protect main, as shown below.

if __name__ == '__main__':
    with futures.ProcessPoolExecutor() as executor:  # default/optimized number of processes
        titles = list(executor.map(selenium_title, links))

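
Because both pools derive from the Executor base class, the choice between them can be reduced to a single argument. The helper below is a minimal sketch of that idea; map_with_pool is a hypothetical name, not part of concurrent.futures, and it assumes func is picklable when a process pool is used.

```python
from concurrent import futures


def map_with_pool(func, items, executor_cls=futures.ThreadPoolExecutor, max_workers=None):
    """Run func over items with the given Executor subclass; results keep input order."""
    with executor_cls(max_workers=max_workers) as executor:
        return list(executor.map(func, items))
```

With the names from the answer, map_with_pool(selenium_title, links) would use threads, and map_with_pool(selenium_title, links, futures.ProcessPoolExecutor) would use processes. The process variant still has to be called from under the if __name__ == '__main__': guard.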
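Long answer

Why do threads work despite the Python GIL?

Even mighty Python has limitations with threads because of the Python GIL, even though threads do context switch. The performance gain comes from an implementation detail of Selenium. Selenium works by sending commands such as POST and GET (HTTP requests) to the browser driver server. As you may already know, I/O-bound tasks (HTTP requests) release the GIL while they wait, hence the performance gain.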
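
The effect can be illustrated without a browser. In the sketch below, time.sleep stands in for the time a thread spends waiting on a WebDriver HTTP round trip; like blocking socket I/O, sleeping releases the GIL. The function names and the 0.2 second delay are illustrative assumptions, not measurements of Selenium.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_command(delay):
    """Stand-in for one WebDriver HTTP round trip: the thread just waits."""
    time.sleep(delay)  # waiting releases the GIL, so other threads can run
    return delay


def timed(run):
    """Return the wall-clock seconds taken by calling run()."""
    start = time.perf_counter()
    run()
    return time.perf_counter() - start


def run_threaded(delays):
    with ThreadPoolExecutor(max_workers=len(delays)) as executor:
        list(executor.map(fake_command, delays))


delays = [0.2] * 4
serial_seconds = timed(lambda: [fake_command(d) for d in delays])
threaded_seconds = timed(lambda: run_threaded(delays))
print(f"serial: {serial_seconds:.2f}s, threaded: {threaded_seconds:.2f}s")
```

The serial run waits for each delay in turn, while the threaded run overlaps the waits, so it finishes in roughly the time of the longest single wait.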
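
Handling exceptions

We can make small modifications to the example above to handle exceptions raised in the spawned threads. Instead of executor.map we use executor.submit, which returns the title wrapped in a Future instance.

To access the returned title we can use future_titles[index].result(), where index ranges over len(links), or simply use a for loop as shown below.
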
with futures.ThreadPoolExecutor() as executor:
    future_titles = [executor.submit(selenium_title, link) for link in links]
    for future_title, link in zip(future_titles, links):
        try:
            title = future_title.result()  # can use `timeout` to wait max seconds for each thread
        except Exception as exc:  # this thread might have had an exception
            print('url {0} generated an exception: {1}'.format(link, exc))

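Note that besides iterating over future_titles we also iterate over links, so that if an exception occurs in some thread we know which url (link) was responsible.

The futures.Future class is nice because it gives you control over the result received from each thread, such as whether it completed correctly or raised an exception. More information can be found in the concurrent.futures documentation.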
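
As a small illustration of that control, a Future can be asked for its exception directly instead of wrapping result() in try/except. The helper below is a sketch under stated assumptions: run_and_inspect is a hypothetical name, task stands for a function such as selenium_title, and the items are assumed to be unique, hashable values such as URL strings.

```python
from concurrent import futures


def run_and_inspect(task, items):
    """Submit task(item) for every item and sort the outcomes without raising."""
    results, errors = {}, {}
    with futures.ThreadPoolExecutor() as executor:
        submitted = {item: executor.submit(task, item) for item in items}
    # Leaving the with block waits for every future, so all of them are done here.
    for item, future in submitted.items():
        exc = future.exception()  # None when the call finished normally
        if exc is None:
            results[item] = future.result()
        else:
            errors[item] = exc
    return results, errors
```

Calling run_and_inspect(selenium_title, links) would then return one dictionary of titles and one of exceptions, both keyed by link.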
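
Also worth mentioning: futures.as_completed is a better choice if you do not care about the order in which the threads return their items. Since the syntax for handling exceptions with it is a bit ugly, I omitted it here.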
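
For readers who want it anyway, one possible shape of the as_completed variant is sketched below. It is not the answer author's code: titles_as_completed is a hypothetical helper, and task stands for a function such as selenium_title. A dictionary maps each Future back to its link, because as_completed yields futures in completion order rather than submission order.

```python
from concurrent import futures


def titles_as_completed(task, links, max_workers=None):
    """Yield (link, title, error) tuples in completion order, not input order."""
    with futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Remember which link each future belongs to.
        future_to_link = {executor.submit(task, link): link for link in links}
        for future in futures.as_completed(future_to_link):
            link = future_to_link[future]
            try:
                yield link, future.result(), None
            except Exception as exc:  # the task raised inside its thread
                yield link, None, exc
```

A caller could loop with for link, title, error in titles_as_completed(selenium_title, links): and check whether error is None for each item.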
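
Performance gain and threads

First, why I have always used threads to speed up my Selenium code:

On I/O-bound tasks, my experience with Selenium shows that there is little or no difference between using a pool of processes (Process) and a pool of threads (Threads). Others have reached similar conclusions about Python threads versus processes on I/O-bound tasks.

We also know that processes use their own memory space, which means more memory consumption. Processes are also somewhat slower to spawn than threads.
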

Asked: 8 years, 10 months ago
Viewed: 6494 times
Last activity: 7 years, 6 months ago
