Rob*_*ter 19 python selenium webdriver phantomjs
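So I'm trying to open websites in new tabs inside my WebDriver. I want to do this because opening a new WebDriver for each website with PhantomJS takes about 3.5 seconds, and I want more speed...
I'm using a multiprocessing Python script, and I want to get some elements from each page, so the workflow looks like this: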
Open Browser
Loop through my array
For element in array -> Open website in new tab -> do my business -> close it
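But I can't find any way to achieve this.
This is the code I'm using. It takes forever between websites, and I need it to be fast... Other tools are allowed, but I don't know many tools that can scrape website content loaded with JavaScript (divs created when certain events fire on load), which is why I need Selenium... BeautifulSoup can't be used for some of my pages.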
#!/usr/bin/env python
import multiprocessing, time, pika, json, traceback, logging, sys, os, itertools, urllib, urllib2, cStringIO, mysql.connector, shutil, hashlib, socket, re
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from PIL import Image
from os import listdir
from os.path import isfile, join
from bs4 import BeautifulSoup
from pprint import pprint
def getPhantomData(parameters):
    browser = None
    try:
        # Create the WebDriver (a full browser start-up for every message)
        browser = webdriver.Firefox()
        # Navigate to the URL
        browser.get(parameters['target_url'])
        # Find all links by CSS selector
        links = browser.find_elements_by_css_selector(parameters['selector'])
        result = []
        for link in links:
            # Extract the link attribute and append it to our list
            result.append(link.get_attribute(parameters['attribute']))
        return json.dumps({'data': result})
    except Exception, err:
        print err
        # Return an empty result so the caller can reject and requeue the message
        return json.dumps({'data': []})
    finally:
        # quit() closes every window and ends the WebDriver session
        if browser is not None:
            browser.quit()
def callback(ch, method, properties, body):
    parameters = json.loads(body)
    # getPhantomData returns a JSON string, so decode it before indexing
    message = json.loads(getPhantomData(parameters))
    if message['data']:
        ch.basic_ack(delivery_tag=method.delivery_tag)
    else:
        ch.basic_reject(delivery_tag=method.delivery_tag, requeue=True)
def consume():
    credentials = pika.PlainCredentials('invitado', 'invitado')
    rabbit = pika.ConnectionParameters('localhost', 5672, '/', credentials)
    connection = pika.BlockingConnection(rabbit)
    channel = connection.channel()
    # Connect to the channel
    channel.queue_declare(queue='com.stuff.images', durable=True)
    channel.basic_consume(callback, queue='com.stuff.images')
    print ' [*] Waiting for messages. To exit press CTRL^C'
    try:
        channel.start_consuming()
    except KeyboardInterrupt:
        pass
workers = 5
pool = multiprocessing.Pool(processes=workers)
for i in xrange(0, workers):
    pool.apply_async(consume)
try:
    while True:
        # Sleep instead of spinning so the main process does not burn a CPU core
        time.sleep(1)
except KeyboardInterrupt:
    print ' [*] Exiting...'
    pool.terminate()
    pool.join()
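Most of the ~3.5 s per site comes from creating a new browser inside getPhantomData for every message. One possible alternative, sketched under the assumption that each worker process may keep one browser alive for its whole lifetime, is to cache the browser per process. get_browser is a hypothetical helper (not part of Selenium) and works in Python 2 or 3:

```python
# Module globals are per process, so every multiprocessing worker gets its own
# _browser (as long as the parent process never creates one before forking).
_browser = None


def get_browser(factory):
    """Return this process's browser, creating it with factory() on first use.

    factory is any zero-argument callable, for example webdriver.Firefox.
    """
    global _browser
    if _browser is None:
        _browser = factory()  # the slow start-up happens only here
    return _browser
```

Inside getPhantomData, browser = webdriver.Firefox() would then become browser = get_browser(webdriver.Firefox), and the close()/quit() calls would move out of the per-message path (for example to worker shutdown), so each message only pays for the page load.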
abe*_*rna 33
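You can open/close a tab with the key combination COMMAND+T or COMMAND+W (OS X). On other operating systems you can use CONTROL+T / CONTROL+W.
In Selenium you can emulate this behavior. You will need to create one WebDriver and as many tabs as your tests need.
Here is the code.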
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
driver = webdriver.Firefox()
driver.get("http://www.google.com/")
#open tab
driver.find_element_by_tag_name('body').send_keys(Keys.COMMAND + 't')
# You can use (Keys.CONTROL + 't') on other OSs
# Load a page
driver.get('http://stackoverflow.com/')
# Make the tests...
# close the tab
# (Keys.CONTROL + 'w') on other OSs.
driver.find_element_by_tag_name('body').send_keys(Keys.COMMAND + 'w')
driver.close()
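Instead of editing Keys.COMMAND to Keys.CONTROL by hand for each OS, the modifier can be picked from the platform. A small sketch; tab_modifier_name is a hypothetical helper that only chooses the key, so it does not address the later reports in this thread that the shortcut stopped working in newer Firefox/Selenium versions:

```python
import sys


def tab_modifier_name(platform=sys.platform):
    """Return the name of the Keys attribute for the new-tab/close-tab shortcut.

    macOS reports sys.platform == "darwin" and uses COMMAND; other systems use CONTROL.
    """
    return "COMMAND" if platform == "darwin" else "CONTROL"


# With Selenium installed:
#   from selenium.webdriver.common.keys import Keys
#   modifier = getattr(Keys, tab_modifier_name())
#   driver.find_element_by_tag_name('body').send_keys(modifier + 't')  # open tab
#   driver.find_element_by_tag_name('body').send_keys(modifier + 'w')  # close tab
```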
yuc*_*cer 19
Here is the common code, adapted from another example:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
driver = webdriver.Firefox()
driver.get("http://www.google.com/")
#open tab
# ... take the code from the options below
# Load a page
driver.get('http://bings.com')
# Make the tests...
# quit() closes every tab and ends the browser session
driver.quit()
Possible ways are:
Send <CTRL> + <T> to an element
#open tab
driver.find_element_by_tag_name('body').send_keys(Keys.CONTROL + 't')
Send <CTRL> + <T> through an ActionChains chain
from selenium.webdriver.common.action_chains import ActionChains
ActionChains(driver).key_down(Keys.CONTROL).send_keys('t').key_up(Keys.CONTROL).perform()
Execute a JavaScript snippet
driver.execute_script('''window.open("http://bings.com","_blank");''')
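For this to work, you need to make sure the preferences browser.link.open_newwindow and browser.link.open_newwindow.restriction are set correctly. The defaults in recent versions are fine; otherwise you need: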
fp = webdriver.FirefoxProfile()
fp.set_preference("browser.link.open_newwindow", 3)
fp.set_preference("browser.link.open_newwindow.restriction", 2)
driver = webdriver.Firefox(firefox_profile=fp)
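The problem is that those preferences are preset to other values and are frozen, at least as of Selenium 3.4.0. When you set them through a profile with the Java bindings an exception is thrown, and with the Python bindings the new values are ignored.
In Java there is a way to set these preferences when talking to geckodriver without specifying a profile object, but it does not seem to be implemented in the Python bindings yet: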
FirefoxOptions options = new FirefoxOptions().setProfile(fp);
options.addPreference("browser.link.open_newwindow", 3);
options.addPreference("browser.link.open_newwindow.restriction", 2);
FirefoxDriver driver = new FirefoxDriver(options);
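The third option stopped working for Python in Selenium 3.4.0.
The first two options also seem to have stopped working in Selenium 3.4.0. They rely on sending CTRL key events to an element. At first glance this looks like a problem with the CTRL key, but it actually fails because of Firefox's new multi-process feature. This new architecture may require a new approach, or it may be a temporary implementation issue. Either way, we can disable it with: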
fp = webdriver.FirefoxProfile()
fp.set_preference("browser.tabs.remote.autostart", False)
fp.set_preference("browser.tabs.remote.autostart.1", False)
fp.set_preference("browser.tabs.remote.autostart.2", False)
driver = webdriver.Firefox(firefox_profile=fp)
...and then you can use the first way successfully.
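The two groups of preferences in this answer can be kept as data and applied to one profile in a single step. A sketch assuming the Selenium 3 Python bindings (FirefoxProfile.set_preference and the firefox_profile argument); apply_prefs and the two dict names are hypothetical, and, as noted above, Selenium 3.4.0 may still ignore the new-window values:

```python
# Preferences from this answer, grouped by purpose.
NEW_TAB_PREFS = {
    "browser.link.open_newwindow": 3,  # 3 = open new windows as tabs
    "browser.link.open_newwindow.restriction": 2,
}
# Disables Firefox's multi-process tabs so that CTRL+T key events work again.
DISABLE_MULTIPROCESS_PREFS = {
    "browser.tabs.remote.autostart": False,
    "browser.tabs.remote.autostart.1": False,
    "browser.tabs.remote.autostart.2": False,
}


def apply_prefs(profile, *pref_dicts):
    """Call profile.set_preference() for every name/value pair in the dicts."""
    for prefs in pref_dicts:
        for name, value in prefs.items():
            profile.set_preference(name, value)
    return profile


# With Selenium 3 installed:
#   fp = apply_prefs(webdriver.FirefoxProfile(), NEW_TAB_PREFS, DISABLE_MULTIPROCESS_PREFS)
#   driver = webdriver.Firefox(firefox_profile=fp)
```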
Sup*_*dar 15
browser.execute_script('''window.open("http://bings.com","_blank");''')
where browser is the WebDriver instance.
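The same window.open call can receive the URL as a script argument instead (execute_script exposes extra arguments to the script as arguments[0], arguments[1], ...), which avoids quoting problems, and the driver can then be switched to the tab that was created. A minimal sketch; open_in_new_tab is a hypothetical helper, and it assumes the new handle is visible as soon as execute_script returns (if it is not, wait for it as in the WebDriverWait answer below):

```python
def open_in_new_tab(driver, url):
    """Open url in a new tab via JavaScript, switch to it, and return its handle."""
    before = set(driver.window_handles)
    # Passing url as arguments[0] means quotes in the URL cannot break the script
    driver.execute_script("window.open(arguments[0], '_blank');", url)
    new = [h for h in driver.window_handles if h not in before]
    if not new:
        raise RuntimeError("window.open did not create a new window handle")
    driver.switch_to.window(new[0])
    return new[0]
```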
Anonymous 15
from selenium import webdriver
import time
driver = webdriver.Firefox(executable_path=r'TO\Your\Path\geckodriver.exe')
driver.get('https://www.google.com/')
# Open a new window
driver.execute_script("window.open('');")
# Switch to the new window
driver.switch_to.window(driver.window_handles[1])
driver.get("http://stackoverflow.com")
time.sleep(3)
# Open a new window
driver.execute_script("window.open('');")
# Switch to the new window
driver.switch_to.window(driver.window_handles[2])
driver.get("https://www.reddit.com/")
time.sleep(3)
# close the active tab
driver.close()
time.sleep(3)
# Switch back to the first tab
driver.switch_to.window(driver.window_handles[0])
driver.get("https://bing.com")
time.sleep(3)
# Close the only tab, will also close the browser.
driver.close()
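The open / switch / work / close / switch-back sequence above can be wrapped in a context manager so the tab is always closed, even if the scraping code raises. A sketch; new_tab is a hypothetical name, and it finds the new tab by comparing handle snapshots rather than by index:

```python
from contextlib import contextmanager


@contextmanager
def new_tab(driver, url=None):
    """Open a blank tab with window.open(''), switch to it, and clean up on exit."""
    original = driver.current_window_handle
    before = set(driver.window_handles)
    driver.execute_script("window.open('');")
    handle = [h for h in driver.window_handles if h not in before][0]
    driver.switch_to.window(handle)
    try:
        if url is not None:
            driver.get(url)
        yield handle
    finally:
        driver.close()                     # close only the new tab
        driver.switch_to.window(original)  # the browser itself stays open
```

Usage: with new_tab(driver, "http://stackoverflow.com"): ... runs the block inside the new tab and leaves the driver on the original tab afterwards.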
The other solutions don't work with ChromeDriver v83.
Instead, the following works, assuming there is only 1 open tab:
driver.execute_script("window.open('');")
driver.switch_to.window(driver.window_handles[1])
driver.get("https://www.example.com")
If more than 1 tab is already open, you should first get the index of the most recently created tab and switch to it before loading the URL (credit to tylerl):
driver.execute_script("window.open('');")
driver.switch_to.window(driver.window_handles[-1])
driver.get("https://www.example.com")
Anonymous 8
Try this, it should work:
# Open a new Tab
driver.execute_script("window.open('');")
# Switch to the new window and open URL B
driver.switch_to.window(driver.window_handles[1])
driver.get(tab_url)  # tab_url: the URL to open in the new tab
Anonymous 8
Selenium 4.0.0 supports the following operations:
To open a new tab, try:
driver.switch_to.new_window()
To switch to a specific tab (note that tabID starts at 0):
driver.switch_to.window(driver.window_handles[tabID])
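Applied to the question's workflow (one browser, one short-lived tab per URL), the Selenium 4 call might be used like this. A sketch assuming Selenium 4, where switch_to.new_window("tab") opens a tab and switches into it; scrape_in_tabs and the scrape callback are hypothetical names:

```python
def scrape_in_tabs(driver, urls, scrape):
    """Open each URL in a fresh tab of one shared driver, scrape it, close the tab.

    driver is an already-started Selenium 4 WebDriver; scrape is a callback that
    receives the driver and returns the data for the page currently loaded.
    """
    original = driver.current_window_handle  # the tab that stays open
    results = []
    for url in urls:
        driver.switch_to.new_window("tab")  # open a new tab and switch into it
        try:
            driver.get(url)
            results.append(scrape(driver))
        finally:
            driver.close()                     # close only the current tab
            driver.switch_to.window(original)  # return to the surviving tab
    return results
```

For example, scrape_in_tabs(webdriver.Firefox(), urls, lambda d: d.title) would collect page titles while starting the browser only once.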
Anonymous 7
After struggling for a long time, the following worked for me:
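import time
from selenium.webdriver.common.keys import Keys
driver.find_element_by_tag_name('body').send_keys(Keys.CONTROL + 't')
driver.find_element_by_tag_name('body').send_keys(Keys.CONTROL + Keys.TAB)
# Give the browser time to register the new tab before reading the handles
time.sleep(3)
windows = driver.window_handles
driver.switch_to.window(windows[1])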
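In the discussion, Simon explicitly mentioned:
While the datatype used for storing the list of handles may be ordered by insertion, the order in which the WebDriver implementation iterates over the window handles to insert them has no requirement to be stable. The ordering is arbitrary.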
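Opening a website in a new tab with Selenium v3.x through Python is now much easier. Induce a WebDriverWait for number_of_windows_to_be(2), collect the window handles every time you open a new tab/window, and finally iterate through the window handles and switchTo().window(newly_opened) as required. Here is a solution that opens http://www.google.co.in in the initial tab and https://www.yahoo.com in the adjacent tab:
Code block: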
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
options = webdriver.ChromeOptions()
options.add_argument("start-maximized")
options.add_argument('disable-infobars')
driver = webdriver.Chrome(options=options, executable_path=r'C:\Utility\BrowserDrivers\chromedriver.exe')
driver.get("http://www.google.co.in")
print("Initial Page Title is : %s" %driver.title)
windows_before = driver.current_window_handle
print("First Window Handle is : %s" %windows_before)
driver.execute_script("window.open('https://www.yahoo.com')")
WebDriverWait(driver, 10).until(EC.number_of_windows_to_be(2))
windows_after = driver.window_handles
new_window = [x for x in windows_after if x != windows_before][0]
driver.switch_to.window(new_window)
print("Page Title after Tab Switching is : %s" %driver.title)
print("Second Window Handle is : %s" %new_window)
Console output:
Initial Page Title is : Google
First Window Handle is : CDwindow-B2B3DE3A222B3DA5237840FA574AF780
Page Title after Tab Switching is : Yahoo
Second Window Handle is : CDwindow-D7DA7666A0008ED91991C623105A2EC4
You can find a Java-based discussion in "Best way to keep track and iterate through tabs and windows using WindowHandles using Selenium".
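WebDriverWait.until() accepts any callable that takes the driver and returns whatever truthy value that callable produces. That allows a condition that waits for a new tab and hands back its handle directly, instead of waiting for an exact window count. A sketch; new_window_opened is a hypothetical name, not one of Selenium's expected conditions:

```python
def new_window_opened(handles_before):
    """Build a condition for WebDriverWait.until().

    The returned callable gives the first new window handle once one exists,
    or False so that until() keeps polling.
    """
    before = set(handles_before)

    def condition(driver):
        new = [h for h in driver.window_handles if h not in before]
        return new[0] if new else False

    return condition


# With Selenium installed:
#   from selenium.webdriver.support.ui import WebDriverWait
#   before = driver.window_handles
#   driver.execute_script("window.open('https://www.yahoo.com')")
#   handle = WebDriverWait(driver, 10).until(new_window_opened(before))
#   driver.switch_to.window(handle)
```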
from selenium import webdriver
import time
driver = webdriver.Firefox()
driver.get('https://www.google.com')
driver.execute_script("window.open('');")
time.sleep(5)
driver.switch_to.window(driver.window_handles[1])
driver.get("https://facebook.com")
time.sleep(5)
driver.close()
time.sleep(5)
driver.switch_to.window(driver.window_handles[0])
driver.get("https://www.yahoo.com")
time.sleep(5)
#driver.close()
https://www.edureka.co/community/52772/close-active-current-without-looking-browser-selenium-python
Views: 86315