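My main problem is extracting data from a table, but any other general tips would also be welcome. The tables I am working with have about 25 columns and a varying number of rows (anywhere from 5 to 50).
Currently I grab the table and convert it to an array: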
require "watir-webdriver"
b = Watir::Browser.new :chrome
b.goto "http://someurl"
# The following operation takes way too long
table = b.table(:index, 1).to_a
# The rest is fast enough
table.each do |row|
  # Code for pulling data from about 15 of the columns goes here
  # ...
end
b.close
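When the table has 20 rows, the call `table = b.table(:index, 5).to_a` takes over a minute. Putting the cells of a 20 x 25 table into an array seems like it should be very fast. I need to do this for more than 80 tables, so the whole run ends up taking 1-2 hours. Why does it take so long, and how can I speed it up?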
I have tried iterating over the table rows without converting to an array first, but there was no improvement in performance:
b.table(:index, 1).rows.each do |row|
  # ...
end
The results are the same on Windows 7 and Ubuntu. I have also tried using …
This is what I have so far:
require 'watir-webdriver'
require 'date'
require 'nokogiri'
browser = Watir::Browser.start 'https://example/ViewReport.aspx'
browser.link(:text, 'Combined Employee Performance Report').click
today = Date.today
yesterday = today.prev_day.strftime('%m' '%d' '%Y')
t = browser.text_field :id => 'UC255_txtStart'
t.set yesterday
t = browser.text_field :id => 'UC255_txtEnd'
t.set yesterday
btn = browser.button :value, 'Run Report'
btn.exists?
btn.click
page = Nokogiri::HTML.parse('browser')
links = page.css("a")
puts links.length
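When I try to parse `browser`, the variable Watir uses for the site URI, it gives me a blank HTML page.
I am using the watir-webdriver gem with chromedriver. I know (https://code.google.com/p/chromedriver/issues/detail?id=9#c25) that the new chromedriver version 2.1 has a dedicated page-load timeout. How can I set it from Ruby code?
I have created automated tests with Watir for a website's "remember login credentials" feature.
In the test scenario I close the browser, reopen it, and check whether the home page opens. The test is redirected to the login page instead, so my question is:
Does Watir's close-browser method clear the cache or cookies added during the test?
I am using watir-webdriver to automate and test an application. As part of this I need to click a dropdown and select a value, but Watir does not seem able to select the item. Could someone help?
My code: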
browser.text_field(:id => "user_username").set "#{username}"
browser.select_list(:id => 'user_date_of_birth_month').clear
puts browser.select_list(:id => 'user_date_of_birth_month').options
browser.select_list(:id => 'user_date_of_birth_month').select "9"
HTML:
<label class="sc-font-light sc-text-light next-light-label" for="user_date_of_birth_month">
2. When were you born?<span class="inline-help"><span class="content hidden">For information on why we ask for your date of birth, see <a href="http://help.soundcloud.com/customer/portal/articles/1481474-why-do-you-need-my-date-of-birth-" target="_blank">this help center article</a>.</span></span>
<div class="width_1_2"><select id="user_date_of_birth_month" name="user[date_of_birth][month]">
<option value="">Month</option>
<option value="1">January</option>
<option value="2">February</option>
<option value="3">March</option>
<option value="4">April</option>
<option value="5">May</option>
<option value="6">June</option>
<option value="7">July</option>
<option value="8">August</option>
<option value="9">September</option>
<option value="10">October</option>
<option value="11">November</option>
<option …
The script runs locally but not on the server.
b = Watir::Browser.new :chrome, headless: true
Error:
response.rb:69:in `assert_ok': unknown error: Chrome failed to start: exited abnormally (Selenium::WebDriver::Error::UnknownError)
(unknown error: DevToolsActivePort file doesn't exist)
(The process started from chrome location /usr/bin/google-chrome is no longer running, so ChromeDriver is assuming that Chrome has crashed.)
(Driver info: chromedriver=2.45.615279 (12b89733300bd268cff3b78fc76cb8f3a7cc44e5),platform=Linux 4.9.75-29.el7.x86_64 x86_64)
from /root/.rbenv/versions/2.3.1/lib/ruby/gems/2.3.0/gems/selenium-webdriver-3.141.0/lib/selenium/webdriver/remote/response.rb:32:in `initialize'
from /root/.rbenv/versions/2.3.1/lib/ruby/gems/2.3.0/gems/selenium-webdriver-3.141.0/lib/selenium/webdriver/remote/http/common.rb:84:in `new'
from /root/.rbenv/versions/2.3.1/lib/ruby/gems/2.3.0/gems/selenium-webdriver-3.141.0/lib/selenium/webdriver/remote/http/common.rb:84:in `create_response'
from /root/.rbenv/versions/2.3.1/lib/ruby/gems/2.3.0/gems/selenium-webdriver-3.141.0/lib/selenium/webdriver/remote/http/default.rb:104:in `request'
from /root/.rbenv/versions/2.3.1/lib/ruby/gems/2.3.0/gems/selenium-webdriver-3.141.0/lib/selenium/webdriver/remote/http/common.rb:62:in `call'
from /root/.rbenv/versions/2.3.1/lib/ruby/gems/2.3.0/gems/selenium-webdriver-3.141.0/lib/selenium/webdriver/remote/bridge.rb:166:in `execute'
from /root/.rbenv/versions/2.3.1/lib/ruby/gems/2.3.0/gems/selenium-webdriver-3.141.0/lib/selenium/webdriver/remote/bridge.rb:99:in `create_session'
from /root/.rbenv/versions/2.3.1/lib/ruby/gems/2.3.0/gems/selenium-webdriver-3.141.0/lib/selenium/webdriver/remote/bridge.rb:53:in `handshake'
from /root/.rbenv/versions/2.3.1/lib/ruby/gems/2.3.0/gems/selenium-webdriver-3.141.0/lib/selenium/webdriver/chrome/driver.rb:49:in `initialize'
from /root/.rbenv/versions/2.3.1/lib/ruby/gems/2.3.0/gems/selenium-webdriver-3.141.0/lib/selenium/webdriver/common/driver.rb:44:in `new' …
I tried to install the Watir-WebDriver gem, but it is not working. It says:
C:\Ruby187>gem install watir web-driver
Successfully installed watir-2.0.4
ERROR: Could not find a valid gem 'web-driver' (>= 0) in any repository
ERROR: Possible alternatives: megadriver, view_driver, testdrive, app_driver, web-facter
1 gem installed
Installing ri documentation for watir-2.0.4...
Installing RDoc documentation for watir-2.0.4...
I can't figure out how to capture a screenshot when a test fails in Watir. Could someone help or give an example?
Here is an example of my code:
testName = "Entered 000000 - Invalid Unit Number"
browser.text_field(:name => 'unitNumber').set '000000'
browser.button(:name => "OpRetrieve").click
message=browser.text_field(:id => 'messages').text
if message == "Invalid Unit Number"
  f1.puts "PASSED #" + testId.to_s + ": " + testName
else
  f1.puts "FAILED #" + testId.to_s + ": " + testName + ". Message: " + message
  "Captured screenshot"
end
testId=testId+1
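Watir exposes `browser.screenshot.save(path)`, which could replace the placeholder string in the failure branch. The following is only a minimal sketch under stated assumptions: `record_result`, its parameters, the `screenshots` directory, and the file-naming scheme are all hypothetical and not part of the original script. It accepts any object that responds to `screenshot.save`, so it can be tried without launching a browser.

```ruby
require 'fileutils'

# Hypothetical helper: logs PASSED/FAILED and, on failure, saves a screenshot.
# `browser` is assumed to be a Watir::Browser (anything responding to
# `screenshot.save(path)` works); `log` is any IO-like object such as a File.
def record_result(browser, log, test_id, test_name, expected, actual, dir: 'screenshots')
  if actual == expected
    log.puts "PASSED ##{test_id}: #{test_name}"
    return nil
  end

  FileUtils.mkdir_p(dir)
  # Build a filesystem-safe file name from the test id and name.
  safe_name = test_name.gsub(/[^A-Za-z0-9]+/, '_')
  path = File.join(dir, "#{test_id}_#{safe_name}.png")
  browser.screenshot.save(path)
  log.puts "FAILED ##{test_id}: #{test_name}. Message: #{actual}. Screenshot: #{path}"
  path
end
```

In the script above, the whole if/else could then become `record_result(browser, f1, testId, testName, "Invalid Unit Number", message)`.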
So, I have a table with multiple rows and columns.
<table>
<tr>
<th>Employee Name</th>
<th>Reg Hours</th>
<th>OT Hours</th>
</tr>
<tr>
<td>Employee 1</td>
<td>10</td>
<td>20</td>
</tr>
<tr>
<td>Employee 2</td>
<td>5</td>
<td>10</td>
</tr>
</table>
And there is another table:
<table>
<tr>
<th>Employee Name</th>
<th>Revenue</th>
</tr>
<tr>
<td>Employee 2</td>
<td>$10</td>
</tr>
<tr>
<td>Employee 1</td>
<td>$50</td>
</tr>
</table>
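Note that the order of employees may differ between the tables.
How can I use Nokogiri to create a JSON file with each employee as an object, along with their total hours and revenue?
Currently, I can only get individual table cells using some XPath. For example: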
puts page.xpath(".//*[@id='UC255_tblSummary']/tbody/tr[2]/td[1]/text()").inner_text
Edit:
Using the page-object gem and the link from @Dave_McNulla, I tried this code just to see what I would get:
class MyPage
  include PageObject
  table(:report, :id => 'UC255_tblSummary')
  def get_some_information
    report_element[1][2].text
  end
end
puts get_some_information
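However, nothing is returned.
Data: https://gist.github.com/anonymous/d8cc0524160d7d03d37b
The hours table has a duplicate. The first one is fine. The other table needed is the accessory revenue table. (I will also need the activations table, but I will try to adapt the code that merges the hours and accessory revenue tables.)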
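Once each table has been reduced to rows of cell text, the merge itself does not depend on row order if it is keyed by employee name. The sketch below covers only that merge step and makes several assumptions: the rows are plain arrays of strings (with Nokogiri they might come from something like `table.css('tr').map { |tr| tr.css('th, td').map { |c| c.text.strip } }`), the column names match the sample tables above, and `rows_to_records` and `merge_employee_tables` are hypothetical helper names.

```ruby
require 'json'

# Hypothetical helper: turns table rows (header row first) into an array of
# hashes keyed by header text. Rows are plain arrays of cell strings.
def rows_to_records(rows)
  header, *body = rows
  body.map { |cells| header.zip(cells).to_h }
end

# Merges the hours and revenue tables by employee name, so row order
# does not matter. Column names are assumed to match the sample tables.
def merge_employee_tables(hours_rows, revenue_rows)
  employees = Hash.new { |h, name| h[name] = { 'total_hours' => 0.0, 'revenue' => 0.0 } }

  rows_to_records(hours_rows).each do |rec|
    employees[rec['Employee Name']]['total_hours'] +=
      rec['Reg Hours'].to_f + rec['OT Hours'].to_f
  end

  rows_to_records(revenue_rows).each do |rec|
    # Strip the currency symbol and thousands separators before converting.
    employees[rec['Employee Name']]['revenue'] += rec['Revenue'].delete('$,').to_f
  end

  employees
end

hours = [
  ['Employee Name', 'Reg Hours', 'OT Hours'],
  ['Employee 1', '10', '20'],
  ['Employee 2', '5', '10']
]
revenue = [
  ['Employee Name', 'Revenue'],
  ['Employee 2', '$10'],
  ['Employee 1', '$50']
]

puts JSON.pretty_generate(merge_employee_tables(hours, revenue))
```

Because amounts are accumulated with `+=`, a duplicated table would be counted twice, so only the first copy of the hours table should be passed in. Writing the result to disk could be done with `File.write('employees.json', ...)`.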