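I am trying to scrape a website using RSelenium. However, I run into a problem when I try to connect to the Selenium server.
Suppose I start the Selenium server and browser with the rsDriver() command: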
rsDriver(browser = c('firefox'))
This is the resulting output:
[1] "Connecting to remote server"
Fehler in checkError(res) :
Couldnt connect to host on http://localhost:4567/wd/hub.
Please ensure a Selenium server is running.
Zusätzlich: Warnmeldung:
In rsDriver(browser = c("firefox")) : Could not determine server status.
Alternatively, I tried this command (found in another Stack Overflow thread):
remDr <- remoteDriver(
  remoteServerAddr = "localhost",
  port = 4444L,
  browserName = "htmlunit"
)
remDr$open()
But it fails as well:
[1] "Connecting to remote server"
Fehler in checkError(res) :
Couldnt connect to host on http://localhost:4444/wd/hub.
Please ensure a Selenium server is running.
Here is my session info:
R version 3.3.2 (2016-10-31)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Sierra 10.12.2
locale:
[1] de_CH.UTF-8/de_CH.UTF-8/de_CH.UTF-8/C/de_CH.UTF-8/de_CH.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] seleniumPipes_0.3.7 whisker_0.3-2 magrittr_1.5 xml2_1.1.1 jsonlite_1.2 httr_1.2.1
[7] RSelenium_1.7.1 wdman_0.2.2
loaded via a namespace (and not attached):
[1] Rcpp_0.12.9 XML_3.98-1.5 binman_0.1.0 assertthat_0.1 bitops_1.0-6 rappdirs_0.3.1 R6_2.2.0
[8] semver_0.2.0 curl_2.3 subprocess_0.8.0 tools_3.3.2 yaml_2.1.14 caTools_1.17.1 openssl_0.9.6
I am using Firefox version 51.0.1 (64-bit) on macOS Sierra version 10.12.2.
Any help is greatly appreciated!
Check whether a Selenium server is actually running. You can try to start one automatically:
library(RSelenium)
library(wdman)
selServ <- wdman::selenium(verbose = FALSE)
Then you can check the log to see whether there are any problems:
selServ$log()
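As a rough sketch of scanning that log, assuming `selServ$log()` returns a list of character vectors, a small helper can pick out the lines that look like problems. The helper name `find_log_problems` and the search pattern are illustrative assumptions, not part of wdman, and the first example line is a placeholder rather than real server output:

```r
# Hypothetical helper (not part of wdman): return the log lines that
# match a simple case-insensitive "problem" pattern.
find_log_problems <- function(log_lines,
                              pattern = "error|exception|corrupt|refused") {
  # Flatten a list of character vectors into one plain character vector
  log_lines <- as.character(unlist(log_lines))
  log_lines[grepl(pattern, log_lines, ignore.case = TRUE)]
}

# Illustrative input only: one placeholder line and one error line
example_log <- list(
  stderr = c("placeholder: server started",
             "Error: Invalid or corrupt jarfile"),
  stdout = character(0)
)
problems <- find_log_problems(example_log)
print(problems)

# With a wdman server object this might be used as:
# find_log_problems(selServ$log())
```

An empty result does not prove the server is healthy; it only means none of the assumed keywords appeared in the log.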
Alternatively, you can try running the Selenium server manually:
library(RSelenium)
library(wdman)
selServ <- wdman::selenium(retcommand = TRUE, verbose = FALSE)
Then print the command with cat(selServ) and run the output manually in a terminal:
> cat(selServ)
/usr/bin/java -Dwebdriver.chrome.driver='/Users/admin/Library/Application Support/binman_chromedriver/mac64/2.27/chromedriver' -Dwebdriver.gecko.driver='/Users/admin/Library/Application Support/binman_geckodriver/macos/0.14.0/geckodriver' -Dphantomjs.binary.path='/Users/admin/Library/Application Support/binman_phantomjs/macosx/2.1.1/phantomjs-2.1.1-macosx/bin/phantomjs' -jar '/Users/admin/Library/Application Support/binman_seleniumserver/generic/3.0.1/selenium-server-standalone-3.0.1.jar' -port 4567
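Once the server has been started in the terminal, it can help to confirm that something is listening on the port before calling `remDr$open()`. The sketch below uses only base R to test the port; `port_is_open` is a hypothetical helper name, and port 4567 is taken from the command above (the second attempt in the question targeted 4444, which can only work if a server is listening there).

```r
# Hypothetical helper: TRUE if a TCP connection to host:port succeeds.
port_is_open <- function(host = "localhost", port = 4567L, timeout = 2) {
  con <- tryCatch(
    suppressWarnings(
      socketConnection(host = host, port = port, open = "r+",
                       blocking = TRUE, timeout = timeout)
    ),
    error = function(e) NULL  # connection refused or timed out
  )
  if (is.null(con)) {
    return(FALSE)
  }
  close(con)
  TRUE
}

server_up <- port_is_open("localhost", 4567L)

if (server_up) {
  # Connect to the manually started server on the same port
  remDr <- RSelenium::remoteDriver(
    remoteServerAddr = "localhost",
    port = 4567L,
    browserName = "firefox"
  )
  remDr$open()
} else {
  message("Nothing is listening on port 4567; start the server first.")
}
```

An open port only shows that some process accepted the connection, so `remDr$open()` can still fail if that process is not a Selenium server.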
Thanks @jdharrison! I had a similar problem and was confused, because RSelenium was still working fine yesterday but today it no longer launched the browser. Running:
library(wdman)
selServ <- wdman::selenium(verbose = FALSE)
selServ$log()
showed me that the problem was caused by a corrupt jarfile that had been downloaded overnight:
"Error: Invalid or corrupt jarfile C:\\Users\\user.name\\AppData\\Local\\binman\\binman_seleniumserver\\generic\\3.8.0/selenium-server-standalone-3.8.0.jar"
By default, the rsDriver() function in RSelenium uses the latest selenium-server-standalone jarfile. When I ran rsDriver with the previous jarfile version, everything worked again:
rD <- rsDriver(verbose = FALSE, version = "3.7.1")
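A jar file is a zip archive, so one hedged way to spot a corrupt download before starting the server is to try listing its contents with base R. `jar_is_readable` is a hypothetical helper, and the path you pass in is whatever the log reports on your machine; the demonstration below uses a deliberately broken temporary file rather than a real jar.

```r
# Hypothetical helper: TRUE if the file exists and its zip directory
# can be listed, FALSE if it is missing or cannot be opened as a zip.
jar_is_readable <- function(jar_path) {
  if (!file.exists(jar_path)) {
    return(FALSE)
  }
  listing <- tryCatch(
    suppressWarnings(utils::unzip(jar_path, list = TRUE)),
    error = function(e) NULL  # raised when the archive cannot be opened
  )
  !is.null(listing) && nrow(listing) > 0
}

# Demonstration with a file that is clearly not a zip archive
bad_jar <- tempfile(fileext = ".jar")
writeLines("this is not a zip archive", bad_jar)
print(jar_is_readable(bad_jar))

# Assumption: with binman installed, the locally downloaded versions
# could be inspected with binman::list_versions("seleniumserver")
```

If the check fails for the newest jar, pinning an older `version` in `rsDriver()` as shown above is one workaround.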