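On Linux, RSelenium/Selenium seems to behave erratically. I start the server manually, and it appears to start fine. Sometimes I can connect to it from an R session, and sometimes I get an error. I have not been able to pin down the cause: the same script works at some times and not at others. Any ideas?
Here is the output from starting the server: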
12:41:25.811 INFO - Launching a standalone server
12:41:26.102 INFO - Java: Sun Microsystems Inc. 11.0-b16
12:41:26.102 INFO - OS: Linux 2.6.32-431.17.1.el6.x86_64 amd64
12:41:26.157 INFO - v2.44.0, with Core v2.44.0. Built from revision 76d78cf
12:41:26.492 INFO - Default driver org.openqa.selenium.ie.InternetExplorerDriver registration is skipped: registration capabilities Capabilities [{platform=WINDOWS, ensureCleanSession=true, browserName=internet explorer, version=}] does not match with current platform: LINUX
12:41:26.589 INFO - RemoteWebDriver instances should connect to: http://127.0.0.1:4444/wd/hub
12:41:26.589 INFO - Version Jetty/5.1.x
12:41:26.590 INFO - …

I want to use RSelenium and XPath to find all the links to PDF files on a page.
Consider
require(RSelenium)
RSelenium::checkForServer()
RSelenium::startServer()
remDr <- remoteDriver()
remDr$open()
remDr$navigate("https://cran.r-project.org/manuals.html")
The page contains several links to PDF files, for example
<a href="doc/manuals/r-release/R-intro.pdf">PDF</a>
But my first attempt
remDr$findElement(using = "xpath", "//a[contains(@href,'.pdf')/@href")
produces the following error
Error: Summary: InvalidSelector
Detail: Argument was an invalid selector (e.g. XPath/CSS).
class: org.openqa.selenium.InvalidSelectorException
Is my syntax wrong?
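One likely reading of the error: the predicate in the selector is never closed (the `]` is missing), and Selenium selectors have to resolve to elements rather than attribute nodes, so the trailing `/@href` would be rejected as well. A minimal sketch of an alternative follows. The helper name `pdf_links` is hypothetical, and `remDr` is assumed to be a remoteDriver that is already open on the page.

```r
# Hypothetical helper: collect the href of every link whose href contains ".pdf".
# `remDr` is assumed to be an already opened RSelenium remoteDriver.
pdf_links <- function(remDr) {
  # Note the closing "]" and the absence of a trailing "/@href":
  # the selector must resolve to elements, not attribute nodes.
  elems <- remDr$findElements(using = "xpath",
                              value = "//a[contains(@href, '.pdf')]")
  # getElementAttribute() returns a list, so take its first entry.
  vapply(elems, function(e) e$getElementAttribute("href")[[1]], character(1))
}
```

Calling `pdf_links(remDr)` after `remDr$navigate(...)` should then give a character vector of URLs, or an empty vector if nothing matches.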
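
I am using the RSelenium package in R for web scraping. Sometimes, after a page has loaded, I need to check whether an object is visible on the page. For example: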
library(RSelenium)
#open a browser
RSelenium::startServer()
remDr <- remoteDriver$new()
remDr <- remoteDriver(remoteServerAddr = "localhost"
, port = 4444
, browserName = "firefox")
remDr$open()
remDr$navigate("https://www.google.com")
#xpath for Google logo
x_path="/html/body/div/div[5]/span/center/div[1]/img"
I need to do something like this:
if (exist(remDr$findElement(using='xpath',x_path))){
print("Logo Exists")
}
My question is: which function should I use for "exist"? The code above does not work; it is only pseudocode. I also found code that checks an object by its "id":
remDr$executeScript("return document.getElementById('hplogo').hidden;", args = list())
The code above only works with an "id". How can I do the same thing with an "xpath"? Thanks.
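One possible approach, sketched under assumptions: `findElements()` (plural) returns an empty list when nothing matches, whereas `findElement()` raises an error, so the length of the result can stand in for "exist". The helper names `element_exists` and `element_visible` are hypothetical, and `remDr` is assumed to be an open remoteDriver.

```r
# Hypothetical helper: TRUE if at least one element matches the XPath.
# `remDr` is assumed to be an already opened RSelenium remoteDriver.
element_exists <- function(remDr, x_path) {
  # findElements() returns an empty list when nothing matches,
  # whereas findElement() raises an error.
  length(remDr$findElements(using = "xpath", value = x_path)) > 0
}

# Hypothetical helper: TRUE if the first match exists and is displayed.
element_visible <- function(remDr, x_path) {
  elems <- remDr$findElements(using = "xpath", value = x_path)
  length(elems) > 0 && isTRUE(elems[[1]]$isElementDisplayed()[[1]])
}
```

With these in place, the pseudocode could read `if (element_exists(remDr, x_path)) print("Logo Exists")`.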

In RSelenium, a fixed sleep can be set like this:
Sys.sleep(15)
How can I set a random sleep time? In Python it looks like this:
time.sleep(random.uniform(3.5,6.9))
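A close R counterpart is `runif()`, which draws from a uniform distribution in the same way as Python's `random.uniform`. The wrapper name `random_sleep` below is hypothetical; it only bundles the two base R calls and returns the chosen pause.

```r
# Hypothetical wrapper: sleep for a uniformly random number of seconds
# between min_s and max_s, and return the chosen duration invisibly.
random_sleep <- function(min_s, max_s) {
  pause <- runif(1, min = min_s, max = max_s)
  Sys.sleep(pause)
  invisible(pause)
}
```

The Python line above would then correspond to `random_sleep(3.5, 6.9)`, or directly to `Sys.sleep(runif(1, 3.5, 6.9))`.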

I put together a crude scraper for Expedia prices/airlines:
# Start the Server
rD <- rsDriver(browser = "phantomjs", verbose = FALSE)
# Assign the client
remDr <- rD$client
# Establish a wait for an element
remDr$setImplicitWaitTimeout(1000)
# Navigate to Expedia.com
appURL <- "https://www.expedia.com/Flights-Search?flight-type=on&starDate=04/30/2017&mode=search&trip=oneway&leg1=from:Denver,+Colorado,to:Oslo,+Norway,departure:04/30/2017TANYT&passengers=children:0,adults:1"
remDr$navigate(appURL)
# Give a crawl delay to see if it gives time to load web page
Sys.sleep(10) # Been testing with 10
###ADD JAVASCRIPT INJECTION HERE###
# remDr$executeScript(...) # placeholder: the script to inject is the open question
# Extract Prices
webElem <- remDr$findElements(using = "css", "[class='dollars price-emphasis']")
prices <- unlist(lapply(webElem, function(x){x$getElementText()}))
print(prices)
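# Extract Airlines …

I have a Selenium server configured in Docker. It works fine and I can connect to it, but when I try to interact with a Shiny app running locally, RSelenium does not see it. Details below:
Here is what I did, step by step:
I ran the Selenium server:
docker run -d -p 4445:4444 selenium/standalone-chrome
Connected to the Selenium server successfully: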
remDr <- remoteDriver(remoteServerAddr = "localhost"
, port = 4445L
, browserName = "chrome"
, platform = "MAC")
> remDr$open()
[1] "Connecting to remote server"
> shiny::runApp(file.path(find.package("RSelenium"), "apps", "shinytestapp"), port = 6012)
Listening on http://127.0.0.1:6012
remDr$navigate("localhost:6012")
appTitle <- remDr$getTitle()[[1]]
expect_equal(appTitle, "Shiny Test App")
and got the error:
Error: 'appTitle' not equal to "Shiny Test App".
1/1 mismatches
x[1]: "localhost"
y[1]: "Shiny Test App"
remDr$screenshot(display = TRUE)
It looks like this:
Do you know why RSelenium cannot see the locally running Shiny app?
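A plausible cause, offered as an assumption rather than a confirmed diagnosis: the browser runs inside the container, so `localhost` there refers to the container itself and not to the machine running the Shiny app. The sketch below assumes Docker Desktop, where the name `host.docker.internal` resolves to the host; on other setups the host's IP address would be needed instead. The helper name `app_url_from_container` and the variable `app_dir` are hypothetical.

```r
# Hypothetical helper: build the URL the containerised browser should open.
# "host.docker.internal" is an assumption that holds on Docker Desktop;
# on other setups substitute the host machine's IP address.
app_url_from_container <- function(port, host = "host.docker.internal") {
  sprintf("http://%s:%d", host, as.integer(port))
}

# Intended use (needs a running app and Selenium container, so left commented):
# shiny::runApp(app_dir, port = 6012, host = "0.0.0.0")  # listen on all interfaces
# remDr$navigate(app_url_from_container(6012))
```

Starting the app with `host = "0.0.0.0"` matters because the default of 127.0.0.1 only accepts connections that originate on the host itself.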
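
I am trying to convert a Python project (which uses Selenium to scrape Twitter tweets without the restricted Twitter API) to R. It works fine in Python, but I want to recreate it in R. I am new to R, but I have some MATLAB experience, if that helps.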
install.packages("RSelenium") # install RSelenium 1.7.1
As far as I can tell, the package has been updated, so I need to use a different function in place of startServer(). After all my research I found some conflicting answers, and none of them worked:
require(RSelenium) #used require() and library()
remDr <- remoteDriver(browserName = "chrome")
remDr$open()
I get the error:
[1] "Connecting to remote server"
Error in checkError(res) :
Undefined error in httr call. httr output: Failed to connect to localhost port 4444: Connection refused
I also tried:
require(RSelenium)
remDr <- rsDriver(browser = c("chrome"))
I get:
checking Selenium Server versions:
BEGIN: PREDOWNLOAD
BEGIN: DOWNLOAD
BEGIN: POSTDOWNLOAD
checking chromedriver versions:
BEGIN: PREDOWNLOAD
BEGIN: DOWNLOAD
BEGIN: POSTDOWNLOAD
checking geckodriver versions:
BEGIN: PREDOWNLOAD
BEGIN: …

I have a simple YAML file:
seleniumhub:
  image: selenium/hub
  ports:
    - 4444:4444
firefoxnode:
  image: selenium/node-firefox-debug
  ports:
    - 4577
  links:
    - seleniumhub:hub
chromenode:
  image: selenium/node-chrome-debug
  ports:
    - 4578
  links:
    - seleniumhub:hub
I ran this in Docker:
docker-compose up -d
I now have one hub and two nodes running.
Now I want to run two very simple Selenium commands in parallel (written with RSelenium):
remDr$open()
remDr$navigate("http://www.r-project.org")
remDr$screenshot(display = TRUE)
I would like to know how to run Selenium commands in parallel in Python or R. I tried several approaches without success. For example, in R:
library(RSelenium)
remDr <- remoteDriver(remoteServerAddr = "192.168.99.100", port = 4444L)
remDr$open()
remDr$navigate("http://www.r-project.org")
remDr$screenshot(display = TRUE)
does nothing. I also tried running two remoteDrivers, but that did not help either:
remDr <- remoteDriver(remoteServerAddr = "192.168.99.100", port = 4577L)
remDr$open()
remDr$navigate("http://www.r-project.org")
remDr$screenshot(display = TRUE)

I am trying to scrape data from the Flipkart website. The link to the page is: https://www.flipkart.com/mi-a1-black-64-gb/product-reviews/itmexnsrtzhbbneg?aid=overall&pid=MOBEX9WXUSZVYHET
I need to navigate to the next page automatically by clicking the page's NEXT button. Below is the code I am using:
nextButton <-remDr$findElement(value ='//div[@class="_2kUstJ"]')$clickElement()
Error:
Selenium message:Element is not clickable at point
I even tried scrolling the page with the following code, as suggested in many Stack Overflow questions:
remDr$executeScript("arguments[0].scrollIntoView(true);", nextButton)
But this code also gives an error:
Error in checkError(res) : Undefined error in httr call. httr output: No method for S4 class:webElement
Please suggest a solution. I am using the Firefox browser and automating Selenium with R.
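Two details in the code above may explain the second error. `nextButton` is assigned the result of the chained `$clickElement()` call rather than the element itself, and `executeScript()` expects its script arguments wrapped in a list. A minimal sketch that keeps the element, scrolls to it, and then clicks follows. The helper name `scroll_and_click` is hypothetical and `remDr` is assumed to be an open remoteDriver. If an overlay still covers the button, the "not clickable" error could persist.

```r
# Hypothetical helper: scroll an element into view, then click it.
# `remDr` is assumed to be an already opened RSelenium remoteDriver.
scroll_and_click <- function(remDr, x_path) {
  # Keep the element itself; do not chain $clickElement() onto the lookup,
  # because then the variable holds the click's return value, not the element.
  elem <- remDr$findElement(using = "xpath", value = x_path)
  # Script arguments must be wrapped in a list.
  remDr$executeScript("arguments[0].scrollIntoView(true);", args = list(elem))
  elem$clickElement()
  invisible(elem)
}
```

For the page in question the call would be `scroll_and_click(remDr, '//div[@class="_2kUstJ"]')`.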
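
I ran into the following problem: I have an R script that uses RSelenium to navigate pages, click buttons, and collect data. I included this script as a function in my Shiny app. When I run it from my own machine, everything works as expected: after I click the button, Firefox starts and runs correctly.
The trouble started when I wanted to publish the script on my company's RStudio Server (running on Linux) so that the tool (written on Windows) could be accessed through a link. After running this command: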
rD<-rsDriver(port=4441L, browser="firefox", chromever=NULL, iedrver = NULL, phantomver = NULL)
I received the following output:
$client
[1] "No sessionInfo. Client browser is mostly likely not opened."
$server
Process Handle
command : /tmp/RtmpElIBko/file3a0241d505d8.sh
system id : 15293
state : exited
So the server is running, but the browser cannot be opened.
The server log is:
$stderr
[1] "14:22:06.908 INFO [GridLauncherV3.launch] - Selenium build info: version: '3.12.0', revision: '7c6e0b3'"
[2] "14:22:06.910 INFO [GridLauncherV3$1.launch] - Launching a standalone Selenium Server on port 4441"
[3] "2018-05-15 14:22:07.026:INFO::main: Logging initialized @452ms to org.seleniumhq.jetty9.util.log.StdErrLog"
[4] "14:22:07.227 INFO [SeleniumServer.boot] - Selenium Server is …