RSelenium and findElements using the browser's Inspect Element

h.l*_*l.m 6 r web-scraping

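I was hoping for some help getting each verse of a Bible chapter from the following website as a row of character strings in a data frame.

I am struggling to find the correct element and do not know how to use findElements() in conjunction with Inspect Element in the browser. Any pointers on how to do this in general for other parts of the page, such as cross-references and footnotes, would be great. (Note that cross-references can be shown by adjusting "Page Options", reached by clicking the gear near the top of the page.)

Below is the code I have tried.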

chapter.url <- "https://www.biblegateway.com/passage/?search=Genesis+50&version=ESV"
library(RSelenium)
RSelenium:::startServer()
remDr <- remoteDriver()
remDr$open()
remDr$navigate(chapter.url)
webElem <- remDr$findElements('id','passage-text')

jdh*_*son 8

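Normally I would target the relevant HTML. Inspecting the page with Firefox's Firebug or something similar, we see:

[screenshot of the page inspector]

The relevant HTML snippet is <div class="version-ESV result-text-style-normal text-html ">. So we can find the element with the class version-ESV: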

chapter.url <- "https://www.biblegateway.com/passage/?search=Genesis+50&version=ESV"
library(RSelenium)
RSelenium:::startServer() # start a local Selenium server (defunct in later RSelenium releases, which provide rsDriver() instead)
remDr <- remoteDriver()
remDr$open()
remDr$navigate(chapter.url)
webElem <- remDr$findElement('class name', 'version-ESV') # locate the passage container by its CSS class
webElem$highlightElement() # check visually we have the right element

The highlightElement method gives us visual confirmation that we have the HTML block we need. Finally, we can get the HTML snippet using the getElementAttribute method:

appData <- webElem$getElementAttribute("outerHTML")[[1]]

This HTML can then be parsed for the verses using the XML package.
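As a rough sketch of that parsing step, the example below runs htmlParse on a small hand-written fragment instead of the live page. The markup in sampleHTML is an assumption made up for illustration (the real page's tags and class names may differ), and it stands in for the appData string obtained above. It shows that taking xmlValue of a whole span returns everything inside it, including verse numbers and footnote markers.

```r
library(XML)

# Hypothetical fragment standing in for appData; the real markup may differ.
sampleHTML <- paste0(
  '<div class="version-ESV">',
  '<span id="en-ESV-1" class="text">',
  '<sup class="versenum">2</sup>',
  'And Joseph commanded his servants',
  '<sup class="footnote">[a]</sup>',
  'to embalm his father.',
  '</span></div>'
)

# Parse the string as HTML (asText = TRUE: the input is content, not a file name).
sampleDoc <- htmlParse(sampleHTML, asText = TRUE)

# xmlValue concatenates the text of every descendant node of each span,
# so the verse number and the footnote marker are part of the result.
rawText <- xpathSApply(sampleDoc, "//span", xmlValue)
print(rawText)
```

Because the markers end up mixed into the verse text, some filtering of the child nodes is usually needed before the strings are usable.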

UPDATE:

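The verses are contained in span elements whose id starts with "en-ESV-". We can use this to target them with '//span[contains(@id,"en-ESV-")]' as the XPath. Within these spans, however, we only want the child nodes that are text nodes. Once we have found those text nodes, we paste them together, separated by a space: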

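library(XML)
appXPATH <- '//span[contains(@id,"en-ESV-")]'
# keep only the direct text-node children of a span and join them with spaces
appFunc <- function(x){
  appChildren <- xmlChildren(x)
  out <- appChildren[names(appChildren) == "text"] # drops element children such as verse numbers and footnote markers
  paste(sapply(out, xmlValue), collapse = ' ')
}
doc <- htmlParse(appData, encoding = 'UTF8') # specify encoding
results <- xpathSApply(doc, appXPATH, appFunc) # one string per matched span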

The results are as follows:

> head(results)
[1] "Then Joseph  fell on his father's face and wept over him and kissed him."                                                                                                                                                   
[2] "And Joseph commanded his servants the physicians to  embalm his father. So the physicians embalmed Israel."                                                                                                                 
[3] "Forty days were required for it, for that is how many are required for embalming. And the Egyptians  wept for him seventy days."                                                                                            
[4] "And when the days of weeping for him were past, Joseph spoke to the household of Pharaoh, saying,  “If now I have found favor in your eyes, please speak in the ears of Pharaoh, saying,"                                   
[5] "‘My father made me swear, saying, “I am about to die: in my tomb  that I hewed out for myself in the land of Canaan, there shall you bury me.” Now therefore, let me please go up and bury my father. Then I will return.’”"
[6] "And Pharaoh answered, “Go up, and bury your father, as he made you swear.”"                                                                                    
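The question asked for one row per verse in a data frame, so here is a minimal sketch of that last step. It assumes each matched span corresponds to one verse and that the spans come back in page order. The column names verse and text are made up for illustration, and the sample vector holds just the first two strings from the output above.

```r
# Sample input: the first two strings from the output above.
results <- c(
  "Then Joseph  fell on his father's face and wept over him and kissed him.",
  "And Joseph commanded his servants the physicians to  embalm his father. So the physicians embalmed Israel."
)

# Collapse the double spaces left behind where non-text children were dropped.
clean <- trimws(gsub("\\s+", " ", results))

# One row per string; the numbering assumes page order starting at verse 1.
verses <- data.frame(
  verse = seq_along(clean),
  text = clean,
  stringsAsFactors = FALSE
)
print(verses)
```

With the full results vector in place of the sample, the same call should give one row for each matched span.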