How to retrieve data from the HTML between <span> and </span>

cli*_*kpn 1 html xpath r

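I want to get the 1-to-5 star rating from Amazon customer reviews. I looked at the page source and found that the relevant part looks like this: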

<div style="margin-bottom:0.5em;">
    <span style="margin-right:5px;"><span class="swSprite s_star_5_0 " title="5.0 out of 5 stars" ><span>5.0 out of 5 stars</span></span> </span>
    <span style="vertical-align:middle;"><b>Works great right out of the box with Surface Pro</b>, <nobr>October 5, 2013</nobr></span>
  </div>

I want to extract the 5.0 from "5.0 out of 5 stars" in:

<span>5.0 out of 5 stars</span></span> </span>

How can I use xpathSApply to get it?

Thanks!

Ram*_*ath 7

I would suggest using the selectr package, which lets you use CSS selectors instead of XPath.

library(XML)
doc <- htmlParse('
  <div style="margin-bottom:0.5em;">
    <span style="margin-right:5px;">
     <span class="swSprite s_star_5_0 " title="5.0 out of 5 stars" >
      <span>5.0 out of 5 stars</span></span> </span>
     <span style="vertical-align:middle;">
     <b>Works great right out of the box with Surface Pro</b>, 
     <nobr>October 5, 2013</nobr></span>
  </div>', asText = TRUE
)

library(selectr)
xmlValue(querySelector(doc, 'div > span > span > span'))

Update: If you want to use XPath, you can use the css_to_xpath function from selectr to work out the appropriate XPath expression, which in this case turns out to be

"descendant-or-self::div/span/span/span"
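Since the question asked about xpathSApply specifically, here is a minimal sketch of how that expression might be used with it. It assumes the markup matches the snippet in the question; the variable names (html, rating_text, rating) are only illustrative, and the shorter `//div/span/span/span` form is used in place of the `descendant-or-self::` spelling. The regular expression for pulling out the number is one possible approach, not the only one.

```r
library(XML)

# Sample markup copied from the question (assumed to be representative)
html <- '<div style="margin-bottom:0.5em;"><span style="margin-right:5px;"><span class="swSprite s_star_5_0 " title="5.0 out of 5 stars" ><span>5.0 out of 5 stars</span></span> </span><span style="vertical-align:middle;"><b>Works great right out of the box with Surface Pro</b>, <nobr>October 5, 2013</nobr></span></div>'

doc <- htmlParse(html, asText = TRUE)

# Apply xmlValue to every node matched by the XPath expression
rating_text <- xpathSApply(doc, "//div/span/span/span", xmlValue)

# Keep only the leading number and convert it to numeric
rating <- as.numeric(sub("^([0-9.]+) out of.*$", "\\1", rating_text))
print(rating)
```

On a real review page there would be one match per review, so rating would be a numeric vector rather than a single value.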