XPath to extract text after br tags in R

Question

How can I extract the text that follows the br tags in the snippet below?

<div id='population'>
    The Snow Leopard Survival Strategy (McCarthy <em>et al.</em> 2003, Table
    II) compiled national snow leopard population estimates, updating the work
    of Fox (1994). Many of the estimates are acknowledged to be rough and out
    of date, but the total estimated population is 4,080-6,590, as follows:<br>
    <br>
    Afghanistan: 100-200?<br>
    Bhutan: 100-200?<br>
    China: 2,000-2,500<br>
    India: 200-600<br>
    Kazakhstan: 180-200<br>
    Kyrgyzstan: 150-500<br>
    Mongolia: 500-1,000<br>
    Nepal: 300-500<br>
    Pakistan: 200-420<br>
    Russia: 150-200<br>
    Tajikistan: 180-220<br>
    Uzbekistan: 20-50
</div>

This is what I have so far:

xpathSApply(h, '//div[@id="population"]', xmlValue)

But now I'm stuck...

Answer 1

Wri*_*ken 26

It helps to realize that text is a node too. All of the text in the div that comes after a <br/> can be retrieved with:

//div[@id="population"]/text()[preceding-sibling::br]

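As a rough sketch of how that expression could be used from R, the snippet below runs it with `xpathSApply` from the XML package (the same function the question uses). The `html` string is a shortened, hypothetical stand-in for the page in the question, and the trimming step is an assumption about what the cleaned-up output should look like; whether the whitespace-only node between the two consecutive `<br>` tags is returned can depend on the parser, so empty strings are filtered out either way.

```r
library(XML)  # provides htmlParse() and xpathSApply()

# Shortened, hypothetical stand-in for the div in the question
html <- '<div id="population">
    The total estimated population is 4,080-6,590, as follows:<br>
    <br>
    Afghanistan: 100-200?<br>
    China: 2,000-2,500<br>
    Uzbekistan: 20-50
</div>'

h <- htmlParse(html, asText = TRUE)

# Text nodes that are direct children of the div and have a <br> before them
raw <- xpathSApply(h,
                   '//div[@id="population"]/text()[preceding-sibling::br]',
                   xmlValue)

# Strip indentation and newlines, then drop whitespace-only nodes
estimates <- trimws(unlist(raw))
estimates <- estimates[nzchar(estimates)]

print(estimates)
```

Each remaining element is one "Country: range" string, which could then be split on ": " if separate columns are needed.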

Strictly speaking, text *between* <br/> tags would be:

//div[@id="population"]/text()[preceding-sibling::br and following-sibling::br]

...but I suspect that's not what you want.
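The reason the stricter expression is probably not wanted is that the last entry in the div has no `<br>` after it, so it fails the `following-sibling::br` test. The sketch below compares the two expressions on a shortened, hypothetical stand-in for the question's HTML; the helper name `text_after` is made up for illustration.

```r
library(XML)  # provides htmlParse() and xpathSApply()

# Shortened, hypothetical stand-in for the div in the question
html <- '<div id="population">
    The total estimated population is 4,080-6,590, as follows:<br>
    <br>
    Afghanistan: 100-200?<br>
    China: 2,000-2,500<br>
    Uzbekistan: 20-50
</div>'

h <- htmlParse(html, asText = TRUE)

# Hypothetical helper: run an XPath, trim the results, drop empty strings
text_after <- function(doc, path) {
  out <- trimws(unlist(xpathSApply(doc, path, xmlValue)))
  out[nzchar(out)]
}

after_br <- text_after(
  h, '//div[@id="population"]/text()[preceding-sibling::br]')
between_br <- text_after(
  h, '//div[@id="population"]/text()[preceding-sibling::br and following-sibling::br]')

# Entries matched by the first expression but not by the stricter one
missed <- setdiff(after_br, between_br)
print(missed)
```

With this input, the entry that drops out is the final one, since nothing but the closing `</div>` follows it.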
