R: parse an HTML document and use XPath to get all matches for two patterns

Question

I am parsing the HTML of the FIFA World Cup website and want to get all the matches:

 library(XML)
 wcup <- htmlTreeParse("http://www.fifa.com/worldcup/matches/", useInternalNodes = TRUE)

However, the span for one country has the class 't-nText kern', while the spans for the other countries have the class 't-nText'.

 <span class="t-nText kern">Bosnia and Herzegovina</span>

So a command like this one misses 'Bosnia and Herzegovina':

xpathSApply(wcup, "//span[@class='t-nText ']", xmlValue)

Is there a way to search for both class values, 't-nText' and 't-nText kern', at once? Or do you have another solution? I want to preserve the order of the matches.

XPath does not seem to support a logical OR written as `||`:

xpathSApply(wcup, "//span[@class='t-nText ' || 't-nText kern']", xmlValue)
XPath error : Invalid expression
//span[@class='t-nText ' || 't-nText kern']
                          ^
XPath error : Invalid expression
//span[@class='t-nText ' || 't-nText kern']
                                          ^
Error in xpathApply.XMLInternalDocument(doc, path, fun, ..., namespaces = namespaces,  : 
  error evaluating xpath expression //span[@class='t-nText ' || 't-nText kern']

Answer 1

Mar*_*gan 4

Use `or` or `starts-with()`:

wcup["//span[@class='t-nText kern' or @class='t-nText ']"]
wcup["//span[starts-with(@class, 't-nText ')]"]
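
As a rough, self-contained sketch of how the two expressions above behave, the following runs them with `xpathSApply()` against a small inline snippet. The HTML string is a made-up stand-in for the FIFA page (only the Bosnia and Herzegovina span is taken from the question), and the third `contains()` variant is an addition that is not part of the original answer; it assumes you want to match `t-nText` as a whole class token regardless of surrounding whitespace.

```r
library(XML)

# Hypothetical markup imitating the page: most spans use class "t-nText "
# (with a trailing space), one uses "t-nText kern".
html <- '<html><body>
  <span class="t-nText ">Brazil</span>
  <span class="t-nText ">Croatia</span>
  <span class="t-nText kern">Bosnia and Herzegovina</span>
  <span class="other">not a team</span>
</body></html>'

doc <- htmlParse(html, asText = TRUE)

# 1. Explicit alternatives joined with the XPath keyword `or` (not `||`)
by_or <- xpathSApply(
  doc, "//span[@class='t-nText kern' or @class='t-nText ']", xmlValue)

# 2. Prefix test: any class attribute beginning with "t-nText "
by_prefix <- xpathSApply(
  doc, "//span[starts-with(@class, 't-nText ')]", xmlValue)

# 3. Assumed extra variant: match "t-nText" as a whole class token,
#    whatever the whitespace or the position of the token
by_token <- xpathSApply(
  doc,
  "//span[contains(concat(' ', normalize-space(@class), ' '), ' t-nText ')]",
  xmlValue)

print(by_or)
```

XPath returns node sets in document order, so all three queries should keep the spans in the order they appear on the page, which is what the question asks for.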
