Getting paragraph text, including links, with XPath

Question

I am using XPath to parse an HTML page and want to get the full text of certain paragraphs, including the text of any links.

For example, I have the following paragraph:

<p class="main-content">
    This is sample paragraph with <a href="http://google.com">link</a> inside.
</p>

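I need to get the following text as the result: "This is sample paragraph with link inside.", but applying "//p[@class='main-content']/text()" only gives me "This is sample paragraph with inside.".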

Could you help, please? Thanks.

Answer 1

To get the full text content of a node, use the string function:

string(//p[@class="main-content"])

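Note that this gives you a single string value. If you want the text nodes themselves (as returned by text()), you can do that too, but you need to search at all depths, not just among the direct children: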

//p[@class="main-content"]//text()

This returns three text nodes: "This is sample paragraph with ", "link", and " inside.".
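
To see the difference between the two results outside of an XPath engine, here is a minimal Python sketch using only the standard library. It is an approximation under stated assumptions: xml.etree.ElementTree supports only a subset of XPath and does not evaluate string() or text(), so itertext() is used as a stand-in for //text(), and joining its output stands in for string(). The wrapping div element and the variable names are made up for the illustration.

```python
import xml.etree.ElementTree as ET

# The paragraph from the question, wrapped in a hypothetical <div> root
# so that the snippet parses as a well-formed XML document.
HTML = """<div>
<p class="main-content">
    This is sample paragraph with <a href="http://google.com">link</a> inside.
</p>
</div>"""

root = ET.fromstring(HTML)

# ElementTree understands this attribute predicate, but not string() or text().
paragraph = root.find(".//p[@class='main-content']")

# Stand-in for //p[@class="main-content"]//text():
# every text node under the paragraph, at any depth, in document order.
text_nodes = list(paragraph.itertext())

# Stand-in for string(//p[@class="main-content"]):
# the concatenation of all those text nodes.
full_text = "".join(text_nodes)

# The markup's line breaks and indentation are part of the text nodes,
# so collapse runs of whitespace before displaying the result.
normalized = " ".join(full_text.split())

print(text_nodes)
print(normalized)
```

The raw text nodes keep the surrounding whitespace from the markup, which is why the sketch collapses it at the end. In XPath itself, wrapping the expression as normalize-space(//p[@class="main-content"]) should have a similar effect.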