HtmlAgilityPack: select only the inner text node

Question


This is a sample of my HTML input, taken from a larger HTML file.

string html = "<html> <p>Ingredients:</p> </html>";


I want to retrieve only the node whose own inner text is Ingredients. The text may sit in a p, div, strong, or other HTML element.

My C# code for this, using HtmlAgilityPack and LINQ, is:

HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(html);

List<HtmlNode> ingredientList = doc.DocumentNode.Descendants().Where
                        (x => x.InnerText.Contains("Ingredients:")).ToList();


This code returns three nodes, because InnerText on an element also includes the text of all its descendants:

<html> node
<p> node
#text node


I want to retrieve only:

<p> node


Answer 1

har*_*r07 6

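If your platform supports XPath, i.e. HtmlAgilityPack's SelectNodes() method is available, you can use an XPath expression to get the elements that have a direct child text node containing the keyword: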

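// SelectNodes() returns null when nothing matches, so guard before calling ToList().
List<HtmlNode> ingredientList = doc.DocumentNode
                                   .SelectNodes("//*[text()[contains(.,'Ingredients:')]]")
                                   ?.ToList() ?? new List<HtmlNode>();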
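
Where SelectNodes() is not available, roughly the same filter can be written with LINQ alone. The following is a minimal sketch, not part of the original answer: it assumes the sample HTML from the question and keeps only element nodes that have a direct child text node containing the keyword, which mirrors what the XPath predicate checks.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

// Sample input copied from the question.
string html = "<html> <p>Ingredients:</p> </html>";

var doc = new HtmlDocument();
doc.LoadHtml(html);

// Keep only elements whose *direct* child text nodes contain the keyword.
// Ancestors such as <html> are excluded because the matching text node is
// not their direct child, and #text nodes are excluded by the Element check.
List<HtmlNode> ingredientList = doc.DocumentNode
    .Descendants()
    .Where(n => n.NodeType == HtmlNodeType.Element &&
                n.ChildNodes.Any(c => c.NodeType == HtmlNodeType.Text &&
                                      c.InnerText.Contains("Ingredients:")))
    .ToList();

// Print the tag name of each match.
foreach (HtmlNode node in ingredientList)
{
    Console.WriteLine(node.Name);
}
```

Unlike SelectNodes(), this query yields an empty list rather than null when nothing matches.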


Archived:	9 years, 8 months ago
Views:	2074
Last recorded:	9 years, 8 months ago