How to edit an HTML fragment with HTML Agility Pack

Joh*_*ohn 15 c# html-agility-pack

I have an HTML snippet that I want to modify using C#.

<div>
This is a specialSearchWord that I want to link to
<img src="anImage.jpg" />
<a href="foo.htm">A hyperlink</a>
Some more text and that specialSearchWord again.
</div>

I want to convert it into this:

<div>
This is a <a class="special" href="http://mysite.com/search/specialSearchWord">specialSearchWord</a> that I want to link to
<img src="anImage.jpg" />
<a href="foo.htm">A hyperlink</a>
Some more text and that <a class="special" href="http://mysite.com/search/specialSearchWord">specialSearchWord</a> again.
</div>

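Based on many recommendations here, I am going to use the HTML Agility Pack, but I don't know where to start. In particular:

  1. How do I load a partial fragment as a string, rather than a full HTML document?
  2. How do I edit it?
  3. How do I then get back a text string of the edited object?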

Ale*_*lex 24

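  1. The same way as a full HTML document. It makes no difference.
  2. There are two options: you can edit the InnerHtml property directly (or Text on text nodes), or modify the DOM tree using e.g. AppendChild, PrependChild, etc.
  3. You can use the HtmlDocument.DocumentNode.OuterHtml property, or the HtmlDocument.Save method (personally I prefer the second option).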
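The DOM-tree option from point 2 can be sketched roughly as follows. This is only an illustration, not the approach used in the rest of this answer: the class DomEditSketch and the method LinkWord are hypothetical names, and the URL prefix is taken from the question. It splits each matching text node around the search word and inserts a real a element before the remainder.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

public static class DomEditSketch
{
    // Hypothetical helper: wraps every occurrence of `word` in a link
    // by inserting new nodes into the tree instead of rewriting markup.
    public static string LinkWord(string html, string word)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null when nothing matches.
        var found = doc.DocumentNode.SelectNodes("//text()");
        if (found == null)
            return html;

        // Copy to a list so the tree can be changed while iterating,
        // and skip text that already sits inside a link.
        var textNodes = found.OfType<HtmlTextNode>()
                             .Where(n => !n.Ancestors("a").Any())
                             .ToList();

        foreach (var node in textNodes)
        {
            string text = node.Text;
            var parent = node.ParentNode;
            int start = 0, idx;

            while ((idx = text.IndexOf(word, start, StringComparison.Ordinal)) >= 0)
            {
                // Text before the match becomes its own text node.
                if (idx > start)
                    parent.InsertBefore(doc.CreateTextNode(text.Substring(start, idx - start)), node);

                // The match itself becomes an <a> element.
                var link = doc.CreateElement("a");
                link.SetAttributeValue("class", "special");
                link.SetAttributeValue("href", "http://mysite.com/search/" + Uri.EscapeDataString(word));
                link.AppendChild(doc.CreateTextNode(word));
                parent.InsertBefore(link, node);

                start = idx + word.Length;
            }

            // The original node keeps whatever follows the last match.
            if (start > 0)
                node.Text = text.Substring(start);
        }

        return doc.DocumentNode.OuterHtml;
    }
}
```

Compared with a plain string replacement, this keeps the new links as actual nodes in the tree, so later XPath queries on the same document can find them.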

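As for the parsing, I select the text nodes inside the div that contain your search word, and then just use the string.Replace method to replace it: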

// html is the string holding the fragment from the question
var doc = new HtmlDocument();
doc.LoadHtml(html);
// Text nodes directly under the div that contain the search word
var textNodes = doc.DocumentNode.SelectNodes("/div/text()[contains(.,'specialSearchWord')]");
if (textNodes != null) // SelectNodes returns null when nothing matches
    foreach (HtmlTextNode node in textNodes)
        node.Text = node.Text.Replace("specialSearchWord", "<a class='special' href='http://mysite.com/search/specialSearchWord'>specialSearchWord</a>");

And save the result to a string:

// StringWriter comes from System.IO
string result = null;
using (StringWriter writer = new StringWriter())
{
    doc.Save(writer);
    result = writer.ToString();
}
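For comparison, here is a minimal sketch of both ways of getting the string back, OuterHtml and Save, applied to the same fragment. The class FragmentRoundTrip and the method Serialize are made-up names for illustration; I would expect the two strings to match, but that is worth checking against your own markup.

```csharp
using System;
using System.IO;
using HtmlAgilityPack;

public static class FragmentRoundTrip
{
    // Hypothetical helper: loads a fragment and serializes it both ways.
    public static (string ViaOuterHtml, string ViaSave) Serialize(string html)
    {
        var doc = new HtmlDocument();
        // A fragment without <html> or <body> loads the same way as a full page.
        doc.LoadHtml(html);

        // Option 1: read the markup from the root node.
        string viaOuterHtml = doc.DocumentNode.OuterHtml;

        // Option 2: write the document into a StringWriter.
        string viaSave;
        using (var writer = new StringWriter())
        {
            doc.Save(writer);
            viaSave = writer.ToString();
        }

        return (viaOuterHtml, viaSave);
    }
}
```

Calling FragmentRoundTrip.Serialize("<div>Some text</div>") and printing both values is a quick way to see whether they differ for your input.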