HTML Agility Pack is not removing all HTML tags. How can I solve this efficiently?

Obs*_*vus 12 c# string html-agility-pack

I use the following method to strip all HTML from a string:

public static string StripHtmlTags(string html)
{
    if (String.IsNullOrEmpty(html)) return "";
    HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
    doc.LoadHtml(html);
    return doc.DocumentNode.InnerText;
}

But it seems to ignore the following markup: […]

So the string that comes back is basically:

> A hungry thief who stole a rack of pork ribs from a grocery store has
> been sentenced to spend 50 years in prison. Willie Smith Ward felt the
> full force of the law after being convicted of the crime in Waco,
> Texas, on Wednesday. The 43-year-old may feel slightly aggrieved over
> the severity of the […]

How can I make sure these tags are stripped as well?

Any help is appreciated, thanks.

Dam*_*ith 39

Try HttpUtility.HtmlDecode:

public static string StripHtmlTags(string html)
{
    if (String.IsNullOrEmpty(html)) return "";
    HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
    doc.LoadHtml(html);
    return HttpUtility.HtmlDecode(doc.DocumentNode.InnerText);
}

HtmlDecode will convert […][…]

  • If you are on .NET 4+, I recommend WebUtility.HtmlDecode rather than HttpUtility.HtmlDecode. It does not require a reference to System.Web. (3 upvotes)
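
To illustrate the comment above, here is a minimal sketch of the decoding step using WebUtility.HtmlDecode from System.Net. The class name EntityDecodeSketch, the helper DecodeInnerText, and the sample string are hypothetical and only for illustration; the sample string stands in for whatever doc.DocumentNode.InnerText returns, so the sketch runs without a reference to HtmlAgilityPack or System.Web.

```csharp
using System;
using System.Net;

public static class EntityDecodeSketch
{
    // Hypothetical helper: decodes HTML entities that remain in text
    // after the tags themselves have already been stripped.
    public static string DecodeInnerText(string innerText)
    {
        if (string.IsNullOrEmpty(innerText)) return "";
        // WebUtility lives in System.Net, so no System.Web reference is needed.
        return WebUtility.HtmlDecode(innerText);
    }

    public static void Main()
    {
        // Assumed sample input: text with encoded entities left in it,
        // standing in for the value of doc.DocumentNode.InnerText.
        string innerText = "Ribs &amp; sauce &lt;50 years&gt;";
        Console.WriteLine(DecodeInnerText(innerText)); // Ribs & sauce <50 years>
    }
}
```

In the answer's StripHtmlTags method, this would amount to swapping HttpUtility.HtmlDecode for WebUtility.HtmlDecode on the return line and adding using System.Net.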