Parsing HTML with C#

3 html c# windows-phone html-agility-pack

I want to parse HTML pages with C#. Some of the pages contain a lot of HTML tags; here is a sample from one of them:

<span class=text14 id="article_content"><!-- RELEVANTI_ARTICLE_START --><span ></b>The 
     most important component for <a
     class=bluelink href="http://www.ynetnews.com/articles/0,7340,L-
     3284752,00.html%20"' onmouseover='this.href=unescape(this.href)' 
     target=_blank>Israel</a>'s
     security is its special relations with the American administration, and especially with its generous purse. When the Netanyahu government launches a great outcry against the <a  ...

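But I only want the content wrapped by the <span class=text14 id="article_content"> tag. At first I considered using preg (regular expression) matching, but then realized it is not efficient at all. Later I read about Html Agility Pack and FizzlerEx. I would like to know whether these tools can get the text wrapped by a specific tag, and I would appreciate it if someone could show me a quick way to do this.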

Sim*_*ead 5

This is very simple with Html Agility Pack:

var markup = @"<span class=text14 id=""article_content""><!-- RELEVANTI_ARTICLE_START --><span ></b>The most important component for <a class=bluelink href=""http://www.ynetnews.com/articles/0,7340,L-3284752,00.html%20""' onmouseover='this.href=unescape(this.href)' target=_blank>Israel</a>'s security is its special relations with the American administration, and especially with its generous purse. When the Netanyahu government launches a great outcry against the</span>";

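// HtmlDocument comes from the HtmlAgilityPack namespace (requires "using HtmlAgilityPack;")
var doc = new HtmlDocument();
doc.LoadHtml(markup);

// GetElementbyId (note the lowercase "b") returns null when no element has this id
var content = doc.GetElementbyId("article_content").InnerText;

Console.WriteLine(content);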
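
A slightly more defensive variant of the snippet above, as a minimal sketch rather than a definitive implementation. `ArticleExtractor` and `GetTextById` are hypothetical names introduced here for illustration. The sketch looks the element up with LINQ over `Descendants()` instead of XPath, returns null when the id is missing, joins only the text nodes so the `<!-- RELEVANTI_ARTICLE_START -->` comment cannot leak into the result, and decodes HTML entities with `HtmlEntity.DeEntitize`. It assumes the HtmlAgilityPack package is referenced.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical helper class, not part of Html Agility Pack.
static class ArticleExtractor
{
    // Returns the decoded text inside the first element whose id matches,
    // or null when no such element exists.
    public static string GetTextById(string html, string id)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // LINQ lookup by id attribute; nodes without an id yield the default "".
        var node = doc.DocumentNode
            .Descendants()
            .FirstOrDefault(n => n.GetAttributeValue("id", "") == id);

        if (node == null)
        {
            return null;
        }

        // Concatenate text nodes only, so comment nodes are skipped.
        var raw = string.Concat(
            node.DescendantNodes()
                .OfType<HtmlTextNode>()
                .Select(t => t.Text));

        // Turn entities such as &amp; back into plain characters.
        return HtmlEntity.DeEntitize(raw).Trim();
    }
}
```

With the `markup` string from the answer, `ArticleExtractor.GetTextById(markup, "article_content")` should give the article text without the link tags; a null result means the page had no element with that id.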