How do I extract a string between two markers using Regex in .NET?

Question


Ang*_*ker 0 .net c# regex string-parsing

I have the source of a web page and I need to extract the body, that is, everything between </head><body> and </body></html>.

I tried the following, without success:

var match = Regex.Match(output, @"(?<=\</head\>\<body\>)(.*?)(?=\</body\>\</html\>)");


It finds a string, but the match is cut off long before </body></html>. I escaped the characters according to the RegEx cheat sheet.

What am I missing?

Answer 1

Bro*_*ass 6

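I would suggest using HtmlAgilityPack instead - parsing HTML with regular expressions is very, very fragile.

The latest version even supports Linq, so you can get your content like this: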

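using System.Linq;
using HtmlAgilityPack;
// Download the page, then take the inner HTML of its single <body> element.
HtmlWeb web = new HtmlWeb();
HtmlDocument doc = web.Load("http://stackoverflow.com");
string html = doc.DocumentNode.Descendants("body").Single().InnerHtml;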
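
If a regular expression has to be used anyway, one likely reason the pattern in the question fails is that `.` does not match newline characters unless `RegexOptions.Singleline` is set, so the match cannot run across a multi-line body. The pattern also requires `</head><body>` and `</body></html>` to be adjacent, with no whitespace or attributes. The following is only a minimal sketch under those assumptions, not a robust HTML parser; `BodyExtractor` and `ExtractBody` are hypothetical names introduced for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BodyExtractor
{
    // Hypothetical helper: returns the text between the first <body ...> tag
    // and the nearest following </body>, or null when no such pair is found.
    public static string ExtractBody(string html)
    {
        var match = Regex.Match(
            html,
            @"<body\b[^>]*>(.*?)</body\s*>",
            // Singleline lets '.' match '\n'; IgnoreCase accepts <BODY> as well.
            RegexOptions.Singleline | RegexOptions.IgnoreCase);

        return match.Success ? match.Groups[1].Value : null;
    }
}
```

Calling `BodyExtractor.ExtractBody(output)` on the page source returns the body markup including its line breaks. It remains fragile: for example, a literal `</body>` inside a script or comment would end the match early, which is why the parser-based approach above is the safer choice.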

