Iev*_*ida 8 .net html c# regex
I have HTML as a string that contains blocks of JavaScript and CSS code.
Something like this:
<script type="text/javascript">
alert('hello world');
</script>
<style type="text/css">
A:link {text-decoration: none}
A:visited {text-decoration: none}
A:active {text-decoration: none}
A:hover {text-decoration: underline; color: red;}
</style>
But I don't need them. How can I remove those blocks with a regular expression?
Eli*_*ing 16
A quick 'n' dirty approach would be a regex like this:
// Singleline lets "." match newlines, so a block may span several lines.
var regex = new Regex(
"(\\<script(.+?)\\</script\\>)|(\\<style(.+?)\\</style\\>)",
RegexOptions.Singleline | RegexOptions.IgnoreCase
);
string output = regex.Replace(input, "");
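To see the regex approach run end to end, here is a minimal sketch applied to input shaped like the question's sample. The class name `HtmlBlockStripper` and the method `Strip` are hypothetical names made up for illustration, and the pattern is a slightly tightened variant of the one above (an assumption, not the answer's exact regex): `\b` keeps it from matching tags such as `<scripted>`, and `\s*` tolerates whitespace before the closing `>`.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HtmlBlockStripper
{
    // Hypothetical helper. Matches a whole <script>...</script> or
    // <style>...</style> block, case-insensitively and across newlines.
    private static readonly Regex ScriptOrStyle = new Regex(
        @"<script\b.*?</script\s*>|<style\b.*?</style\s*>",
        RegexOptions.Singleline | RegexOptions.IgnoreCase);

    // Returns the input with every matched block removed.
    public static string Strip(string html)
    {
        return ScriptOrStyle.Replace(html, "");
    }

    public static void Main()
    {
        string input =
            "<p>before</p>\n" +
            "<script type=\"text/javascript\">\nalert('hello world');\n</script>\n" +
            "<style type=\"text/css\">\nA:link {text-decoration: none}\n</style>\n" +
            "<p>after</p>";

        // Both paragraphs remain; the script and style blocks are gone.
        Console.WriteLine(Strip(input));
    }
}
```

This is still a textual match rather than real HTML parsing, so it can be fooled by markup such as a `<script` sequence inside a comment or an attribute value.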
A better* (but possibly slower) option is to use HtmlAgilityPack:
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(htmlInput);
// SelectNodes returns null when nothing matches, so guard before iterating.
var nodes = doc.DocumentNode.SelectNodes("//script|//style");
if (nodes != null)
{
foreach (var node in nodes)
node.ParentNode.RemoveChild(node);
}
string htmlOutput = doc.DocumentNode.OuterHtml;
*) For a discussion of why this is better, see this thread.