Regex to remove conditional comments

Question


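I want a regular expression that matches the conditional comments in an HTML source page, so that I can remove only those. I want to keep the regular comments.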

I would also like to avoid the .*? notation if possible.

The text is:

foo

<!--[if IE]>

<style type="text/css">

ul.menu ul li{
    font-size: 10px;
    font-weight:normal;
    padding-top:0px;
}

</style>

<![endif]-->

bar


I want to strip out everything from <!--[if IE]> through <![endif]-->.

EDIT: I want to remove these tags because of BeautifulSoup. BeautifulSoup fails to parse them and gives me incomplete source.

EDIT2: [if IE] is not the only condition. There are many more, and I don't have a list of all the possible combinations.

EDIT3: Vinko Vrsalovic's solution works, but the actual reason BeautifulSoup failed was a rogue comment inside the conditional comment, like:

<!--[if lt IE 7.]>
<script defer type="text/javascript" src="pngfix_253168.js"></script><!--png fix for IE-->
<![endif]-->


Notice the comment?
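To illustrate why that nested comment causes trouble, here is a small sketch using Python's re module (our own illustration, not taken from the answers below). An HTML comment ends at the first -->, so a pattern that stops at the first --> ends inside the conditional block, at the close of <!--png fix for IE-->, and leaves <![endif]--> stranded. A parser that terminates comments the same way would plausibly be left with the same stray fragment.

```python
import re

# The snippet from EDIT3: a regular comment nested inside a conditional one.
SRC = """<!--[if lt IE 7.]>
<script defer type="text/javascript" src="pngfix_253168.js"></script><!--png fix for IE-->
<![endif]-->"""

# Naive "any comment" pattern: lazily match from <!-- to the FIRST -->.
naive = re.search(r'<!--.*?-->', SRC, re.DOTALL)

print(naive.group())       # ends with the inner <!--png fix for IE-->
print(SRC[naive.end():])   # the leftover text is the orphaned <![endif]-->
```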

Although my problem is solved, I would still like a regex solution.
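A possible regex answer, sketched under the assumption that every conditional comment has the form <!--[if condition]> ... <![endif]-->. It avoids .*? by using a "tempered greedy token": it consumes any character as long as <![endif]--> does not start at that position, so a nested <!-- ... --> inside the block does not end the match early. The condition part accepts anything up to the first ], which covers the varying conditions from EDIT2. The function name strip_conditional_comments is hypothetical. The caveat from the second answer still applies: a literal <![endif]--> inside, for example, a script string would end the match too soon.

```python
import re

# <!--[if <any condition>]>  ...anything...  <![endif]-->
# [^\]]*                     : the condition, e.g. "IE", "lt IE 7.", "(gt IE 5)&(lt IE 7)"
# (?:(?!<!\[endif\]-->).)*   : tempered greedy token, stops right before <![endif]-->
# re.DOTALL lets "." match newlines inside the block.
CONDITIONAL_COMMENT = re.compile(
    r'<!--\[if[^\]]*\]>(?:(?!<!\[endif\]-->).)*<!\[endif\]-->',
    re.DOTALL,
)


def strip_conditional_comments(html: str) -> str:
    """Remove conditional comments; ordinary <!-- ... --> comments are kept."""
    return CONDITIONAL_COMMENT.sub('', html)


if __name__ == '__main__':
    page = """foo
<!--[if lt IE 7.]>
<script defer type="text/javascript" src="pngfix_253168.js"></script><!--png fix for IE-->
<![endif]-->
<!-- a regular comment -->
bar"""
    print(strip_conditional_comments(page))
```

Each match ends at the first <![endif]--> after its opener, so several conditional blocks on one page are removed separately, and the text between them survives.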

Answer 1

Vin*_*vic 5

>>> from BeautifulSoup import BeautifulSoup, Comment
>>> html = '<html><!--[if IE]> bloo blee<![endif]--></html>'
>>> soup = BeautifulSoup(html)
>>> comments = soup.findAll(text=lambda text: isinstance(text, Comment)
...                         and text.find('if') != -1)
>>> [comment.extract() for comment in comments]
[u'[if IE]> bloo blee<![endif]']
>>> print soup.prettify()
<html>
</html>
>>>


Python 3 with bs4:

from bs4 import BeautifulSoup, Comment
html = '<html><!--[if IE]> bloo blee<![endif]--></html>'
soup = BeautifulSoup(html, "html.parser")
# Select comment nodes whose text contains "if" (the same filter as above)
comments = soup.find_all(string=lambda text: isinstance(text, Comment)
                         and 'if' in text)
for comment in comments:
    comment.extract()  # detach each matching comment from the tree
print(soup.prettify())


If your data confuses BeautifulSoup, you can fix it beforehand or customize the parser, among other solutions.

EDIT: Based on your comment, just modify the lambda passed to findAll as needed (I've modified it).
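One way the lambda could be tightened, offered as a sketch rather than as part of the answer: text.find('if') != -1 also matches ordinary comments that merely contain the letters "if" (for example "last modified"). Because BeautifulSoup stores <!--[if IE]> bloo blee<![endif]--> as the comment text "[if IE]> bloo blee<![endif]" (see the output above), a stricter test is whether the text starts with "[if ...]>". The helper is_conditional and the pattern name are hypothetical; the bs4 call is shown only in a comment so the block runs without bs4 installed.

```python
import re

# A conditional comment's body starts with "[if <condition>]>".
CONDITIONAL_BODY = re.compile(r'\s*\[if[^\]]*\]>')


def is_conditional(text: str) -> bool:
    """True if a comment's text looks like the body of a conditional comment."""
    return CONDITIONAL_BODY.match(text) is not None

# With bs4 the predicate would be used like this (not executed here):
#   soup.find_all(string=lambda t: isinstance(t, Comment) and is_conditional(t))


if __name__ == '__main__':
    for body in ['[if IE]> bloo blee<![endif]', ' last modified 2009 ']:
        print(repr(body),
              'find("if") != -1:', body.find('if') != -1,
              'is_conditional:', is_conditional(body))
```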

Answer 2

Tho*_*ers 0

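Don't use a regular expression for this. You will get confused by comments that contain opening tags and the like, and end up doing the wrong thing. HTML isn't regular, and trying to modify it with a single regular expression will fail.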

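Use an HTML parser for this. BeautifulSoup is a good, easy, flexible and sturdy one that can handle real-world (meaning hopelessly broken) HTML. With it you can look up all comment nodes, examine their content (with a regular expression, if you wish) and remove the ones that need to be removed.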

Archived: 17 years, 3 months ago
Views: 4,506
Last activity: 9 years ago