Strip tags in Python

Question

I want the following functionality.

input : this is test <b> bold text </b> normal text
expected output: this is test normal text

That is, remove the specified tag together with its content.

Answer 1

zol*_*i2k 9

A solution using BeautifulSoup:

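# The original answer used BeautifulSoup 3 (from BeautifulSoup import BeautifulSoup);
# the bs4 import below is the Python 3 equivalent.
from bs4 import BeautifulSoup

def removeTag(soup, tagname):
    # Detach every <tagname> element, together with its contents, from the tree.
    for tag in soup.find_all(tagname):
        tag.extract()

s = BeautifulSoup("abcd <b> btag </b> hello <d>dtag</d>", "html.parser")

removeTag(s, "b")
print(s)
removeTag(s, "d")
print(s)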

Output:

>>>
abcd  hello <d>dtag</d>
abcd  hello
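
Note that the output above keeps a double space where the tag used to be, while the question expects single spaces. As a minimal sketch of the same idea without BeautifulSoup, the standard library's html.parser can skip the text inside chosen tags. The names TagContentRemover and remove_tag_content are hypothetical, and the sketch assumes properly paired tags. Unlike the answer above, it returns plain text only, so markup from other tags is dropped as well.

```python
from html.parser import HTMLParser


class TagContentRemover(HTMLParser):
    """Collects text, skipping everything inside the given tags (hypothetical helper)."""

    def __init__(self, skip_tags):
        super().__init__()
        self.skip_tags = set(skip_tags)
        self.depth = 0   # > 0 while we are inside a skipped tag
        self.parts = []  # text chunks that survive

    def handle_starttag(self, tag, attrs):
        if tag in self.skip_tags:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.skip_tags and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth == 0:
            self.parts.append(data)


def remove_tag_content(html, skip_tags):
    parser = TagContentRemover(skip_tags)
    parser.feed(html)
    parser.close()
    # Collapse the whitespace left behind where the removed tag used to be.
    return " ".join("".join(parser.parts).split())


print(remove_tag_content("this is test <b> bold text </b> normal text", ["b"]))
# this is test normal text
```

The depth counter is there so that nested occurrences of a skipped tag are handled; a self-closing form such as `<b/>` would need extra care.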

Answer 2

Bri*_*ian 5

Using BeautifulSoup:

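from bs4 import BeautifulSoup  # the original answer imported from BeautifulSoup 3
# `page` is the HTML string to clean; every text node is kept and only the markup is dropped.
text = ''.join(BeautifulSoup(page, "html.parser").find_all(string=True))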

See http://www.ghastlyfop.com/blog/2008/12/strip-html-tags-from-string-python.html
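
Collecting every text node strips the markup but keeps the text that was inside the tags, which differs from what the question asks for. A rough standard-library analogue makes this visible; TextCollector and text_only are hypothetical names, and this is a sketch of the same idea rather than BeautifulSoup's own behavior.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Keeps every piece of text and ignores all tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def text_only(html):
    collector = TextCollector()
    collector.feed(html)
    collector.close()
    return "".join(collector.parts)


print(repr(text_only("this is test <b> bold text </b> normal text")))
# 'this is test  bold text  normal text'
```

The words "bold text" are still present in the result, so this approach suits cases where only the tags themselves should go.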

Answer 3

Sam*_*Sam 5

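If you don't mind Python (although regular expressions are fairly generic), you can take some inspiration from Django's strip_tags filter.

Reproduced here for completeness: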

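import re
from django.utils.encoding import force_unicode  # old Django; newer releases provide force_str instead

def strip_tags(value):
    """Returns the given HTML with all tags stripped."""
    return re.sub(r'<[^>]*?>', '', force_unicode(value))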

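Edit: if you use this or any other regular-expression solution, keep in mind that it lets through some carefully crafted HTML (see the comments) as well as HTML comments, so it should not be used with untrusted input. For untrusted input, consider one of the BeautifulSoup, html5lib, or lxml answers instead.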
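
A small sketch of what that caveat can look like in practice. It assumes plain str() in place of Django's force_unicode so that it runs without Django, and the sample inputs are made up for illustration.

```python
import re


def strip_tags(value):
    """Same regex as above; str() stands in for force_unicode (an assumption)."""
    return re.sub(r'<[^>]*?>', '', str(value))


# Ordinary markup: the tags go, the text inside them stays.
print(repr(strip_tags("a <b>bold</b> c")))
# 'a bold c'

# An HTML comment containing '>' is cut at the first '>', so part of it leaks through.
print(repr(strip_tags("x <!-- 1 > 0 --> y")))
# 'x  0 --> y'

# A tag with no closing '>' never matches the pattern and is returned unchanged.
print(repr(strip_tags("<img src=x onerror=alert(1)")))
# '<img src=x onerror=alert(1)'
```

Like the Django snippet, this regex also leaves the text between `<b>` and `</b>` in place, so on its own it does not remove a tag's content.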

Asked:	16 years, 2 months ago
Viewed:	8373 times
Last active:	11 years, 6 months ago