How to remove specific tags but keep allowed tags

Question

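In some HTML, I want to remove certain tags but keep their content/HTML. For example, in the lines below I want to remove the blacklisted tags <strong> and <div> while keeping their content, and leave whitelisted tags such as <p> and <img> untouched:

Original: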

<div>
    some text
    <strong>text</strong>
    <p>other text</p>
    <img src="http://example.com" />
</div>

Result:

some text
text
<p>other text</p>
<img src="http://example.com" />

I want to strip specific tags, and certain tags must not be stripped. It has to work like strip_tags in PHP, so inner_html doesn't help me.

Answer 1

the*_*Man 6

I would do something like this:

require 'nokogiri'

doc = Nokogiri::HTML(<<EOT)
<div>
    some text
    <strong>text</strong>
    <p>other text</p>
    <img src="http://example.com" />
</div>
EOT

BLACKLIST = %w[strong div]

doc.search(BLACKLIST.join(',')).each do |node|
  node.replace(node.children)
end

puts doc.to_html
# >> <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
# >> <html><body>
# >>     some text
# >>     text
# >>     <p>other text</p>
# >>     <img src="http://example.com">
# >> 
# >> </body></html>

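Basically, it searches for the nodes listed in BLACKLIST wherever they occur in the document and replaces each one with its children, effectively promoting the child nodes to the replaced node's parent.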
