Removing newlines from object/embed tags using only Nokogiri

Question by mod*_*ron (score 3) · tags: ruby, replace, newline, ruby-on-rails, nokogiri

I need to remove newlines from inside any object/embed tags. I'm currently trying to do it with Nokogiri like this:

require 'nokogiri'
s = "<div>
<object height='450' width='600'>
<param name='allowfullscreen' value='true'>
<param name='allowscriptaccess' value='always'>
<param name='movie' value='http://vimeo.com/moogaloop.swf?clip_id=3317924&amp;server=vimeo.com&amp;show_title=1&amp;show_byline=1&amp;show_portrait=0&amp;color=&amp;fullscreen=1'>
<embed src='http://vimeo.com/moogaloop.swf?clip_id=3317924&amp;server=vimeo.com&amp;show_title=1&amp;show_byline=1&amp;show_portrait=0&amp;color=&amp;fullscreen=1' type='application/x-shockwave-flash' allowfullscreen='true' allowscriptaccess='always' height='450' width='600'>
</embed>
</object>
</div>"
doc = Nokogiri::HTML(s)
doc.css('object').each { |o| o.inner_html.gsub!(/\n/, ""); puts o.inner_html }

Note that this example only handles the object tag.

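Printing o.inner_html at the end of the block shows that no replacement took place, even though the gsub result itself looks correct. Also, once that part is solved, I need to make sure the actual object nodes in doc are saved with the updated value.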

Any help is much appreciated. Thanks.

Answer by Phr*_*ogz (score 6)

Got it!

require 'nokogiri'
s = <<ENDHTML
<div>
<object height='450' width='600'>
  <param name='allowfullscreen' value='true'><param name='allowscriptaccess' value='always'>
  <param name='movie' value='http://vimeo.com/moogaloop.swf?clip_id=3317924&amp;server=vimeo.com&amp;show_title=1&amp;show_byline=1&amp;show_portrait=0&amp;color=&amp;fullscreen=1'>
<embed src='http://vimeo.com/moogaloop.swf?clip_id=3317924&amp;server=vimeo.com&amp;show_title=1&amp;show_byline=1&amp;show_portrait=0&amp;color=&amp;fullscreen=1' type='application/x-shockwave-flash' allowfullscreen='true' allowscriptaccess='always' height='450' width='600'>
</embed>
</object>
</div>
ENDHTML

doc = Nokogiri::HTML(s)
doc.css('object,embed').each{ |e| e.inner_html = e.inner_html.gsub(/\n/,'') }
puts doc.serialize( save_with: 0 )

#=> <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
#=> <html><body><div>
#=> <object height="450" width="600"><param name="allowfullscreen" value="true"><param name="allowscriptaccess" value="always"><param name="movie" value="http://vimeo.com/moogaloop.swf?clip_id=3317924&amp;server=vimeo.com&amp;show_title=1&amp;show_byline=1&amp;show_portrait=0&amp;color=&amp;fullscreen=1"><embed src="http://vimeo.com/moogaloop.swf?clip_id=3317924&amp;server=vimeo.com&amp;show_title=1&amp;show_byline=1&amp;show_portrait=0&amp;color=&amp;fullscreen=1" type="application/x-shockwave-flash" allowfullscreen="true" allowscriptaccess="always" height="450" width="600"></embed></object>
#=> </div></body></html>
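  1. Simply removing all the text nodes does not fully clean up the document; you need to work with inner_html.
  2. Calling inner_html.gsub! is not the same as inner_html = inner_html.gsub. The inner_html getter returns a newly serialized string, so gsub! modifies only that copy; assigning the result back with inner_html= is what actually replaces the node's children.
  3. As shown, you need to pass the hash :save_with => 0 to serialize to keep Nokogiri from inserting newlines between the tags in the output.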
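Point 2 is the root of the original problem, and it is a plain Ruby behavior rather than anything Nokogiri-specific. The sketch below has no Nokogiri dependency: FakeNode is a hypothetical class invented only for illustration, mimicking the one relevant trait of Nokogiri's inner_html, namely that the getter builds a brand-new string on every call. It shows that gsub! changes only that returned copy, while assigning through the setter updates the node.

```ruby
# FakeNode is a HYPOTHETICAL stand-in for a parsed node (not a Nokogiri class).
# Like Nokogiri's Node#inner_html, its getter builds and returns a NEW string
# from the children on every call, so mutating that string cannot change the node.
class FakeNode
  def initialize(children)
    @children = children # array of markup fragments
  end

  # Getter: returns a freshly joined copy of the children.
  def inner_html
    @children.join
  end

  # Setter: replaces the children with the given markup.
  def inner_html=(html)
    @children = [html]
  end
end

node = FakeNode.new(["\n", "<param name='movie'>", "\n"])

# In-place edit of the returned copy: the copy changes, the node does not.
copy = node.inner_html
copy.gsub!(/\n/, "")
puts copy.inspect            # => "<param name='movie'>"
puts node.inner_html.inspect # => "\n<param name='movie'>\n"

# Assigning the result back through the setter is what updates the node.
node.inner_html = node.inner_html.gsub(/\n/, "")
puts node.inner_html.inspect # => "<param name='movie'>"
```

This is why the loop in the question prints unchanged markup: each o.inner_html call serializes the node again, discarding the edited copy from the previous call.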