Parsing HTML with Rails and Nokogiri

wow*_*ewa 2 ruby ruby-on-rails nokogiri

I need to parse HTML with Rails and Nokogiri. Here is the HTML:

<body>
  <div id="mama">
    <div class="test1">text</div>
    <div class="test2">text2</div>
  </div>
  <div id="mama">
    <div class="test1">text</div>
    <div class="test2">text2</div>
  </div>
  <div id="mama">
    <div class="test1">text</div>
    <div class="test2">text2</div>
  </div>
</body>

How should I write the loop? I have tried many times, but I still get errors or bad results...

doc.xpath('//div[@id='mama']/?or what?').each do |node|
  parse_file.puts text1 
  parse_file.puts text2
  parse_file.puts text1 
  parse_file.puts \n
end

The result should look like this:

text from first mama
text2 from first mama
text from first mama

text from second mama
and so on...

Phr*_*ogz 5

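First, note that the HTML you posted is invalid: it is illegal for multiple elements to share the same id attribute value. If you control the HTML, you should fix this.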

However, even with this same (invalid) HTML, Nokogiri has no trouble:

require 'nokogiri'
doc = Nokogiri::HTML(my_html)

doc.css('#mama').each_with_index do |div,i|
  puts "#{div.at_css('.test1').text} from mama ##{i}"
  puts "#{div.at_css('.test2').text} from mama ##{i}"
end

#=> text from mama #0
#=> text2 from mama #0
#=> text from mama #1
#=> text2 from mama #1
#=> text from mama #2
#=> text2 from mama #2

If you want to use XPath directly (which is what Nokogiri does behind the scenes for CSS), you would do this:

doc.xpath("//div[@id='mama']").each_with_index do |div,i|
  puts "#{div.at_xpath("./*[@class='test1']").text} from mama ##{i}"
  puts "#{div.at_xpath("./*[@class='test2']").text} from mama ##{i}"
end