How to search XML when parsing it with SAX in Nokogiri

ral*_*lph 4 ruby sax nokogiri

I have a simple but huge XML file, like the one below. I want to parse it with SAX and print only the text between the title tags.

<root>
    <site>some site</site>
    <title>good title</title>
</root>

I have the following code:

require 'rubygems'
require 'nokogiri'
include Nokogiri

class PostCallbacks < XML::SAX::Document
  def start_element(element, attributes)
    if element == 'title'
      puts "found title"
    end
  end

  def characters(text)
    puts text
  end
end

parser = XML::SAX::Parser.new(PostCallbacks.new)
parser.parse_file("myfile.xml")

The problem is that it prints the text between all of the tags. How can I print only the text between the title tags?

mu *_*ort 8

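You just need to keep track of when you're inside a <title> so that characters knows when it should pay attention. Something like this (untested code), perhaps: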

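require 'nokogiri'

class PostCallbacks < Nokogiri::XML::SAX::Document
  def initialize
    super
    @in_title = false # true while the parser is between <title> and </title>
  end

  def start_element(element, attributes)
    if element == 'title'
      puts "found title"
      @in_title = true
    end
  end

  def end_element(element)
    # Doesn't really matter what element we're closing unless there is nesting,
    # then you'd want "@in_title = false if element == 'title'"
    @in_title = false
  end

  def characters(text)
    # Only echo text that arrives while we are inside <title>
    puts text if @in_title
  end
end

parser = Nokogiri::XML::SAX::Parser.new(PostCallbacks.new)
parser.parse_file("myfile.xml")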