Parsing a structured text file in Ruby

eas*_*fri 2 ruby parsing

How can I easily parse a document with this structure?

description
some line of text
another line of text
more lines of text

quality
3 47 88 4 4 4  4

text: type 1
stats some funny stats

description
some line of text2
another line of text2
more lines of text2

quality
1 2  4 6 7

text: type 1
stats some funny stats

.
.
.

Ideally, I would like an array of hashes, where each hash represents one "section" of the document. It should probably look like this:

{:description => "some line of text another line of text more lines of text", :quality => "3 47 88 4 4 4 4", :text => "type 1", :stats => "some funny stats"}

Can*_*der 7

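You should look for the indicator lines (description, quality, text, and stats) in a loop and fill in a hash as you process the document line by line.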

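Another option is to use regular expressions and parse the whole document at once, but you don't need regular expressions here, and if you aren't familiar with them I would advise against using them.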
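For comparison, the whole-document regular-expression alternative might look roughly like the sketch below. The method name `scan_sections` and the pattern itself are assumptions made for illustration, not part of the original answer. The pattern also assumes every section contains all four parts in the order shown in the question; a malformed section could make the lazy groups run across section boundaries, which is one reason the line-by-line loop is the safer choice.

```ruby
# Hypothetical sketch: one pattern that captures a complete section.
# With the /m flag, "." also matches newlines, so the lazy groups can
# span the multi-line description block.
SECTION_PATTERN = /^description\s*?\n(.+?)\n\s*\nquality\s*?\n(.+?)\n\s*\ntext:(.+?)\nstats\s(.+?)$/m

# Returns an array of hashes, one per section found in the document string.
def scan_sections(document)
  document.scan(SECTION_PATTERN).map do |description, quality, text, stats|
    {
      # Join the description lines with single spaces.
      :description => description.split("\n").map(&:strip).join(" "),
      # Collapse repeated spaces between the quality numbers.
      :quality     => quality.split.join(" "),
      :text        => text.strip,
      :stats       => stats.strip
    }
  end
end
```

Calling `scan_sections(File.read("deneme"))` would read the whole file into memory first, which is fine for small inputs but worth keeping in mind for large ones.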

Update:

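sections = []

File.open("deneme") do |f|
  current = {:description => "", :text => "", :quality => "", :stats => ""}
  in_description = false  # true while reading the lines under "description"
  in_quality = false      # true for the single line under "quality"

  f.each_line do |line|
    if in_description
      if line.strip == ""
        # A blank line ends the description block.
        in_description = false
      else
        # Join description lines with single spaces, as the question asks.
        current[:description] += " " unless current[:description].empty?
        current[:description] += line.strip
      end
    elsif in_quality
      current[:quality] = line.strip
      in_quality = false
    elsif line.strip == "description"
      in_description = true
    elsif line.strip == "quality"
      in_quality = true
    elsif line.match(/^text: /)
      current[:text] = line[6..-1].strip   # drop the 6-character "text: " prefix
    elsif line.match(/^stats /)
      current[:stats] = line[6..-1].strip  # drop the 6-character "stats " prefix
      # "stats" is the last line of a section: store it and start a new one.
      sections.push(current)
      current = {:description => "", :text => "", :quality => "", :stats => ""}
    end
  end
end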
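To try the loop without a file on disk, the same idea can be wrapped in a method and fed a `StringIO`. The sketch below is a variation, not the answer's exact code: the method name `parse_lines` is hypothetical, a single `mode` variable replaces the two boolean flags, and description lines are joined with spaces to match the hash shown in the question.

```ruby
require "stringio"

# Hypothetical helper: accepts anything that responds to each_line,
# such as a File or a StringIO.
def parse_lines(io)
  sections = []
  current  = {:description => [], :quality => "", :text => "", :stats => ""}
  mode     = nil # nil, :description, or :quality

  io.each_line do |raw|
    line = raw.strip

    if mode == :description
      if line.empty?
        mode = nil                      # blank line ends the description
      else
        current[:description] << line
      end
    elsif mode == :quality
      current[:quality] = line.split.join(" ")  # normalize spacing
      mode = nil
    elsif line == "description"
      mode = :description
    elsif line == "quality"
      mode = :quality
    elsif line.start_with?("text:")
      current[:text] = line.sub(/\Atext:\s*/, "")
    elsif line.start_with?("stats ")
      current[:stats] = line.sub(/\Astats\s+/, "")
      # "stats" closes a section: finalize the description and store it.
      sections << current.merge(:description => current[:description].join(" "))
      current = {:description => [], :quality => "", :text => "", :stats => ""}
    end
  end

  sections
end

sample = "description\nfirst line\nsecond line\n\nquality\n1 2  3\n\ntext: type 1\nstats some stats\n"
p parse_lines(StringIO.new(sample))
```

With a real file, `File.open("deneme") { |f| parse_lines(f) }` should behave the same way, since `File` also provides `each_line`.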