How can I easily parse a document with this structure?
description
some line of text
another line of text
more lines of text
quality
3 47 88 4 4 4 4
text: type 1
stats some funny stats
description
some line of text2
another line of text2
more lines of text2
quality
1 2 4 6 7
text: type 1
stats some funny stats
.
.
.
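Ideally, I would like an array of hashes, where each hash represents one "section" of the document. It should probably look like this:
{:description => "some line of text another line of text more lines of text", :quality => "3 47 88 4 4 4 4", :text => "type 1", :stats => "some funny stats"}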
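You should process the document line by line in a loop, look for the marker lines (description, quality, text and stats), and fill in the hash as you go.
Another option is to parse the whole document at once with a regular expression, but you don't need one here, and if you aren't familiar with regular expressions I would advise against using them.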
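For comparison, here is a minimal sketch of the regular-expression route mentioned above. It is only an illustration, not a recommendation: the `doc` string, the `section_re` pattern and the variable names are assumptions made for this example, and the pattern expects every section to contain all four markers in the order shown in the question.

```ruby
# Sample input (an assumption for this sketch) with the structure from the question.
doc = <<~DOC
  description
  some line of text
  another line of text
  more lines of text
  quality
  3 47 88 4 4 4 4
  text: type 1
  stats some funny stats
  description
  some line of text2
  another line of text2
  more lines of text2
  quality
  1 2 4 6 7
  text: type 1
  stats some funny stats
DOC

# One section: the "description" marker, its lines (matched lazily up to the
# "quality" marker), the quality line, then the "text: " and "stats " lines.
# With /m the dot also matches newlines; \s* tolerates blank lines in between.
section_re = /^description\s*?\n(.*?)^quality\s*?\n([^\n]*)\n\s*text: ([^\n]*)\n\s*stats ([^\n]*)/m

# String#scan returns one array of captures per match.
sections = doc.scan(section_re).map do |description, quality, text, stats|
  {
    :description => description.split.join(" "), # collapse line breaks into spaces
    :quality     => quality.strip,
    :text        => text.strip,
    :stats       => stats.strip
  }
end

p sections.first
```

A section with a missing or reordered marker is silently skipped by this pattern, which is one reason the line-by-line loop is easier to debug.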
Update:
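sections = []
File.open("deneme") do |f|
  current = {:description => "", :text => "", :quality => "", :stats => ""}
  in_description = false
  in_quality = false
  f.each_line do |line|
    stripped = line.strip
    if in_description && stripped.empty?
      # A blank line ends the description block.
      in_description = false
    elsif in_description && stripped != "quality"
      # Collect description lines, joined by single spaces.
      current[:description] += " " unless current[:description].empty?
      current[:description] += stripped
    elsif in_quality
      # The line right after the "quality" marker holds the values.
      current[:quality] = stripped
      in_quality = false
    elsif stripped == "description"
      in_description = true
    elsif stripped == "quality"
      # The "quality" marker also ends a description with no trailing blank line.
      in_description = false
      in_quality = true
    elsif line.match(/^text: /)
      current[:text] = line[6..-1].strip
    elsif line.match(/^stats /)
      # "stats" is the last line of a section: store it and start a new one.
      current[:stats] = line[6..-1].strip
      sections.push(current)
      current = {:description => "", :text => "", :quality => "", :stats => ""}
    end
  end
end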