Ani*_*ari 11 ruby algorithm ruby-on-rails nokogiri ruby-on-rails-4
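I need to process deeply nested ul, ol, and li tags and render them the same way a browser would. I want to reproduce the following example in a PDF file: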
text = "
<body>
<ol>
<li>One</li>
<li>Two
<ol>
<li>Inner One</li>
<li>inner Two
<ul>
<li>hey
<ol>
<li>hiiiiiiiii</li>
<li>why</li>
<li>hiiiiiiiii</li>
</ol>
</li>
<li>aniket </li>
</li>
</ul>
<li>sup </li>
<li>there </li>
</ol>
<li>hey </li>
<li>Three</li>
</li>
</ol>
<ol>
<li>Introduction</li>
<ol>
<li>Introduction</li>
</ol>
<li>Description</li>
<li>Observation</li>
<li>Results</li>
<li>Summary</li>
</ol>
<ul>
<li>Introduction</li>
<li>Description
<ul>
<li>Observation
<ul>
<li>Results
<ul>
<li>Summary</li>
</ul>
</li>
</ul>
</li>
</ul>
</li>
<li>Overview</li>
</ul>
</body>"
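I have to use Prawn for this task, but Prawn does not support HTML tags. So I came up with a solution based on Nokogiri: I parse the markup and then strip the tags with gsub. I have written the following solution for part of the content above, but the problem is that the ul and ol nesting can vary.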
require 'nokogiri'

# Prefix rules per list type and nesting level.
# Levels 2-4 are placeholders that produce an empty prefix.
RULES = {
  ol: {
    1 => ->(index) { "#{index + 1}. " },
    2 => ->(index) { "#{}" },
    3 => ->(index) { "#{}" },
    4 => ->(index) { "#{}" }
  },
  ul: {
    1 => ->(_) { "\u2022 " },
    2 => ->(_) { "" },
    3 => ->(_) { "" },
    4 => ->(_) { "" },
  }
}

# Prefix each direct <li> child of an <ol>, then handle nested lists.
def ol_rule(group, deepness: 1)
  group.search('> li').each_with_index do |item, i|
    prefix = RULES[:ol][deepness].call(i)
    item.prepend_child(prefix)
    descend(item, deepness + 1)
  end
end

# Prefix each direct <li> child of a <ul>, then handle nested lists.
def ul_rule(group, deepness: 1)
  group.search('> li').each_with_index do |item, i|
    prefix = RULES[:ul][deepness].call(i)
    item.prepend_child(prefix)
    descend(item, deepness + 1)
  end
end

# Recurse into the lists nested directly inside an <li>.
def descend(item, deepness)
  item.search('> ol').each do |ol|
    ol_rule(ol, deepness: deepness)
  end
  item.search('> ul').each do |ul|
    ul_rule(ul, deepness: deepness)
  end
end

doc = Nokogiri::HTML.fragment(text)

doc.search('ol').each do |group|
  ol_rule(group, deepness: 1)
end

doc.search('ul').each do |group|
  ul_rule(group, deepness: 1)
end

puts doc.inner_text
1. One
2. Two
1. Inner One
2. inner Two
• hey
1. hiiiiiiiii
2. why
3. hiiiiiiiii
• aniket
3. sup
4. there
3. hey
4. Three
1. Introduction
1. Introduction
2. Description
3. Observation
4. Results
5. Summary
• Introduction
• Description
• Observation
• Results
• Summary
• Overview
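Questions
1) How do I handle spacing (indentation) when processing ul and ol tags?
2) How do I handle deep nesting, when an li sits inside a ul or inside an ol?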
I came up with a solution that handles multiple levels of indentation, with configurable numbering rules per level:
require 'nokogiri'

ROMANS = %w[i ii iii iv v vi vii viii ix].freeze

# Prefix rules per list type and nesting level (1 = outermost list).
RULES = {
  ol: {
    1 => ->(index) { "#{index + 1}. " },
    2 => ->(index) { "#{('a'..'z').to_a[index]}. " },
    3 => ->(index) { "#{ROMANS[index]}. " },
    4 => ->(index) { "#{ROMANS[index].upcase}. " }
  },
  ul: {
    1 => ->(_) { "\u2022 " },
    2 => ->(_) { "\u25E6 " },
    3 => ->(_) { "* " },
    4 => ->(_) { "- " },
  }
}.freeze

# Prefix each direct <li> child of an <ol>, then handle nested lists.
def ol_rule(group, deepness: 1)
  group.search('> li').each_with_index do |item, i|
    prefix = RULES[:ol][deepness].call(i)
    item.prepend_child(prefix)
    descend(item, deepness + 1)
  end
end

# Prefix each direct <li> child of a <ul>, then handle nested lists.
def ul_rule(group, deepness: 1)
  group.search('> li').each_with_index do |item, i|
    prefix = RULES[:ul][deepness].call(i)
    item.prepend_child(prefix)
    descend(item, deepness + 1)
  end
end

# Recurse into the lists nested directly inside an <li>.
def descend(item, deepness)
  item.search('> ol').each do |ol|
    ol_rule(ol, deepness: deepness)
  end
  item.search('> ul').each do |ul|
    ul_rule(ul, deepness: deepness)
  end
end

# `text` is the HTML string from the question.
doc = Nokogiri::HTML.fragment(text)

# Start only from top-level lists; nested ones are reached via descend.
doc.search('ol:root').each do |group|
  ol_rule(group, deepness: 1)
end

doc.search('ul:root').each do |group|
  ul_rule(group, deepness: 1)
end
You can then strip the tags or use doc.inner_text, depending on your environment.
There are two caveats, however:
Current output:
1. One
2. Two
a. Inner One
b. inner Two
◦ hey
◦ hey
3. hey
4. hey
hey
Three
1. Introduction
a. Introduction
2. Description
3. Observation
4. Results
5. Summary
• Introduction
• Description
◦ Observation
* Results
- Summary
• Overview
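One limitation of the RULES table in the code above is that it defines exactly four levels and nine roman numerals. RULES[:ol][5] is nil, so a fifth nesting level raises NoMethodError on .call, and a tenth item at level 4 fails on nil.upcase. The sketch below shows one way to generate markers for any depth and index. The names roman, letters, and marker are hypothetical, and cycling through the four styles again after level 4 is an assumption about the desired look, not something stated in the answer.

```ruby
# Hypothetical helpers, not part of the answer's code.
ROMAN_VALUES = {
  1000 => 'm', 900 => 'cm', 500 => 'd', 400 => 'cd', 100 => 'c', 90 => 'xc',
  50 => 'l', 40 => 'xl', 10 => 'x', 9 => 'ix', 5 => 'v', 4 => 'iv', 1 => 'i'
}.freeze

# Convert a positive integer to a lowercase roman numeral.
def roman(number)
  ROMAN_VALUES.reduce(+'') do |out, (value, glyph)|
    count, number = number.divmod(value)
    out << glyph * count
  end
end

# 0 => "a", 25 => "z", 26 => "aa", relying on String#succ.
def letters(index)
  label = 'a'
  index.times { label = label.succ }
  label
end

OL_STYLES = [
  ->(i) { (i + 1).to_s },
  ->(i) { letters(i) },
  ->(i) { roman(i + 1) },
  ->(i) { roman(i + 1).upcase }
].freeze

UL_STYLES = ["\u2022", "\u25E6", '*', '-'].freeze

# depth is 1-based and index is 0-based, matching the RULES lambdas.
# Styles repeat after the fourth level (an assumption).
def marker(kind, depth, index)
  slot = (depth - 1) % 4
  kind == :ol ? "#{OL_STYLES[slot].call(index)}. " : "#{UL_STYLES[slot]} "
end

puts marker(:ol, 3, 9) # prints "x. "
puts marker(:ol, 5, 0) # prints "1. " because level 5 reuses the level 1 style
```

With this in place, ol_rule and ul_rule could call marker(:ol, deepness, i) and marker(:ul, deepness, i) instead of looking up RULES directly.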