the*_*s11 3 ruby ruby-on-rails nokogiri xml-parsing
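I'm having trouble getting Nokogiri set up properly, and its documentation is a little rough to get started with.
I'm trying to parse this XML file: http://www.kongregate.com/games_for_your_site.xml
It returns multiple games inside a gameset, and each game has a title, description, and so on: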
<gameset>
<game>
<id>160342</id>
<title>Tricky Rick</title>
<thumbnail>
http://cdn3.kongregate.com/game_icons/0042/7180/KONG_icon250x200_site.png?21656-op
</thumbnail>
<launch_date>2012-12-12</launch_date>
<category>Puzzle</category>
<flash_file>
http://external.kongregate-games.com/gamez/0016/0342/live/embeddable_160342.swf
</flash_file>
<width>640</width>
<height>480</height>
<url>
http://www.kongregate.com/games/tAMAS_Games/tricky-rick
</url>
<description>
Help Rick to collect all the stolen fuel to refuel his spaceship and fly away from the planet. Use hammer, bombs, jetpack and other useful stuff to solve puzzles!
</description>
<instructions>
WASD \ Arrow Keys – move; S \ Down Arrow – take\release an object; CNTRL – interaction with objects: throw, hammer strike, invisibility mode; SPACE – interaction with elevators and fuel stations; Esc \ P – pause;
</instructions>
<developer_name>tAMAS_Games</developer_name>
<gameplays>24999</gameplays>
<rating>3.43</rating>
</game>
<game>
<id>160758</id>
<title>Flying Cookie Quest</title>
<thumbnail>
http://cdn2.kongregate.com/game_icons/0042/8428/icon_cookiequest_kong_250x200_site.png?16578-op
</thumbnail>
<launch_date>2012-12-07</launch_date>
<category>Action</category>
<flash_file>
http://external.kongregate-games.com/gamez/0016/0758/live/embeddable_160758.swf
</flash_file>
<width>640</width>
<height>480</height>
<url>
http://www.kongregate.com/games/LongAnimals/flying-cookie-quest
</url>
<description>
Launch Rocket Panda into the land of Cookies. With the help of low-flying sharks, hang-gliding sheep and Rocket Badger, can you defeat the all powerful Biscuit Head? Defeat All enemies of cookies in this launcher game.
</description>
<instructions>Use the mouse button!</instructions>
<developer_name>LongAnimals</developer_name>
<gameplays>168672</gameplays>
<rating>3.67</rating>
</game>
Going by the docs, this is what I'm using:
require 'nokogiri'
require 'open-uri'
url = "http://www.kongregate.com/games_for_your_site.xml"
xml = Nokogiri::XML(open(url))
xml.xpath("//game").each do |node|
  puts node.xpath("//id")
  puts node.xpath("//title")
  puts node.xpath("//thumbnail")
  puts node.xpath("//category")
  puts node.xpath("//flash_file")
  puts node.xpath("//width")
  puts node.xpath("//height")
  puts node.xpath("//description")
  puts node.xpath("//instructions")
end
However, instead of printing one set of values per game, it just returns an endless stream of data. Any help would be appreciated.
the*_*Man 20
Here's how I'd rewrite your code:
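require 'nokogiri'
require 'open-uri'

# On Ruby 3.0+, Kernel#open no longer accepts URLs, so use URI.open.
xml = Nokogiri::XML(URI.open("http://www.kongregate.com/games_for_your_site.xml"))
xml.xpath("//game").each do |game|
  # Bare tag names are resolved relative to the current <game> node.
  %w[id title thumbnail category flash_file width height description instructions].each do |n|
    puts game.at(n)
  end
end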
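The problem in your code is that all the child-tag lookups are prefixed with //, which in XPath-ese means "start at the root node and search downward for all tags with this name". So, instead of searching only inside each //game node, it searches the entire document for each listed tag, once for every //game node.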
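I recommend using CSS accessors over XPath because they're (usually) simpler and easier to read. So, instead of xpath('//game') I use search('game'). (search accepts either CSS or XPath accessors, as does at.)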
If you want the text contained in the tag, change puts game.at(n) to:
puts game.at(n).text
To make the output more useful, I'd do this:
require 'nokogiri'
require 'open-uri'
require 'awesome_print'

# On Ruby 3.0+, Kernel#open no longer accepts URLs, so use URI.open.
xml = Nokogiri::XML(URI.open('http://www.kongregate.com/games_for_your_site.xml'))

# Build one hash per <game>, keyed by tag name.
games = xml.search('game').map do |game|
  %w[
    id title thumbnail category flash_file width height description instructions
  ].each_with_object({}) do |n, o|
    o[n] = game.at(n).text
  end
end

puts games.size
ap games.first
ap games.last
Which results in:
395
{
"id" => "160342",
"title" => "Tricky Rick",
"thumbnail" => "http://cdn3.kongregate.com/game_icons/0042/7180/KONG_icon250x200_site.png?21656-op",
"category" => "Puzzle",
"flash_file" => "http://external.kongregate-games.com/gamez/0016/0342/live/embeddable_160342.swf",
"width" => "640",
"height" => "480",
"description" => "Help Rick to collect all the stolen fuel to refuel his spaceship and fly away from the planet. Use hammer, bombs, jetpack and other useful stuff to solve puzzles!\n",
"instructions" => "WASD \\ Arrow Keys – move;\nS \\ Down Arrow – take\\release an object;\nCNTRL – interaction with objects: throw, hammer strike, invisibility mode;\nSPACE – interaction with elevators and fuel stations;\nEsc \\ P – pause;\n"
}
{
"id" => "78",
"title" => "rotaZion",
"thumbnail" => "http://cdn2.kongregate.com/game_icons/0000/0115/pixtiz.rotazion_icon.jpg?8217-op",
"category" => "Action",
"flash_file" => "http://external.kongregate-games.com/gamez/0000/0078/live/embeddable_78.swf",
"width" => "350",
"height" => "350",
"description" => "In rotaZion, you play with a bubble bar that you can’t stop rotating !\nCollect the bubbles and try to avoid the mines !\nCollect the different bonus to protect your bubble bar, makes the mines go slower or destroy all the mines !\nTry to beat 100.000 points ;)\n",
"instructions" => "Move the bubble bar with the arrow keys !\nBubble = 500 Points !\nPixtiz sign = 5000 Points !\n"
}