Extracting some JSON using Nokogiri

Sun*_*Sun 3 ruby open-uri nokogiri

require 'open-uri'
require 'json'
require 'nokogiri'

doc = Nokogiri::HTML(open("http://www.highcharts.com/demo/"))

puts doc

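But I want to be able to extract the JSON from this web page. Using a regular expression doesn't seem to work. Also, how can I extract the JSON via XPath?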

Rea*_*onk 5

require 'open-uri'
require 'json'
doc = JSON.parse(open("http://www.highcharts.com/demo/"))

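  • By the way, you need to require 'open-uri' and call .read on the result of open, because it is a StringIO object: doc = JSON.parse(open('xxxxx').read) (8 upvotes)
  • In this case it is the question that is wrong, not the answer. (2 upvotes)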
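
To illustrate the point in the first comment, here is a minimal sketch of reading an IO-like object before handing it to JSON.parse. The helper name parse_json_source and the sample payload are made up for this example, and a StringIO stands in for the object open-uri would return, so nothing here touches the network. On Ruby 3.0 and later the remote call would be URI.open rather than a bare open.

```ruby
require 'json'
require 'stringio'

# Hypothetical helper (not part of any library): accepts either a String
# or an IO-like object (anything that responds to #read) and parses it.
def parse_json_source(source)
  text = source.respond_to?(:read) ? source.read : source
  JSON.parse(text)
end

# StringIO stands in for the object that open-uri hands back.
# The payload is invented sample data, not the real page content.
fake_response = StringIO.new('{"name": "Tokyo", "data": [7.0, 6.9, 9.5]}')

result = parse_json_source(fake_response)
puts result['name']
puts result['data'].inspect
```

This only works when the response body really is JSON. The page in the question is HTML, which is what the second comment is getting at.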

Phr*_*ogz 5

Here's how to access the script tags (the ones that do not reference an external file) from a URL:

require 'open-uri'
require 'nokogiri'
doc = Nokogiri.HTML(open('http://www.highcharts.com/demo/'))
inline_script = doc.xpath('//script[not(@src)]')
inline_script.each do |script|
  puts "-"*50, script.text
end

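Now you just need to find the script block you want and extract the data you need from it (using a regular expression). Without more details it is hard to guess what you want and what you can rely on.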


Here is a fairly fragile regular expression that finds what I am guessing you are looking for:

inline = doc.xpath('//script[not(@src)]').map(&:text)
data   = inline.map{ |js| js[/new Highcharts\.Chart\((.+?\})\);/m,1] }.compact[0]
puts data

Here is the result you get:

{
  chart: {
    renderTo: 'container',
    defaultSeriesType: 'line',
    marginRight: 130,
    marginBottom: 25
  },
  title: {
    text: 'Monthly Average Temperature',
    x: -20 //center
  },
  subtitle: {
    text: 'Source: WorldClimate.com',
    x: -20
  },
  xAxis: {
    categories: ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
      'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
  },
  yAxis: {
    title: {
      text: 'Temperature (°C)'
    },
    plotLines: [{
      value: 0,
      width: 1,
      color: '#808080'
    }]
  },
  tooltip: {
    formatter: function() {
                return '<b>'+ this.series.name +'</b><br/>'+
        this.x +': '+ this.y +'°C';
    }
  },
  legend: {
    layout: 'vertical',
    align: 'right',
    verticalAlign: 'top',
    x: -10,
    y: 100,
    borderWidth: 0
  },
  series: [{
    name: 'Tokyo',
    data: [7.0, 6.9, 9.5, 14.5, 18.2, 21.5, 25.2, 26.5, 23.3, 18.3, 13.9, 9.6]
  }, {
    name: 'New York',
    data: [-0.2, 0.8, 5.7, 11.3, 17.0, 22.0, 24.8, 24.1, 20.1, 14.1, 8.6, 2.5]
  }, {
    name: 'Berlin',
    data: [-0.9, 0.6, 3.5, 8.4, 13.5, 17.0, 18.6, 17.9, 14.3, 9.0, 3.9, 1.0]
  }, {
    name: 'London',
    data: [3.9, 4.2, 5.7, 8.5, 11.9, 15.2, 17.0, 16.6, 14.2, 10.3, 6.6, 4.8]
  }]
}

Note that this is not JSON; it is a string of JavaScript code containing object, string, array, number, and function literals.
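
A quick way to see the difference is to hand both forms to Ruby's JSON parser. This is only a sketch: try_parse_json is a hypothetical helper written for this example, and both strings are shortened, hand-written samples rather than the real page content.

```ruby
require 'json'

# Hypothetical helper: returns the parsed value, or nil when the text
# is not valid JSON.
def try_parse_json(text)
  JSON.parse(text)
rescue JSON::ParserError
  nil
end

# A JavaScript object literal: unquoted keys and single-quoted strings.
js_literal = "{ chart: { renderTo: 'container' }, title: { text: 'Monthly Average Temperature' } }"

# The same data written as strict JSON: double-quoted keys and strings.
strict_json = '{ "chart": { "renderTo": "container" }, "title": { "text": "Monthly Average Temperature" } }'

from_js   = try_parse_json(js_literal)   # rejected by the JSON parser
from_json = try_parse_json(strict_json)  # parsed into a Hash

p from_js
p from_json['title']['text']
```

So even after the block has been pulled out of the page, it would still need to be evaluated as JavaScript or rewritten before a JSON parser will accept it.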

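
As for why the regular expression above is described as fragile, here is a small sketch that applies the same pattern to two hand-written strings. The sample inputs are invented for illustration. The pattern stops at the first `}` that is directly followed by `);`, so a function call inside the config can cut the capture short.

```ruby
# Same pattern as in the answer, applied to made-up sample strings.
chart_config = /new Highcharts\.Chart\((.+?\})\);/m

# No inner "});" sequence, so the whole object literal is captured.
simple = "var chart = new Highcharts.Chart({ a: 1, b: { c: 2 } });"

# The call g({ x: 1 }); inside the function body also ends in "});",
# so the lazy match stops there instead of at the real end.
tricky = "var chart = new Highcharts.Chart({ f: function() { g({ x: 1 }); }, y: 2 });"

simple_match = simple[chart_config, 1]
tricky_match = tricky[chart_config, 1]

puts simple_match
puts tricky_match
```

The second capture ends inside the function body, which is the kind of breakage to expect if the page's script changes shape.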