Sun*_*Sun 3 ruby open-uri nokogiri
require 'open-uri'
require 'json'
require 'nokogiri'
# URI.open is needed on Ruby 3.0+, where Kernel#open no longer accepts URLs
doc = Nokogiri::HTML(URI.open("http://www.highcharts.com/demo/"))
puts doc
But I want to be able to extract the JSON from this web page. Using a regular expression does not seem to work. How can I extract the JSON via XPath?
require 'open-uri'
require 'json'
# This attempt fails: the page body is HTML, not JSON
doc = JSON.parse(URI.open("http://www.highcharts.com/demo/").read)
Here is how to access the script tags (those that do not reference an external file) from the URL:
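
require 'open-uri'
require 'nokogiri'
# URI.open is needed on Ruby 3.0+, where Kernel#open no longer accepts URLs
doc = Nokogiri.HTML(URI.open('http://www.highcharts.com/demo/'))
# Select only <script> elements without a src attribute, i.e. inline scripts
inline_script = doc.xpath('//script[not(@src)]')
inline_script.each do |script|
  puts "-"*50, script.text
end

Now you only need to find the script block you want and extract the data you need (using a regular expression). Without more details, it is hard to guess what you want and what you are relying on.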
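
As a rough sketch of the "find the script block you want" step, one option is to filter the script bodies by a marker you expect to appear in them. The strings below are hypothetical stand-ins for what `script.text` returns, and the choice of `new Highcharts.Chart(` as the marker is an assumption about what you are after:

```ruby
# Hypothetical script bodies, standing in for inline_script.map(&:text).
scripts = [
  "var _gaq = _gaq || []; _gaq.push(['_trackPageview']);",
  "var chart = new Highcharts.Chart({ title: { text: 'Demo' } });",
  "console.log('unrelated');"
]

# Keep only the blocks that mention the constructor call we care about.
chart_scripts = scripts.grep(/new Highcharts\.Chart\(/)

puts chart_scripts.length
puts chart_scripts.first
```

Filtering first keeps any later extraction regex from running against unrelated scripts such as analytics snippets.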

Here is a fairly fragile regular expression that finds what I am guessing you are looking for:

inline = doc.xpath('//script[not(@src)]').map(&:text)
# Capture the argument of the first `new Highcharts.Chart(...)` call found
data = inline.map{ |js| js[/new Highcharts\.Chart\((.+?\})\);/m,1] }.compact[0]
puts data

Here is what you get:

{
  chart: {
    renderTo: 'container',
    defaultSeriesType: 'line',
    marginRight: 130,
    marginBottom: 25
  },
  title: {
    text: 'Monthly Average Temperature',
    x: -20 //center
  },
  subtitle: {
    text: 'Source: WorldClimate.com',
    x: -20
  },
  xAxis: {
    categories: ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
      'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
  },
  yAxis: {
    title: {
      text: 'Temperature (°C)'
    },
    plotLines: [{
      value: 0,
      width: 1,
      color: '#808080'
    }]
  },
  tooltip: {
    formatter: function() {
      return '<b>'+ this.series.name +'</b><br/>'+
        this.x +': '+ this.y +'°C';
    }
  },
  legend: {
    layout: 'vertical',
    align: 'right',
    verticalAlign: 'top',
    x: -10,
    y: 100,
    borderWidth: 0
  },
  series: [{
    name: 'Tokyo',
    data: [7.0, 6.9, 9.5, 14.5, 18.2, 21.5, 25.2, 26.5, 23.3, 18.3, 13.9, 9.6]
  }, {
    name: 'New York',
    data: [-0.2, 0.8, 5.7, 11.3, 17.0, 22.0, 24.8, 24.1, 20.1, 14.1, 8.6, 2.5]
  }, {
    name: 'Berlin',
    data: [-0.9, 0.6, 3.5, 8.4, 13.5, 17.0, 18.6, 17.9, 14.3, 9.0, 3.9, 1.0]
  }, {
    name: 'London',
    data: [3.9, 4.2, 5.7, 8.5, 11.9, 15.2, 17.0, 16.6, 14.2, 10.3, 6.6, 4.8]
  }]
}

Note that this is not JSON; it is a string representing JavaScript code, containing object, string, array, number, and function literals.
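
To make the last two points concrete, here is a small offline sketch. The `js` and `tricky` strings are made-up stand-ins for inline script text (they are not taken from the real page), and `chart_args` is just a local name for the same regex used above. It shows the capture working on a simple case, a strict JSON parser rejecting the result, and the lazy match cutting off early when `});` appears inside the object:

```ruby
require 'json'

# Hypothetical inline script, standing in for the text of one <script> node.
js = <<~JS
  var chart;
  $(document).ready(function() {
    chart = new Highcharts.Chart({
      chart: { renderTo: 'container' },
      series: [{ name: 'Tokyo', data: [7.0, 6.9] }]
    });
  });
JS

# Same pattern as above: lazily capture up to the first "}" followed by ");".
chart_args = /new Highcharts\.Chart\((.+?\})\);/m

config = js[chart_args, 1]
puts config

# The capture is a JavaScript object literal (unquoted keys, single-quoted
# strings), so JSON.parse raises instead of returning a Hash.
parse_error =
  begin
    JSON.parse(config)
    nil
  rescue JSON::ParserError => e
    e
  end
puts parse_error.class

# Fragility: the lazy match stops at the first "});", even when that
# sequence is nested inside the object (here, inside a function body).
tricky = "new Highcharts.Chart({ tooltip: { formatter: function() { log({ x: 1 }); } } });"
truncated = tricky[chart_args, 1]
puts truncated
```

If you need the values as Ruby data, you would have to evaluate the captured text with a JavaScript engine or pull out individual fields with narrower patterns; parsing it as JSON will not work as long as it contains unquoted keys or function literals.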