Extract the URLs of JS and CSS files from HTML? (using node.js)

Question

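I want an array of URLs extracted from an HTML string, but only from the following tags:

<link href="http://example.com/foo.css">
<script src="http://example.com/foo.js">

I want these URLs so that I can put them into an appcache manifest file. I use appcache manifest builder, but it only analyzes the static files that I serve locally. It works well, but it does not automatically include the external static js/css files that I reference in my HTML.

I would like to be able to parse the HTML string using node.js.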

Answer 1

vic*_*ohl 6

You can use cheerio. It is an implementation of core jQuery for Node.

For example:

var cheerio = require('cheerio'),
    request = require('request');

// Fetch the page, then parse the returned HTML with cheerio.
request('http://www.stackoverflow.com', function (error, response, body) {
  if (!error && response.statusCode === 200) {
    var $ = cheerio.load(body);

    // href of every <link> element (this includes icons and feeds, not only stylesheets).
    var linkHrefs = $('link').map(function () {
      return $(this).attr('href');
    }).get();

    // src of every <script> element; inline scripts have no src and are skipped by map().
    var scriptSrcs = $('script').map(function () {
      return $(this).attr('src');
    }).get();

    console.log("links:");
    console.log(linkHrefs);
    console.log("scripts:");
    console.log(scriptSrcs);
  }
});

Output:

Victors-MacBook-Pro:a kohl$ node test.js 
links:
[ '//cdn.sstatic.net/stackoverflow/img/favicon.ico?v=6cd6089ee7f6',
  '//cdn.sstatic.net/stackoverflow/img/apple-touch-icon.png?v=41f6e13ade69',
  '/opensearch.xml',
  '//cdn.sstatic.net/stackoverflow/all.css?v=317033db9646',
  '/feeds' ]
scripts:
[ '//ajax.googleapis.com/ajax/libs/jquery/1.7.1/jquery.min.js',
  '//cdn.sstatic.net/Js/stub.en.js?v=e3a448574e16' ]
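
Note that several of the collected values are protocol-relative (`//cdn...`) or site-relative (`/feeds`). As a rough sketch of a possible next step toward the manifest mentioned in the question, the snippet below resolves each value against the page URL with Node's built-in `URL` class, keeps only the ones on a different host, and joins them into a manifest body. The helper names (`toAbsolute`, `externalOnly`, `buildManifest`) and the `PAGE_URL` constant are hypothetical and not part of cheerio or of appcache manifest builder; the sample input is taken from the output above.

```javascript
// Assumption: the HTML was fetched from this address (same URL as in the answer).
const PAGE_URL = 'http://www.stackoverflow.com/';

// Resolve each href/src against the page URL and drop duplicates.
function toAbsolute(urls, pageUrl) {
  const seen = new Set();
  for (const u of urls) {
    try {
      seen.add(new URL(u, pageUrl).href);
    } catch (e) {
      // Skip values that cannot be parsed as a URL.
    }
  }
  return Array.from(seen);
}

// Keep only URLs whose host differs from the host of the page itself.
function externalOnly(absoluteUrls, pageUrl) {
  const pageHost = new URL(pageUrl).host;
  return absoluteUrls.filter(function (u) {
    return new URL(u).host !== pageHost;
  });
}

// Build a minimal manifest: the required first line, then an explicit CACHE section.
function buildManifest(urls) {
  return ['CACHE MANIFEST', 'CACHE:'].concat(urls).join('\n') + '\n';
}

// Sample values copied from the output above.
const collected = [
  '//cdn.sstatic.net/stackoverflow/all.css?v=317033db9646',
  '/feeds',
  '//ajax.googleapis.com/ajax/libs/jquery/1.7.1/jquery.min.js'
];

console.log(buildManifest(externalOnly(toAbsolute(collected, PAGE_URL), PAGE_URL)));
```

Resolving against the page URL means the protocol-relative entries inherit `http:` from `PAGE_URL`; if the page is served over HTTPS, pass that URL instead.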
