How to get a regex to match multiple script tags?

Question

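I'm trying to return the contents of any script tags in a body of text. I'm currently using the expression below, but it only captures the contents of the first tag and ignores any tags after it.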

Here is a sample of the HTML:

    <script type="text/javascript">
        alert('1');
    </script>

    <div>Test</div>

    <script type="text/javascript">
        alert('2');
    </script>


My regular expression is as follows:

    // scripttext contains the sample
    var re = /<script\b[^>]*>([\s\S]*?)<\/script>/gm;
    var scripts = re.exec(scripttext);


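When I run it in IE6, it returns 2 matches. The first contains the full tag, and the second contains alert('1').

When I run it on http://www.pagecolumn.com/tool/regtest.htm, it gives me 2 results, each containing only the script tags.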
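
For reference, the two "matches" described above are consistent with the shape of the array that a single exec call returns: the whole matched text followed by the first capture group. A minimal sketch, assuming the sample HTML is held in a variable named scripttext as in the snippet above (the array-join construction of the sample is only for illustration):

```javascript
// Sample input mirroring the HTML in the question.
var scripttext = [
  '<script type="text/javascript">',
  "    alert('1');",
  '</script>',
  '',
  '<div>Test</div>',
  '',
  '<script type="text/javascript">',
  "    alert('2');",
  '</script>'
].join('\n');

var re = /<script\b[^>]*>([\s\S]*?)<\/script>/gm;

// A single exec call returns ONE match as an array:
// index 0 is the whole matched text, index 1 is the first capture group.
var result = re.exec(scripttext);

console.log(result.length);    // number of array entries, not number of script tags
console.log(result[0]);        // the full first <script>...</script> element
console.log(result[1].trim()); // the body of the first script only
```

So the array length counts the full match plus its capture groups for one occurrence; it says nothing about how many script tags the text contains.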

Answer 1

kan*_*gax 35

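The "problem" here is in how exec works. It matches only the first occurrence, but stores the current index (i.e. the caret position) in the regex's lastIndex property. To get all matches, simply keep applying the regex to the string until it fails to match (this is a pretty common way to do it):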

    var scripttext = ' <script type="text/javascript">\nalert(\'1\');\n</script>\n\n<div>Test</div>\n\n<script type="text/javascript">\nalert(\'2\');\n</script>';

    var re = /<script\b[^>]*>([\s\S]*?)<\/script>/gm;

    var match;
    while (match = re.exec(scripttext)) {
      // full match is in match[0], whereas captured groups are in ...[1], ...[2], etc.
      console.log(match[1]);
    }

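
To make the role of lastIndex more concrete, the sketch below records its value after each successful exec call and then shows String.prototype.matchAll as an alternative. The variable names bodies, positions and viaMatchAll are illustrative and not part of the answer above, and matchAll assumes an ES2020-capable runtime, which rules out older browsers such as the IE6 mentioned in the question.

```javascript
var scripttext =
  '<script type="text/javascript">\nalert(\'1\');\n</script>\n\n' +
  '<div>Test</div>\n\n' +
  '<script type="text/javascript">\nalert(\'2\');\n</script>';

// The g flag is what makes exec remember its position between calls.
var re = /<script\b[^>]*>([\s\S]*?)<\/script>/g;

var bodies = [];    // captured script contents
var positions = []; // value of lastIndex after each successful match
var match;
while ((match = re.exec(scripttext)) !== null) {
  bodies.push(match[1].trim());
  positions.push(re.lastIndex); // where the next exec call will resume
}

// When exec finally fails to match, it resets lastIndex to 0.
console.log(bodies, positions, re.lastIndex);

// Alternative (assumes ES2020+): matchAll iterates over every match
// and throws a TypeError if the regex lacks the g flag.
var viaMatchAll = Array.from(scripttext.matchAll(re), function (m) {
  return m[1].trim();
});
console.log(viaMatchAll);
```

Without the g flag, exec would leave lastIndex alone and return the first match on every call, so the while loop would never terminate.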

Answer 2

Sva*_*nte 5

Don't use regular expressions to parse HTML. HTML is not a regular language. Use the power of the DOM instead. It is much easier, because it is the right tool.

    var scripts = document.getElementsByTagName('script');

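
The line above returns the script elements of the live page, whereas the question starts from a string. One possible way to bridge that gap is sketched below: getScriptContents is a hypothetical helper (not from the answer) that reads the text of each script element under a given root, and the DOMParser part assumes a browser environment, since DOMParser is not defined in plain Node.js.

```javascript
// Hypothetical helper: returns the trimmed text of every <script>
// element under a DOM root such as `document`.
function getScriptContents(root) {
  var scripts = root.getElementsByTagName('script');
  // The returned collection is array-like, so Array.from can map over it.
  return Array.from(scripts, function (script) {
    return script.textContent.trim();
  });
}

// For markup held in a string, a browser can parse it into a separate document.
// Assumption: DOMParser exists in the current environment.
if (typeof DOMParser !== 'undefined') {
  var html =
    '<script type="text/javascript">alert(\'1\');<\/script>' +
    '<div>Test</div>' +
    '<script type="text/javascript">alert(\'2\');<\/script>';
  var doc = new DOMParser().parseFromString(html, 'text/html');
  console.log(getScriptContents(doc));
}
```

Passing document as the root gives the same elements as the getElementsByTagName call in the answer. As far as I know, scripts in a document produced by DOMParser are not executed, which matters when the input text is untrusted.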
