Sequence of logical ORs in an ES6/Unicode regular expression: Chrome ✗ vs. Firefox ✓

Question


Ahm*_*sih 13 javascript regex unicode google-chrome node.js

Consider the following Unicode-heavy regular expression (the emoji stand in for non-ASCII characters outside the BMP):

''.match(/||/ug)


Firefox returns [ "", "", "", "", "", "" ].

Chrome 52.0.2743.116 and Node 6.4.0 both return null! It makes no difference whether I put the string in a variable and call str.match(…), or whether I build the regular expression with new RegExp('||', 'gu').

(ORing just two sequences works in Chrome: ''.match(/|/ug) is fine. The non-Unicode case 'aakkzzkkaa'.match(/aa|kk|zz/ug) is fine too.)

Am I doing something wrong? Is this a Chrome bug? The ECMAScript compatibility table says I should be OK using Unicode regular expressions.

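(PS: The three emoji used in this example are just stand-ins. In my application they will be arbitrary but distinct strings. Still, I wonder whether the fact that ''.match(/[]/ug) works in Chrome is relevant?)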

Update: marked as fixed in Chromium and downstream (including Chrome and Node) on April 12, 2017.

Answer 1

geo*_*org 3

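It is no surprise that your regex works without the u flag: in BMP (= no "u") mode it compares 16-bit "units" with 16-bit "units", i.e. one surrogate pair against another surrogate pair.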
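To make the unit-versus-code-point distinction concrete, here is a minimal sketch. U+1F600 is only an assumed stand-in for the emoji in the question, written as an escape so the snippet does not depend on file encoding.

```javascript
// U+1F600 is an arbitrary astral code point chosen for illustration;
// it is encoded as the surrogate pair D83D DE00.
const s = '\u{1F600}';

// Without `u`, the string is seen as two 16-bit code units.
const unitCount = s.length;                     // counts code units
const dotWithoutU = /^.$/.test(s);              // `.` matches one code unit only
const loneSurrogateWithoutU = /^\uD83D/.test(s); // half of the pair can match

// With `u`, the pair is treated as a single code point.
const codePointCount = Array.from(s).length;    // iterates by code point
const dotWithU = /^.$/u.test(s);                // `.` matches the whole code point
const loneSurrogateWithU = /^\uD83D/u.test(s);  // half of a pair no longer matches

console.log({
  unitCount, dotWithoutU, loneSurrogateWithoutU,
  codePointCount, dotWithU, loneSurrogateWithU,
});
```

Without the flag, an alternation of emoji is effectively an alternation of two-unit sequences, which is why it behaves like the plain 'aa|kk|zz' case.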

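The behavior in "u" mode (which is supposed to compare code points rather than units) does look like a Chrome bug. In the meantime, you can wrap each alternative in a group, which seems to work fine: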

m = ''.match(/()|()|()/ug)
console.log(m)

// note that the groups must be capturing!
// this doesn't work:

m = ''.match(/(?:)|(?:)|(?:)/ug)
console.log(m)
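Since the question mentions arbitrary but distinct strings, the capturing-group workaround could be generated rather than written by hand. The following is only a sketch: escapeRegExp and buildGroupedAlternation are hypothetical helper names, and the three code points are assumed stand-ins for the real alternatives.

```javascript
// Hypothetical helper: escape regex syntax characters so an arbitrary string
// is matched literally. Only syntax characters are escaped, because other
// identity escapes (such as \-) are syntax errors under the `u` flag.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: wrap every alternative in its own capturing group,
// as suggested above, and join the groups with `|`.
function buildGroupedAlternation(alternatives) {
  const source = alternatives
    .map(alt => '(' + escapeRegExp(alt) + ')')
    .join('|');
  return new RegExp(source, 'gu');
}

// Stand-in alternatives; U+1F600..U+1F602 are illustrative only.
const alts = ['\u{1F600}', '\u{1F601}', '\u{1F602}'];
const re = buildGroupedAlternation(alts);
const subject = alts.join('') + alts.join('');
const matches = subject.match(re);

console.log(re.flags, matches);
```

With the g flag, match() returns only the full matches, so the extra capturing groups do not change the shape of the result.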


Here is a quick proof that more than two SMP alternatives are broken in u mode:

// pick any range from the Supplementary Multilingual Plane, see
// https://en.wikipedia.org/wiki/Plane_(Unicode)#Supplementary_Multilingual_Plane
var range = '11300-1137F';

range = range.split('-').map(x => parseInt(x, 16))

var chars = [];
for (var i = range[0]; i <= range[1]; i++) {
    chars.push(String.fromCodePoint(i))
}

var str = chars.join('');

while(chars.length) {
    var re = new RegExp(chars.join('|'), 'u')
    if(str.match(re))
        console.log(chars.length, re);
    chars.pop();
}


In Chrome, it logs only the last two regexes (those with 2 and 1 alternatives).
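The same idea can be reduced to a small runtime check, for example to decide whether the capturing-group workaround is needed. This is a sketch only: supportsAstralAlternation is a hypothetical name, and it probes just one three-way alternation taken from the range used in the script above.

```javascript
// Hypothetical feature check: reports whether a three-way alternation of
// astral characters matches under the `u` flag in the current engine.
function supportsAstralAlternation() {
  // U+11300..U+11302 come from the same SMP range as the proof script.
  const chars = [0x11300, 0x11301, 0x11302].map(cp => String.fromCodePoint(cp));
  const re = new RegExp(chars.join('|'), 'gu');
  const found = chars.join('').match(re);
  // An affected engine returns null here instead of three matches.
  return found !== null && found.length === chars.length;
}

console.log(supportsAstralAlternation());
```

Engines that include the fix mentioned in the question's update should report true.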
