Why do Unicode emoji property escapes match numbers?

Nin*_*liu 9 javascript regex emoji

I found this great way to detect emojis with a regex that relies on Unicode property escapes instead of "huge magic ranges":

console.log(/\p{Emoji}/u.test('flowers 🌼🌺🌸')) // true
console.log(/\p{Emoji}/u.test('flowers')) // false

But when I shared this knowledge in this answer, @Bronzdragon noticed that \p{Emoji} also matches numbers! Why is that? Numbers are not emojis, are they?

console.log(/\p{Emoji}/u.test('flowers 123')) // unexpectedly true

// regex-only workaround by @Bronzdragon
const regex = /(?=\p{Emoji})(?!\p{Number})/u;
console.log(
  regex.test('flowers'), // false, as expected
  regex.test('flowers 123'), // false, as expected
  regex.test('flowers 123 🌼🌺🌸'), // true, as expected
  regex.test('flowers 🌼🌺🌸'), // true, as expected
)

// more readable workaround
const hasEmoji = str => {
  const nbEmojiOrNumber = (str.match(/\p{Emoji}/gu) || []).length;
  const nbNumber = (str.match(/\p{Number}/gu) || []).length;
  return nbEmojiOrNumber > nbNumber;
}
console.log(
  hasEmoji('flowers'), // false, as expected
  hasEmoji('flowers 123'), // false, as expected
  hasEmoji('flowers 123 🌼🌺🌸'), // true, as expected
  hasEmoji('flowers 🌼🌺🌸'), // true, as expected
)

Wik*_*żew 7

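According to this post, digits, #, * and some other characters have the Emoji property set to Yes, which means digits are considered valid emoji characters. They are listed, together with ZWJ and other joining characters, as Emoji_Component: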

0023          ; Emoji_Component      #  1.1  [1] (#)       number sign
002A          ; Emoji_Component      #  1.1  [1] (*)       asterisk
0030..0039    ; Emoji_Component      #  1.1 [10] (0..9)    digit zero..digit nine
200D          ; Emoji_Component      #  1.1  [1] ()        zero width joiner
20E3          ; Emoji_Component      #  3.0  [1] ()        combining enclosing keycap
FE0F          ; Emoji_Component      #  3.2  [1] ()        VARIATION SELECTOR-16
1F1E6..1F1FF  ; Emoji_Component      #  6.0 [26] (🇦..🇿)    regional indicator symbol letter a..regional indicator symbol letter z
1F3FB..1F3FF  ; Emoji_Component      #  8.0  [5] (🏻..🏿)    light skin tone..dark skin tone
1F9B0..1F9B3  ; Emoji_Component      # 11.0  [4] (🦰..🦳)    red-haired..white-haired
E0020..E007F  ; Emoji_Component      #  3.1 [96] (..)      tag space..cancel tag
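
To see the listing above in action, here is a minimal sketch that checks a few characters against the `Emoji`, `Emoji_Component` and `Number` properties. The helper name `emojiProps` is made up for illustration, and the results depend on the Unicode data shipped with your JavaScript engine.

```javascript
// Hypothetical helper (not a library API): report which Unicode
// properties a single character carries.
const emojiProps = (ch) => ({
  emoji: /\p{Emoji}/u.test(ch),
  component: /\p{Emoji_Component}/u.test(ch),
  number: /\p{Number}/u.test(ch),
});

// Digits, '#' and '*' carry the Emoji property; a plain letter does not.
// The zero width joiner (U+200D) is an Emoji_Component but not an Emoji.
for (const ch of ['1', '#', '*', 'a', '\u200D']) {
  console.log(JSON.stringify(ch), emojiProps(ch));
}
```

This is why `/\p{Emoji}/u` reports a match on a string such as 'flowers 123': the digits themselves satisfy the property.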

For example, 1 is a digit, but it becomes an emoji when combined with the U+FE0F and U+20E3 characters (1️⃣):

// each digit followed by U+FE0F and U+20E3 forms a keycap emoji sequence
console.log("1\uFE0F\u20E3 2\uFE0F\u20E3 3\uFE0F\u20E3 4\uFE0F\u20E3 5\uFE0F\u20E3 6\uFE0F\u20E3 7\uFE0F\u20E3 8\uFE0F\u20E3 9\uFE0F\u20E3 0\uFE0F\u20E3")

If you want to avoid matching digits, use the Extended_Pictographic Unicode property instead:

Extended_Pictographic characters include all Emoji characters except for some Emoji_Components.

So you can use /\p{Extended_Pictographic}/gu to match most emojis proper, /\p{Extended_Pictographic}/u to test whether a string contains an emoji proper, or /[\p{Extended_Pictographic}\u{1F3FB}-\u{1F3FF}\u{1F9B0}-\u{1F9B3}]/u to match emojis proper together with the light-to-dark skin tone characters and the red-haired to white-haired characters.
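
As a rough sketch of the Extended_Pictographic approach, the snippet below runs both patterns against the inputs from the question. The names `emojiProper`, `emojiWithModifiers` and `hasEmoji` are illustrative only, and the emoji is written as the escape \u{1F33C} (blossom) so the code does not depend on how the page renders it.

```javascript
// Assumed names for illustration; the patterns are the ones discussed above.
const emojiProper = /\p{Extended_Pictographic}/u;
const emojiWithModifiers =
  /[\p{Extended_Pictographic}\u{1F3FB}-\u{1F3FF}\u{1F9B0}-\u{1F9B3}]/u;

// No `g` flag here, so test() keeps no state between calls.
const hasEmoji = (str) => emojiProper.test(str);

console.log(hasEmoji('flowers'));             // false
console.log(hasEmoji('flowers 123'));         // false: digits are not Extended_Pictographic
console.log(hasEmoji('flowers \u{1F33C}'));   // true
console.log(hasEmoji('1\uFE0F\u20E3'));       // false: keycap sequences are not covered
console.log(emojiWithModifiers.test('\u{1F3FB}')); // true: skin tone range is listed explicitly
```

Note the trade-off: keycap sequences such as 1️⃣ are built only from Emoji_Component characters, so neither pattern detects them. If those matter for your input, they need separate handling.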