Is there still no easy way to split a string with compound emojis into an array?

jos*_*t21 4 javascript arrays string emoji

In JavaScript there are many good (and bad) ways to split a string into an array.

For example, in ES6, using just the spread operator:

    let str = "example 1";
    let arr = [...str];
    console.log(arr);
    // ['e', 'x', 'a', 'm', 'p', 'l', 'e', ' ', '1']

Many of them still work when emojis are included:

    let str = "example 2 ";
    let arr = [...str];
    console.log(arr);
    // ['e', 'x', 'a', 'm', 'p', 'l', 'e', ' ', '2', ' ', '']

However, all the solutions I have found so far (e.g. 1, 2) fail if the string contains compound emojis:

    let str = "example 3 \u200D♀\uFE0F \u200D\u200D\u200D";
    let arr = [...str];
    console.log(arr);
    // ['e', 'x', 'a', 'm', 'p', 'l', 'e', ' ', '3', ' ', '', '\u200D', '♀', '\uFE0F', ' ', '', '\u200D', '', '\u200D', '', '\u200D', '']
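The reason the spread operator fails here is that it iterates by code point, while a compound emoji is several code points glued together with a zero-width joiner (U+200D). A minimal sketch of that, assuming a "person shrugging + female sign" sequence as a stand-in for the emoji in the example (the emoji is written with escapes, and the variable names are only illustrative):

```javascript
// One visible emoji, written as explicit code points:
// U+1F937 (person shrugging) + U+200D (ZWJ) + U+2640 (female sign) + U+FE0F (VS16).
// This particular emoji is an assumption chosen for illustration.
const shrug = "\u{1F937}\u200D\u2640\uFE0F";

// Spreading a string yields one array element per code point.
const codePoints = [...shrug].map(
  ch => "U+" + ch.codePointAt(0).toString(16).toUpperCase()
);

console.log(codePoints);   // ['U+1F937', 'U+200D', 'U+2640', 'U+FE0F']
console.log(shrug.length); // 5 (UTF-16 code units; U+1F937 takes two)
```

So any approach that works per code point, or per UTF-16 code unit, will split such a sequence into its parts.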

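There is grapheme-splitter, but it has not been updated in years. I would also prefer a way that does not require an external library or package.

The following method correctly detects many (not all) compound emojis, but it does not split the text parts. (I do not yet understand how it works well enough to adjust it.)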

let str = "example 3 \xe2\x80\x8d\xe2\x99\x80\xef\xb8\x8f \xe2\x80\x8d\xe2\x80\x8d\xe2\x80\x8d";\n  let regex = /([\\uD800-\\uDBFF][\\uDC00-\\uDFFF](?:[\\u200D\\uFE0F][\\uD800-\\uDBFF][\\uDC00-\\uDFFF]){2,}|\\uD83D\\uDC69(?:\\u200D(?:(?:\\uD83D\\uDC69\\u200D)?\\uD83D\\uDC67|(?:\\uD83D\\uDC69\\u200D)?\\uD83D\\uDC66)|\\uD83C[\\uDFFB-\\uDFFF])|\\uD83D\\uDC69\\u200D(?:\\uD83D\\uDC69\\u200D)?\\uD83D\\uDC66\\u200D\\uD83D\\uDC66|\\uD83D\\uDC69\\u200D(?:\\uD83D\\uDC69\\u200D)?\\uD83D\\uDC67\\u200D(?:\\uD83D[\\uDC66\\uDC67])|\\uD83C\\uDFF3\\uFE0F\\u200D\\uD83C\\uDF08|(?:\\uD83C[\\uDFC3\\uDFC4\\uDFCA]|\\uD83D[\\uDC6E\\uDC71\\uDC73\\uDC77\\uDC81\\uDC82\\uDC86\\uDC87\\uDE45-\\uDE47\\uDE4B\\uDE4D\\uDE4E\\uDEA3\\uDEB4-\\uDEB6]|\\uD83E[\\uDD26\\uDD37-\\uDD39\\uDD3D\\uDD3E\\uDDD6-\\uDDDD])(?:\\uD83C[\\uDFFB-\\uDFFF])\\u200D[\\u2640\\u2642]\\uFE0F|\\uD83D\\uDC69(?:\\uD83C[\\uDFFB-\\uDFFF])\\u200D(?:\\uD83C[\\uDF3E\\uDF73\\uDF93\\uDFA4\\uDFA8\\uDFEB\\uDFED]|\\uD83D[\\uDCBB\\uDCBC\\uDD27\\uDD2C\\uDE80\\uDE92])|(?:\\uD83C[\\uDFC3\\uDFC4\\uDFCA]|\\uD83D[\\uDC6E\\uDC6F\\uDC71\\uDC73\\uDC77\\uDC81\\uDC82\\uDC86\\uDC87\\uDE45-\\uDE47\\uDE4B\\uDE4D\\uDE4E\\uDEA3\\uDEB4-\\uDEB6]|\\uD83E[\\uDD26\\uDD37-\\uDD39\\uDD3C-\\uDD3E\\uDDD6-\\uDDDF])\\u200D[\\u2640\\u2642]\\uFE0F|\\uD83C\\uDDFD\\uD83C\\uDDF0|\\uD83C\\uDDF6\\uD83C\\uDDE6|\\uD83C\\uDDF4\\uD83C\\uDDF2|\\uD83C\\uDDE9(?:\\uD83C[\\uDDEA\\uDDEC\\uDDEF\\uDDF0\\uDDF2\\uDDF4\\uDDFF])|\\uD83C\\uDDF7(?:\\uD83C[\\uDDEA\\uDDF4\\uDDF8\\uDDFA\\uDDFC])|\\uD83C\\uDDE8(?:\\uD83C[\\uDDE6\\uDDE8\\uDDE9\\uDDEB-\\uDDEE\\uDDF0-\\uDDF5\\uDDF7\\uDDFA-\\uDDFF])|(?:\\u26F9|\\uD83C[\\uDFCB\\uDFCC]|\\uD83D\\uDD75)(?:\\uFE0F\\u200D[\\u2640\\u2642]|(?:\\uD83C[\\uDFFB-\\uDFFF])\\u200D[\\u2640\\u2642])\\uFE0F|(?:\\uD83D\\uDC41\\uFE0F\\u200D\\uD83D\\uDDE8|\\uD83D\\uDC69(?:\\uD83C[\\uDFFB-\\uDFFF])\\u200D[\\u2695\\u2696\\u2708]|\\uD83D\\uDC69\\u200D[\\u2695\\u2696\\u2708]|\\uD83D\\uDC68(?:(?:\\uD83C[\\uDFFB-\\uDFFF])\\u200D[\\u2695\\u2696\\u2708]|\\u200D[\\u2695\\u2696\
\u2708]))\\uFE0F|\\uD83C\\uDDF2(?:\\uD83C[\\uDDE6\\uDDE8-\\uDDED\\uDDF0-\\uDDFF])|\\uD83D\\uDC69\\u200D(?:\\uD83C[\\uDF3E\\uDF73\\uDF93\\uDFA4\\uDFA8\\uDFEB\\uDFED]|\\uD83D[\\uDCBB\\uDCBC\\uDD27\\uDD2C\\uDE80\\uDE92]|\\u2764\\uFE0F\\u200D(?:\\uD83D\\uDC8B\\u200D(?:\\uD83D[\\uDC68\\uDC69])|\\uD83D[\\uDC68\\uDC69]))|\\uD83C\\uDDF1(?:\\uD83C[\\uDDE6-\\uDDE8\\uDDEE\\uDDF0\\uDDF7-\\uDDFB\\uDDFE])|\\uD83C\\uDDEF(?:\\uD83C[\\uDDEA\\uDDF2\\uDDF4\\uDDF5])|\\uD83C\\uDDED(?:\\uD83C[\\uDDF0\\uDDF2\\uDDF3\\uDDF7\\uDDF9\\uDDFA])|\\uD83C\\uDDEB(?:\\uD83C[\\uDDEE-\\uDDF0\\uDDF2\\uDDF4\\uDDF7])|[#\\*0-9]\\uFE0F\\u20E3|\\uD83C\\uDDE7(?:\\uD83C[\\uDDE6\\uDDE7\\uDDE9-\\uDDEF\\uDDF1-\\uDDF4\\uDDF6-\\uDDF9\\uDDFB\\uDDFC\\uDDFE\\uDDFF])|\\uD83C\\uDDE6(?:\\uD83C[\\uDDE8-\\uDDEC\\uDDEE\\uDDF1\\uDDF2\\uDDF4\\uDDF6-\\uDDFA\\uDDFC\\uDDFD\\uDDFF])|\\uD83C\\uDDFF(?:\\uD83C[\\uDDE6\\uDDF2\\uDDFC])|\\uD83C\\uDDF5(?:\\uD83C[\\uDDE6\\uDDEA-\\uDDED\\uDDF0-\\uDDF3\\uDDF7-\\uDDF9\\uDDFC\\uDDFE])|\\uD83C\\uDDFB(?:\\uD83C[\\uDDE6\\uDDE8\\uDDEA\\uDDEC\\uDDEE\\uDDF3\\uDDFA])|\\uD83C\\uDDF3(?:\\uD83C[\\uDDE6\\uDDE8\\uDDEA-\\uDDEC\\uDDEE\\uDDF1\\uDDF4\\uDDF5\\uDDF7\\uDDFA\\uDDFF])|\\uD83C\\uDFF4\\uDB40\\uDC67\\uDB40\\uDC62(?:\\uDB40\\uDC77\\uDB40\\uDC6C\\uDB40\\uDC73|\\uDB40\\uDC73\\uDB40\\uDC63\\uDB40\\uDC74|\\uDB40\\uDC65\\uDB40\\uDC6E\\uDB40\\uDC67)\\uDB40\\uDC7F|\\uD83D\\uDC68(?:\\u200D(?:\\u2764\\uFE0F\\u200D(?:\\uD83D\\uDC8B\\u200D)?\\uD83D\\uDC68|(?:(?:\\uD83D[\\uDC68\\uDC69])\\u200D)?\\uD83D\\uDC66\\u200D\\uD83D\\uDC66|(?:(?:\\uD83D[\\uDC68\\uDC69])\\u200D)?\\uD83D\\uDC67\\u200D(?:\\uD83D[\\uDC66\\uDC67])|\\uD83C[\\uDF3E\\uDF73\\uDF93\\uDFA4\\uDFA8\\uDFEB\\uDFED]|\\uD83D[\\uDCBB\\uDCBC\\uDD27\\uDD2C\\uDE80\\uDE92])|(?:\\uD83C[\\uDFFB-\\uDFFF])\\u200D(?:\\uD83C[\\uDF3E\\uDF73\\uDF93\\uDFA4\\uDFA8\\uDFEB\\uDFED]|\\uD83D[\\uDCBB\\uDCBC\\uDD27\\uDD2C\\uDE80\\uDE92]))|\\uD83C\\uDDF8(?:\\uD83C[\\uDDE6-\\uDDEA\\uDDEC-\\uDDF4\\uDDF7-\\uDDF9\\uDDFB\\uDDFD-\\uDDFF])|\\uD83C\\uDDF0(?:\\uD83C[\\uDDEA\\uDDEC-\\uD
DEE\\uDDF2\\uDDF3\\uDDF5\\uDDF7\\uDDFC\\uDDFE\\uDDFF])|\\uD83C\\uDDFE(?:\\uD83C[\\uDDEA\\uDDF9])|\\uD83C\\uDDEE(?:\\uD83C[\\uDDE8-\\uDDEA\\uDDF1-\\uDDF4\\uDDF6-\\uDDF9])|\\uD83C\\uDDF9(?:\\uD83C[\\uDDE6\\uDDE8\\uDDE9\\uDDEB-\\uDDED\\uDDEF-\\uDDF4\\uDDF7\\uDDF9\\uDDFB\\uDDFC\\uDDFF])|\\uD83C\\uDDEC(?:\\uD83C[\\uDDE6\\uDDE7\\uDDE9-\\uDDEE\\uDDF1-\\uDDF3\\uDDF5-\\uDDFA\\uDDFC\\uDDFE])|\\uD83C\\uDDFA(?:\\uD83C[\\uDDE6\\uDDEC\\uDDF2\\uDDF3\\uDDF8\\uDDFE\\uDDFF])|\\uD83C\\uDDEA(?:\\uD83C[\\uDDE6\\uDDE8\\uDDEA\\uDDEC\\uDDED\\uDDF7-\\uDDFA])|\\uD83C\\uDDFC(?:\\uD83C[\\uDDEB\\uDDF8])|(?:\\u26F9|\\uD83C[\\uDFCB\\uDFCC]|\\uD83D\\uDD75)(?:\\uD83C[\\uDFFB-\\uDFFF])|(?:\\uD83C[\\uDFC3\\uDFC4\\uDFCA]|\\uD83D[\\uDC6E\\uDC71\\uDC73\\uDC77\\uDC81\\uDC82\\uDC86\\uDC87\\uDE45-\\uDE47\\uDE4B\\uDE4D\\uDE4E\\uDEA3\\uDEB4-\\uDEB6]|\\uD83E[\\uDD26\\uDD37-\\uDD39\\uDD3D\\uDD3E\\uDDD6-\\uDDDD])(?:\\uD83C[\\uDFFB-\\uDFFF])|(?:[\\u261D\\u270A-\\u270D]|\\uD83C[\\uDF85\\uDFC2\\uDFC7]|\\uD83D[\\uDC42\\uDC43\\uDC46-\\uDC50\\uDC66\\uDC67\\uDC70\\uDC72\\uDC74-\\uDC76\\uDC78\\uDC7C\\uDC83\\uDC85\\uDCAA\\uDD74\\uDD7A\\uDD90\\uDD95\\uDD96\\uDE4C\\uDE4F\\uDEC0\\uDECC]|\\uD83E[\\uDD18-\\uDD1C\\uDD1E\\uDD1F\\uDD30-\\uDD36\\uDDD1-\\uDDD5])(?:\\uD83C[\\uDFFB-\\uDFFF])|\\uD83D\\uDC68(?:\\u200D(?:(?:(?:\\uD83D[\\uDC68\\uDC69])\\u200D)?\\uD83D\\uDC67|(?:(?:\\uD83D[\\uDC68\\uDC69])\\u200D)?\\uD83D\\uDC66)|\\uD83C[\\uDFFB-\\uDFFF])|(?:[\\u261D\\u26F9\\u270A-\\u270D]|\\uD83C[\\uDF85\\uDFC2-\\uDFC4\\uDFC7\\uDFCA-\\uDFCC]|\\uD83D[\\uDC42\\uDC43\\uDC46-\\uDC50\\uDC66-\\uDC69\\uDC6E\\uDC70-\\uDC78\\uDC7C\\uDC81-\\uDC83\\uDC85-\\uDC87\\uDCAA\\uDD74\\uDD75\\uDD7A\\uDD90\\uDD95\\uDD96\\uDE45-\\uDE47\\uDE4B-\\uDE4F\\uDEA3\\uDEB4-\\uDEB6\\uDEC0\\uDECC]|\\uD83E[\\uDD18-\\uDD1C\\uDD1E\\uDD1F\\uDD26\\uDD30-\\uDD39\\uDD3D\\uDD3E\\uDDD1-\\uDDDD])(?:\\uD83C[\\uDFFB-\\uDFFF])?|(?:[\\u231A\\u231B\\u23E9-\\u23EC\\u23F0\\u23F3\\u25FD\\u25FE\\u2614\\u2615\\u2648-\\u2653\\u267F\\u2693\\u26A1\\u26AA\\u26AB\\u26BD\\u26BE\\u26C4\\u26C5\\
u26CE\\u26D4\\u26EA\\u26F2\\u26F3\\u26F5\\u26FA\\u26FD\\u2705\\u270A\\u270B\\u2728\\u274C\\u274E\\u2753-\\u2755\\u2757\\u2795-\\u2797\\u27B0\\u27BF\\u2B1B\\u2B1C\\u2B50\\u2B55]|\\uD83C[\\uDC04\\uDCCF\\uDD8E\\uDD91-\\uDD9A\\uDDE6-\\uDDFF\\uDE01\\uDE1A\\uDE2F\\uDE32-\\uDE36\\uDE38-\\uDE3A\\uDE50\\uDE51\\uDF00-\\uDF20\\uDF2D-\\uDF35\\uDF37-\\uDF7C\\uDF7E-\\uDF93\\uDFA0-\\uDFCA\\uDFCF-\\uDFD3\\uDFE0-\\uDFF0\\uDFF4\\uDFF8-\\uDFFF]|\\uD83D[\\uDC00-\\uDC3E\\uDC40\\uDC42-\\uDCFC\\uDCFF-\\uDD3D\\uDD4B-\\uDD4E\\uDD50-\\uDD67\\uDD7A\\uDD95\\uDD96\\uDDA4\\uDDFB-\\uDE4F\\uDE80-\\uDEC5\\uDECC\\uDED0-\\uDED2\\uDEEB\\uDEEC\\uDEF4-\\uDEF8]|\\uD83E[\\uDD10-\\uDD3A\\uDD3C-\\uDD3E\\uDD40-\\uDD45\\uDD47-\\uDD4C\\uDD50-\\uDD6B\\uDD80-\\uDD97\\uDDC0\\uDDD0-\\uDDE6])|(?:[#\\*0-9\\xA9\\xAE\\u203C\\u2049\\u2122\\u2139\\u2194-\\u2199\\u21A9\\u21AA\\u231A\\u231B\\u2328\\u23CF\\u23E9-\\u23F3\\u23F8-\\u23FA\\u24C2\\u25AA\\u25AB\\u25B6\\u25C0\\u25FB-\\u25FE\\u2600-\\u2604\\u260E\\u2611\\u2614\\u2615\\u2618\\u261D\\u2620\\u2622\\u2623\\u2626\\u262A\\u262E\\u262F\\u2638-\\u263A\\u2640\\u2642\\u2648-\\u2653\\u2660\\u2663\\u2665\\u2666\\u2668\\u267B\\u267F\\u2692-\\u2697\\u2699\\u269B\\u269C\\u26A0\\u26A1\\u26AA\\u26AB\\u26B0\\u26B1\\u26BD\\u26BE\\u26C4\\u26C5\\u26C8\\u26CE\\u26CF\\u26D1\\u26D3\\u26D4\\u26E9\\u26EA\\u26F0-\\u26F5\\u26F7-\\u26FA\\u26FD\\u2702\\u2705\\u2708-\\u270D\\u270F\\u2712\\u2714\\u2716\\u271D\\u2721\\u2728\\u2733\\u2734\\u2744\\u2747\\u274C\\u274E\\u2753-\\u2755\\u2757\\u2763\\u2764\\u2795-\\u2797\\u27A1\\u27B0\\u27BF\\u2934\\u2935\\u2B05-\\u2B07\\u2B1B\\u2B1C\\u2B50\\u2B55\\u3030\\u303D\\u3297\\u3299]|\\uD83C[\\uDC04\\uDCCF\\uDD70\\uDD71\\uDD7E\\uDD7F\\uDD8E\\uDD91-\\uDD9A\\uDDE6-\\uDDFF\\uDE01\\uDE02\\uDE1A\\uDE2F\\uDE32-\\uDE3A\\uDE50\\uDE51\\uDF00-\\uDF21\\uDF24-\\uDF93\\uDF96\\uDF97\\uDF99-\\uDF9B\\uDF9E-\\uDFF0\\uDFF3-\\uDFF5\\uDFF7-\\uDFFF]|\\uD83D[\\uDC00-\\uDCFD\\uDCFF-\\uDD3D\\uDD49-\\uDD4E\\uDD50-\\uDD67\\uDD6F\\uDD70\\uDD73-\\uDD7A\\uDD87\\uDD8A-\\uDD8D\\uDD90\\uDD95
\\uDD96\\uDDA4\\uDDA5\\uDDA8\\uDDB1\\uDDB2\\uDDBC\\uDDC2-\\uDDC4\\uDDD1-\\uDDD3\\uDDDC-\\uDDDE\\uDDE1\\uDDE3\\uDDE8\\uDDEF\\uDDF3\\uDDFA-\\uDE4F\\uDE80-\\uDEC5\\uDECB-\\uDED2\\uDEE0-\\uDEE5\\uDEE9\\uDEEB\\uDEEC\\uDEF0\\uDEF3-\\uDEF8]|\\uD83E[\\uDD10-\\uDD3A\\uDD3C-\\uDD3E\\uDD40-\\uDD45\\uDD47-\\uDD4C\\uDD50-\\uDD6B\\uDD80-\\uDD97\\uDDC0\\uDDD0-\\uDDE6])\\uFE0F)/;\n  let arr = str.split(regex).filter(Boolean);\n  console.log(arr); // [\'example 3 \', \'\xe2\x80\x8d\xe2\x99\x80\xef\xb8\x8f\', \' \', \'\xe2\x80\x8d\xe2\x80\x8d\xe2\x80\x8d\']\n

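IMHO it is very "ugly". Maybe it could be made prettier with Emoji Properties?

Since emojis are everywhere these days, I feel there must be an easy way to split a string with compound emojis into an array.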

tri*_*cot 5

Because Unicode keeps getting more complex, there is no short solution. You can look at the EBNF and Regex that unicode.org itself provides. At the time of writing it reads:

Regex

    \p{RI} \p{RI}
    | \p{Emoji}
      ( \p{EMod}
      | \x{FE0F} \x{20E3}?
      | [\x{E0020}-\x{E007E}]+ \x{E007F} )?
      (\x{200D} \p{Emoji}
        ( \p{EMod}
        | \x{FE0F} \x{20E3}?
        | [\x{E0020}-\x{E007E}]+ \x{E007F} )?
      )*
Translated to JavaScript, the regex is:

    /\p{RI}\p{RI}|\p{Emoji}(\p{EMod}|\u{FE0F}\u{20E3}?|[\u{E0020}-\u{E007E}]+\u{E007F})?(\u{200D}\p{Emoji}(\p{EMod}|\u{FE0F}\u{20E3}?|[\u{E0020}-\u{E007E}]+\u{E007F})?)*/gu
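To see what the individual branches of that pattern do, here is a small sketch that runs it against three kinds of sequences. The sample emoji (a flag, a skin-toned thumbs up, a keycap) and the variable names are my own picks for illustration, written as escapes:

```javascript
// The JavaScript translation of the unicode.org regex.
const emojiRegex = /\p{RI}\p{RI}|\p{Emoji}(\p{EMod}|\u{FE0F}\u{20E3}?|[\u{E0020}-\u{E007E}]+\u{E007F})?(\u{200D}\p{Emoji}(\p{EMod}|\u{FE0F}\u{20E3}?|[\u{E0020}-\u{E007E}]+\u{E007F})?)*/gu;

const flag = "\u{1F1EB}\u{1F1F7}";     // two regional indicators: \p{RI}\p{RI}
const thumbsUp = "\u{1F44D}\u{1F3FD}"; // emoji + skin tone modifier: \p{Emoji}\p{EMod}
const keycap = "1\uFE0F\u20E3";        // digit + VS16 + keycap: \p{Emoji}\u{FE0F}\u{20E3}

const matches = (flag + thumbsUp + keycap).match(emojiRegex);

console.log(matches.length); // 3
console.log(matches[0] === flag, matches[1] === thumbsUp, matches[2] === keycap);
// true true true
```

Note that the digit `1` counts as `\p{Emoji}` here, which is what allows the keycap branch to work.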

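If you also want to match every other character, just append |. to the regex and add the s modifier, so that the dot wildcard also matches newline characters.

With your example input: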

    let regex = /\p{RI}\p{RI}|\p{Emoji}(\p{EMod}|\u{FE0F}\u{20E3}?|[\u{E0020}-\u{E007E}]+\u{E007F})?(\u{200D}\p{Emoji}(\p{EMod}|\u{FE0F}\u{20E3}?|[\u{E0020}-\u{E007E}]+\u{E007F})?)*|./gus;

    let str = "example 3 \u200D♀\uFE0F \u200D\u200D\u200D";

    document.write('<pre>'+str.match(regex).join('\n'));

Note: I had to use document.write instead of console.log here, because the console implementation provided by Stack Snippets is not up to the task.
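Outside Stack Snippets (for example in Node.js or a browser console) the same idea can be wrapped in a small function. This is only a sketch: `splitWithEmoji` is a name I made up, and the two test emoji are stand-ins written as escapes.

```javascript
// Emoji pattern with "|." appended so that every other character matches
// on its own; the "s" flag lets "." match line terminators as well.
const EMOJI_OR_CHAR = /\p{RI}\p{RI}|\p{Emoji}(\p{EMod}|\u{FE0F}\u{20E3}?|[\u{E0020}-\u{E007E}]+\u{E007F})?(\u{200D}\p{Emoji}(\p{EMod}|\u{FE0F}\u{20E3}?|[\u{E0020}-\u{E007E}]+\u{E007F})?)*|./gus;

// Hypothetical helper: returns an array of emoji sequences and single characters.
function splitWithEmoji(str) {
  // match() returns null for an empty string, so fall back to [].
  return str.match(EMOJI_OR_CHAR) || [];
}

const shrug = "\u{1F937}\u200D\u2640\uFE0F";                            // one ZWJ sequence
const family = "\u{1F468}\u200D\u{1F469}\u200D\u{1F467}\u200D\u{1F466}"; // four emoji joined by ZWJ

const parts = splitWithEmoji("hi " + shrug + " " + family);

console.log(parts.length);        // 6
console.log(parts[3] === shrug);  // true
console.log(parts[5] === family); // true
```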

Now, this regex has received some criticism, which has led to a number of variants. For example, in this answer the following adjustments and extensions are proposed to deal with some issues:

    /\p{RI}\p{RI}|\p{Emoji}(\p{EMod}+|\u{FE0F}\u{20E3}?|[\u{E0020}-\u{E007E}]+\u{E007F})?(\u{200D}\p{Emoji}(\p{EMod}+|\u{FE0F}\u{20E3}?|[\u{E0020}-\u{E007E}]+\u{E007F})?)+|\p{EPres}(\p{EMod}+|\u{FE0F}\u{20E3}?|[\u{E0020}-\u{E007E}]+\u{E007F})?|\p{Emoji}(\p{EMod}+|\u{FE0F}\u{20E3}?|[\u{E0020}-\u{E007E}]+\u{E007F})/gu

Again, you can simply append |. to the regex (and add the s modifier) to make it match ordinary characters as well.
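One observable difference between the two patterns shows up when they are used without the appended |., that is, to extract only the emoji. In the sketch below (variable names and the sample emoji are mine), the first pattern reports a bare digit as an emoji because digits have the `Emoji` property, while the extended one does not:

```javascript
// First pattern (unicode.org translation) and the extended variant, both without "|.".
const basic = /\p{RI}\p{RI}|\p{Emoji}(\p{EMod}|\u{FE0F}\u{20E3}?|[\u{E0020}-\u{E007E}]+\u{E007F})?(\u{200D}\p{Emoji}(\p{EMod}|\u{FE0F}\u{20E3}?|[\u{E0020}-\u{E007E}]+\u{E007F})?)*/gu;
const extended = /\p{RI}\p{RI}|\p{Emoji}(\p{EMod}+|\u{FE0F}\u{20E3}?|[\u{E0020}-\u{E007E}]+\u{E007F})?(\u{200D}\p{Emoji}(\p{EMod}+|\u{FE0F}\u{20E3}?|[\u{E0020}-\u{E007E}]+\u{E007F})?)+|\p{EPres}(\p{EMod}+|\u{FE0F}\u{20E3}?|[\u{E0020}-\u{E007E}]+\u{E007F})?|\p{Emoji}(\p{EMod}+|\u{FE0F}\u{20E3}?|[\u{E0020}-\u{E007E}]+\u{E007F})/gu;

const shrug = "\u{1F937}\u200D\u2640\uFE0F"; // stand-in compound emoji
const input = "room 3 " + shrug;

const basicMatches = input.match(basic);
const extendedMatches = input.match(extended);

console.log(basicMatches.length);    // 2 (the digit "3" and the emoji)
console.log(extendedMatches.length); // 1 (only the emoji)
```

When |. is appended this particular difference disappears from the result, since the digit is then picked up by the dot as a single character anyway.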