Get the length of a string containing Unicode characters above 0xFFFF

Question


I'm using this character, the double sharp '𝄪', whose Unicode code point is 0x1D12A.
If I use it in a string, I can't get the correct string length:


    str = "F𝄪"
    str.length // returns 3, even though there are 2 characters!


How can I get a function that returns the correct answer, whether or not I'm using special Unicode characters?


Answer 1

Ade*_*lin 1

To summarize my comments:


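That really is the length of the string: `length` counts UTF-16 code units, and a character above 0xFFFF takes two of them.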
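To make the difference concrete, here is a minimal sketch comparing code units with code points. The helper name `countCodePoints` is made up for illustration; everything else is standard `String` behaviour.

```javascript
// "F" followed by U+1D12A (musical symbol double sharp), written as an escape
const s = "F\u{1D12A}";

// String.prototype.length reports UTF-16 code units
const codeUnits = s.length;

// Hypothetical helper: the string iterator yields one item per code point,
// so a surrogate pair is counted once
function countCodePoints(str) {
  let n = 0;
  for (const _ of str) n++;
  return n;
}

console.log(codeUnits);          // 3
console.log(countCodePoints(s)); // 2

// The two code units that make up U+1D12A
console.log(s.charCodeAt(1).toString(16), s.charCodeAt(2).toString(16)); // d834 dd2a
console.log(s.codePointAt(1).toString(16)); // 1d12a
```

Note that this only handles characters outside the Basic Multilingual Plane; combining marks are still counted separately.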


Some characters are also built from other characters, even though they look like a single character. `"\u0309m\u1EE7t\u0309\u1EA3\u0309\u0309\u0309t\u0309\u1EBBd\u0309W\u0309\u1ECF\u0309r\u0309\u0309d\u0309\u0309".length == 24`
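If what you actually want is the number of user-perceived characters (grapheme clusters), one option is `Intl.Segmenter`. This is only a sketch: it assumes your runtime ships `Intl.Segmenter`, and `countGraphemes` is a name I made up for it.

```javascript
// Hypothetical helper: counts grapheme clusters using the built-in segmenter.
// Assumes Intl.Segmenter is available in the runtime.
function countGraphemes(str, locale = "en") {
  const segmenter = new Intl.Segmenter(locale, { granularity: "grapheme" });
  let n = 0;
  for (const _ of segmenter.segment(str)) n++;
  return n;
}

const word = "e\u0309"; // "e" followed by a combining hook above

console.log(word.length);                  // 2 (code units)
console.log([...word].length);             // 2 (code points)
console.log(countGraphemes(word));         // 1 (grapheme clusters)
console.log(countGraphemes("F\u{1D12A}")); // 2
```

Which count is "correct" depends on what you need it for: storage size, cursor movement, and display width can all give different answers.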


This (great) blog post has a function that returns the correct length:


    function fancyCount(str) {
      const joiner = "\u{200D}"; // zero width joiner
      const split = str.split(joiner);
      let count = 0;

      for (const s of split) {
        // removing the variation selectors
        const num = Array.from(s.split(/[\ufe00-\ufe0f]/).join("")).length;
        count += num;
      }

      // assuming the joiners are used appropriately
      return count / split.length;
    }

    console.log(fancyCount("F𝄪") == 2) // true


That's too much code. `console.log([..."F𝄪"].length); // 2` (4 upvotes)
`for (let i = 0; i < 0x110000; i++) { let c = String.fromCodePoint(i); console.log([...c].length, c); }` (2 upvotes)

Archived:	7 years, 7 months ago
Viewed:	3381 times
Last activity:	5 years, 2 months ago