Remove characters outside the BMP (emoji) in Python 3

pac*_*hvo 2 python python-3.x

I'm getting an error: UnicodeEncodeError: 'UCS-2' codec can't encode characters in position 266-266: Non-BMP character not supported in Tk

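I'm parsing data, and some emoji end up in my array.

data = 'this variable contains some emoji's?'

I want:

data = 'this variable contains some emoji's'

How can I remove these characters from the data, or otherwise handle this situation in Python 3?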

Sha*_*ger 6

If the goal is just to remove every character above '\uFFFF', the straightforward approach is to do exactly that:

data = "this variable contains some emoji's?"
data = ''.join(c for c in data if c <= '\uFFFF')
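As a quick check of the filter above, here is a minimal sketch that spells the emoji as an explicit escape. The code point U+1F600 and the names `sample` and `cleaned` are assumptions chosen only for illustration; they are not from the question.

```python
# U+1F600 is an arbitrary non-BMP emoji, used here only as an example.
sample = "this variable contains some emoji's\U0001F600"

# Keep only code points up to U+FFFF, i.e. the Basic Multilingual Plane.
cleaned = ''.join(c for c in sample if c <= '\uFFFF')

print(cleaned)
```

In Python 3 a non-BMP character is a single element of a `str`, so the comparison `c <= '\uFFFF'` sees it as one code point and drops it.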

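It's possible your string is in decomposed form, so you may need to normalize it to composed form first so that the non-BMP characters can be identified: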

import unicodedata

data = ''.join(c for c in unicodedata.normalize('NFC', data) if c <= '\uFFFF')
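If silently dropping characters is not what you want, one possible variation is to substitute a placeholder instead. This is only a sketch: the helper name `replace_non_bmp` and the default of U+FFFD (the Unicode replacement character) are assumptions for illustration, not part of the answer above.

```python
import re

# Matches any single code point above U+FFFF (outside the BMP).
_NON_BMP = re.compile('[\U00010000-\U0010FFFF]')

def replace_non_bmp(text, replacement='\uFFFD'):
    """Return text with every non-BMP code point swapped for replacement.

    Hypothetical helper; pass replacement='' to delete instead.
    """
    return _NON_BMP.sub(replacement, text)

print(replace_non_bmp("emoji's\U0001F600"))
```

A visible placeholder keeps the string displayable in Tk while still showing where a character was removed.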