Strange decoding (e.g. \x00) after using Python's split function

Question

sid*_*gar 1 python encoding decoding python-3.x

This is a very strange situation: the split function appears to change the format of the string. Please see the code below.

Code:

COM_Port = serial.Serial(COM_PortName)
with COM_Port as port:
    while True:
         RxedData = port.readline()
         line = RxedData.decode('utf-8')
         print("Line 1: ", line)
         row = line.split(',')[1:-1]
         print("Line 2: ", row)

Output:

Line 1: "* , 0 0 0 0 0 5 7 5 , 2 3 : 0 3 : 4 7 , 1 1 / 0 2 / 2 0 , 1 2 . 3 4 5 , K P A , 0 0 0 0 6 . 8 3 , S L P M , T B ,                 , $ "

Line 2: ['\x000\x000\x000\x000\x000\x006\x002\x001\x00', '\x002\x000\x00:\x004\x006\x00:\x005\x001\x00', '\x001\x002\x00/\x000\x002\x00/\x002\x000\x00', '\x001\x002\x00.\x003\x004\x005\x00', '\x00K\x00P\x00A\x00', '\x000\x000\x000\x000\x000\x00.\x000\x000\x00', '\x00C\x00C\x00P\x00M\x00', '\x00T\x00G\x00', '\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00']

Why does Line 2 contain \x000\x000...? What encoding is this, and how do I convert it to the correct format?

Edit 1:

print([hex(i) for i in RxedData])

Output:

['0x2a', '0x0', '0x2c', '0x0', '0x30', '0x0', '0x30', '0x0', '0x30', '0x0', '0x30', '0x0', '0x30', '0x0', '0x30', '0x0', '0x30', '0x0', '0x31', '0x0', '0x2c', '0x0', '0x31', '0x0', '0x31', '0x0', '0x3a', '0x0', '0x35', '0x0', '0x31', '0x0', '0x3a', '0x0', '0x35', '0x0', '0x30', '0x0', '0x2c', '0x0', '0x31', '0x0', '0x33', '0x0', '0x2f', '0x0', '0x30', '0x0', '0x32', '0x0', '0x2f', '0x0', '0x32', '0x0', '0x30', '0x0', '0x2c', '0x0', '0x31', '0x0', '0x32', '0x0', '0x2e', '0x0', '0x33', '0x0', '0x34', '0x0', '0x35', '0x0', '0x2c', '0x0', '0x4b', '0x0', '0x50', '0x0', '0x41', '0x0', '0x2c', '0x0', '0x31', '0x0', '0x32', '0x0', '0x33', '0x0', '0x34', '0x0', '0x35', '0x0', '0x2e', '0x0', '0x36', '0x0', '0x36', '0x0', '0x2c', '0x0', '0x53', '0x0', '0x4c', '0x0', '0x50', '0x0', '0x48', '0x0', '0x2c', '0x0', '0x0', '0x0', '0x0', '0x0', '0x2c', '0x0', '0x2d', '0x0', '0x2d', '0x0', '0x2d', '0x0', '0x2d', '0x0', '0x2d', '0x0', '0x2d', '0x0', '0x2d', '0x0', '0x2d', '0x0', '0x2c', '0x0', '0x24', '0x0', '0xa']

Answer 1

Ser*_*sta 5

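OK, judging from the hex dump of the received bytes, every ASCII character is followed by a NULL byte (\x00). That is simply the UTF-16-LE representation of the characters. The UTF-8 decoding just maps each byte to its own code point, because all the bytes are below 128, which leaves all the interleaved nulls in place. And you cannot simply decode the byte string as UTF-16 (which is what it really is), because you got it through readline, which stops right after the line feed and has not yet read the null byte that follows it.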

If you read another line, it would probably start with that null byte, making the line look as if it were UTF-16-BE encoded...
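
To make the off-by-one-byte problem concrete, here is a small sketch that needs no serial port. It assumes an in-memory io.BytesIO stream is a fair stand-in for the device, and the sample text is made up for illustration:

```python
import io

# Made-up sample data: two lines of plain ASCII text, sent as UTF-16-LE.
text = "*,00000001,KPA,$\n*,00000002,KPA,$\n"
raw = text.encode("utf-16-le")   # every ASCII char becomes <char> 0x00

# Decoding as UTF-8 does not fail (all bytes are < 128),
# but each 0x00 byte survives as a U+0000 character.
as_utf8 = raw.decode("utf-8")

# A byte-level readline() stops right after 0x0A,
# before the 0x00 that completes the UTF-16-LE line feed.
stream = io.BytesIO(raw)
first = stream.readline()
second = stream.readline()

print(len(first) % 2)              # odd length: not a whole number of UTF-16 units
print(second[:1])                  # the leftover null byte now leads the next line
print(second.decode("utf-16-be"))  # so the second line reads as big-endian text
```

The first chunk has an odd number of bytes, so it cannot be decoded as UTF-16 as it stands, and every following chunk is shifted by one byte.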

So what can be done?

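A simple workaround is to just remove the null characters. If you can be sure that you will only ever receive plain ASCII characters (no accented letters such as é, no emoji, no Greek or Cyrillic, etc.), this is enough: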

    RxedData = port.readline()
    line = RxedData.replace(b'\x00', b'').decode('ascii')
    print("Line 1: ", line)
    row = line.split(',')[1:-1]
    print("Line 2: ", row)

With the byte values shown in Edit 1 of the question, this yields a clean ASCII line, and the fields produced by split no longer contain \x00.

It has the merit of being simple, and it is robust as long as you only have plain ASCII.
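
As a quick check of the workaround, the sketch below applies it to the data from Edit 1. Rather than retyping the hex dump, it rebuilds the bytes by encoding the equivalent text and dropping the final null that readline had not read yet. That reconstruction is my own assumption about the dump, and the variable names are only illustrative:

```python
# Text equivalent of the hex dump in Edit 1 (an assumed reconstruction).
# The field after SLPH holds two real U+0000 characters in the dump.
text = "*,00000001,11:51:50,13/02/20,12.345,KPA,12345.66,SLPH,\x00\x00,--------,$\n"

# UTF-16-LE bytes, minus the trailing 0x00 that readline() left unread.
rxed_data = text.encode("utf-16-le")[:-1]

# The workaround: drop every null byte, then decode as plain ASCII.
line = rxed_data.replace(b"\x00", b"").decode("ascii")
row = line.split(",")[1:-1]
print(row)
```

Note the trade-off: the field that really contained null characters comes out as an empty string, because this approach cannot tell padding nulls from data nulls.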

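The encoding-consistent way would be to use a TextIOWrapper around the serial port and specify the UTF-16-LE encoding there. I cannot test it (I have no serial port on my box and no need for one), so I can only guess at what should be done.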

import io
import serial

# COM_PortName is defined elsewhere, as in the question.
COM_Port = serial.Serial(COM_PortName)
with io.TextIOWrapper(io.BufferedRWPair(COM_Port, COM_Port), encoding='utf-16-le') as port:
    while True:
        line = port.readline()
        print("Line 1: ", line)
        row = line.split(',')[1:-1]
        print("Line 2: ", row)

Here, the TextIOWrapper will take care of the null byte that follows the line feed byte and give you true Unicode strings directly.
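
The decoding side of this idea can at least be tried without hardware. The sketch below swaps the serial port for an in-memory io.BytesIO stream (an assumption, and the sample data is made up), so it only shows that TextIOWrapper splits lines after decoding. It says nothing about how a real port behaves with timeouts or blocking reads:

```python
import io

# Stand-in for the serial port: made-up UTF-16-LE bytes in memory.
raw = "*,00000001,KPA,$\n*,00000002,KPA,$\n".encode("utf-16-le")
fake_port = io.BytesIO(raw)

rows = []
# newline="\n" keeps line endings untranslated; lines are split on the
# decoded "\n" character, so the 0x00 after 0x0A is consumed with it.
with io.TextIOWrapper(fake_port, encoding="utf-16-le", newline="\n") as port:
    for line in port:
        rows.append(line.split(",")[1:-1])

print(rows)
```

Because the wrapper decodes before it looks for line endings, no stray null ends up at the start of the next line.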

Archived:	5 years, 8 months ago
Views:	9470
Last activity:	5 years, 8 months ago