Strange decoding (e.g. \x00) after using Python's split function

sid*_*gar 1 python encoding decoding python-3.x

This is a very strange situation: the split function appears to change the string format. Please see the code below.

Code:

import serial  # pySerial

COM_Port = serial.Serial(COM_PortName)
with COM_Port as port:
    while True:
        RxedData = port.readline()
        line = RxedData.decode('utf-8')
        print("Line 1: ", line)
        row = line.split(',')[1:-1]
        print("Line 2: ", row)

Output:

Line 1: "* , 0 0 0 0 0 5 7 5 , 2 3 : 0 3 : 4 7 , 1 1 / 0 2 / 2 0 , 1 2 . 3 4 5 , K P A , 0 0 0 0 6 . 8 3 , S L P M , T B ,                 , $ "

Line 2: ['\x000\x000\x000\x000\x000\x006\x002\x001\x00', '\x002\x000\x00:\x004\x006\x00:\x005\x001\x00', '\x001\x002\x00/\x000\x002\x00/\x002\x000\x00', '\x001\x002\x00.\x003\x004\x005\x00', '\x00K\x00P\x00A\x00', '\x000\x000\x000\x000\x000\x00.\x000\x000\x00', '\x00C\x00C\x00P\x00M\x00', '\x00T\x00G\x00', '\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00']

Why does Line 2 contain \x000\x000...? What encoding is this, and how can I convert it to the correct format?

Edit 1:

print([hex(i) for i in RxedData])

Output:

['0x2a', '0x0', '0x2c', '0x0', '0x30', '0x0', '0x30', '0x0', '0x30', '0x0', '0x30', '0x0', '0x30', '0x0', '0x30', '0x0', '0x30', '0x0', '0x31', '0x0', '0x2c', '0x0', '0x31', '0x0', '0x31', '0x0', '0x3a', '0x0', '0x35', '0x0', '0x31', '0x0', '0x3a', '0x0', '0x35', '0x0', '0x30', '0x0', '0x2c', '0x0', '0x31', '0x0', '0x33', '0x0', '0x2f', '0x0', '0x30', '0x0', '0x32', '0x0', '0x2f', '0x0', '0x32', '0x0', '0x30', '0x0', '0x2c', '0x0', '0x31', '0x0', '0x32', '0x0', '0x2e', '0x0', '0x33', '0x0', '0x34', '0x0', '0x35', '0x0', '0x2c', '0x0', '0x4b', '0x0', '0x50', '0x0', '0x41', '0x0', '0x2c', '0x0', '0x31', '0x0', '0x32', '0x0', '0x33', '0x0', '0x34', '0x0', '0x35', '0x0', '0x2e', '0x0', '0x36', '0x0', '0x36', '0x0', '0x2c', '0x0', '0x53', '0x0', '0x4c', '0x0', '0x50', '0x0', '0x48', '0x0', '0x2c', '0x0', '0x0', '0x0', '0x0', '0x0', '0x2c', '0x0', '0x2d', '0x0', '0x2d', '0x0', '0x2d', '0x0', '0x2d', '0x0', '0x2d', '0x0', '0x2d', '0x0', '0x2d', '0x0', '0x2d', '0x0', '0x2c', '0x0', '0x24', '0x0', '0xa']

Ser*_*sta 5

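OK, judging from the hex dump of the received bytes, every ASCII character is followed by a NULL byte (\x00). That is simply the UTF-16-LE representation of the characters. Decoding as UTF-8 keeps the code point of each leading byte, because they are all below 128, and leaves all the interleaved nulls in place as U+0000 characters. So split is not changing anything: printing a string writes the null characters raw (many terminals render them as blanks, hence the apparent spaces in Line 1), whereas printing a list shows the repr of each element, which escapes them as \x00. And you cannot simply decode the byte string as UTF-16 (which is what it really is), because you obtained it through readline, which stops right after the line feed byte and has not yet read the null byte that follows it.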
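
To make the diagnosis concrete, here is a minimal sketch that reproduces it without a serial port. The sample text and the variable names are made up for illustration, and the slice only imitates what readline would return; nothing here comes from the asker's device.

```python
# Hypothetical sample; the real device sends longer lines.
text = "*,01\n"

# UTF-16-LE stores each ASCII character as the byte itself followed by 0x00.
wire = text.encode("utf-16-le")
print(wire)  # b'*\x00,\x000\x001\x00\n\x00'

# readline() on raw bytes stops right after 0x0a, so the final 0x00 is left unread.
rxed = wire[: wire.index(b"\n") + 1]

# Decoding as UTF-8 succeeds (every byte is below 128) but keeps the nulls.
as_utf8 = rxed.decode("utf-8")
print(repr(as_utf8))  # '*\x00,\x000\x001\x00\n'

# Decoding as UTF-16-LE fails because the byte count is now odd.
try:
    rxed.decode("utf-16-le")
    utf16_error = None
except UnicodeDecodeError as exc:
    utf16_error = exc
    print("UTF-16-LE decode failed:", exc.reason)
```

Splitting as_utf8 on ',' then yields fields with the \x00 characters still inside them, which matches what the question shows in Line 2.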

If you read another line, it would probably start with that leftover null byte, making the line look as if it were UTF-16-BE encoded...
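
The byte shift can be sketched with an in-memory stream. io.BytesIO stands in for the serial port here, and the two short lines are invented for illustration:

```python
import io

# Two hypothetical lines as a device might send them, encoded as UTF-16-LE.
stream = io.BytesIO("a,b\nc,d\n".encode("utf-16-le"))

first = stream.readline()   # ends at 0x0a; its trailing 0x00 stays in the stream
second = stream.readline()  # therefore starts with that leftover 0x00

print(first)   # b'a\x00,\x00b\x00\n'
print(second)  # b'\x00c\x00,\x00d\x00\n'

# The shifted byte pairs of the second line happen to read as big-endian UTF-16.
print(repr(second.decode("utf-16-be")))  # 'c,d\n'
```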

So what can be done?

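A simple workaround is just to remove the null bytes. If you can be sure that you will only ever receive plain ASCII characters (no accented letters such as é, no emoji, no Greek or Cyrillic, etc.), this is enough: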

    RxedData = port.readline()
    line = RxedData.replace(b'\x00', b'').decode('ascii')
    print("Line 1: ", line)
    row = line.split(',')[1:-1]
    print("Line 2: ", row)

With the bytes shown in Edit 1 of the question, you should get:

    Line 1:  *,00000001,11:51:50,13/02/20,12.345,KPA,12345.66,SLPH,,--------,$

    Line 2:  ['00000001', '11:51:50', '13/02/20', '12.345', 'KPA', '12345.66', 'SLPH', '', '--------']

This has the advantage of being simple and robust, provided you only ever have plain ASCII.
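
The ASCII-only caveat can be illustrated with a short sketch. The helper name strip_nulls and the sample strings are hypothetical; each sample drops its last byte to imitate what readline would hand back.

```python
def strip_nulls(raw: bytes) -> str:
    """The workaround above: drop every 0x00 byte, then decode as ASCII."""
    return raw.replace(b"\x00", b"").decode("ascii")

# Plain ASCII survives the round trip.
ascii_wire = "12.345,KPA\n".encode("utf-16-le")[:-1]
print(repr(strip_nulls(ascii_wire)))  # '12.345,KPA\n'

# 'é' is 0xe9 0x00 in UTF-16-LE; once the null is stripped, 0xe9 is not ASCII.
accent_wire = "caf\u00e9\n".encode("utf-16-le")[:-1]
try:
    strip_nulls(accent_wire)
    accent_error = None
except UnicodeDecodeError as exc:
    accent_error = exc
    print("non-ASCII input rejected:", exc)

# Worse, U+0100 is 0x00 0x01 in UTF-16-LE, so it silently turns into '\x01'.
silent = strip_nulls("\u0100".encode("utf-16-le"))
print(repr(silent))  # '\x01'
```

Decoding with 'ascii' rather than 'utf-8' at least makes most non-ASCII input fail loudly, but the last case shows that some characters can still be corrupted without any error.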

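The encoding-consistent way would be to use a TextIOWrapper around the serial port and specify the UTF-16-LE encoding there. I could not test it (there is no serial line on my box, and I have no need for one), so I can only guess at what should be done.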

import io
import serial  # pySerial

COM_Port = serial.Serial(COM_PortName)
# Untested against real hardware, as noted above.
with io.TextIOWrapper(io.BufferedRWPair(COM_Port, COM_Port), encoding='utf-16-le') as port:
    while True:
        line = port.readline()
        print("Line 1: ", line)
        row = line.split(',')[1:-1]
        print("Line 2: ", row)

Here, the TextIOWrapper takes care of the null byte that follows the line feed byte, and directly gives you true Unicode strings.
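
The decoding behaviour of TextIOWrapper can at least be checked without hardware. In this sketch io.BytesIO stands in for the serial port and the two sample lines are invented; a real port with timeouts may behave differently.

```python
import io

# BytesIO plays the role of the serial port (an assumption for illustration).
fake_port = io.BytesIO("*,01,KPA,$\n*,02,KPA,$\n".encode("utf-16-le"))

with io.TextIOWrapper(fake_port, encoding="utf-16-le") as port:
    # Iterating yields fully decoded lines, split on the decoded newline.
    rows = [line.split(",")[1:-1] for line in port]

print(rows)  # [['01', 'KPA'], ['02', 'KPA']]
```

Because the wrapper decodes before looking for the newline, each line consumes both bytes of the UTF-16 line feed and the next line starts on a character boundary.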
