Why does opening a subprocess with universal_newlines cause a unicode decode exception?

Question

D.C*_*.C. 3 python unicode subprocess python-3.x

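I am using the subprocess module to run sub-jobs and collecting their output and error streams with subprocess.PIPE. To avoid deadlocks, I continuously read from these streams on a separate thread. This works, except that the program sometimes crashes with a decoding problem: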

UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 483: ordinal not in range(128)

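At a high level, I understand that Python is probably trying to convert to a string using the ASCII codec and that I need to call decode somewhere; I'm just not sure where. When I create the subprocess job, I pass universal_newlines=True. I thought that meant stdout/stderr would be returned as unicode rather than binary: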

self.p = subprocess.Popen(self.command, shell=self.shell, stdout=subprocess.PIPE, stderr=subprocess.PIPE, universal_newlines=True)


The crash happens in my reader thread function:

def standardOutHandler(standardOut):
    # Crash happens on the following line:
    for line in iter(standardOut.readline, ''):
        with writerLock:
            stdout_file.write(line)
            if self.echoOutput:
                sys.stdout.write(line)
                sys.stdout.flush()


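It is not clear to me why readline throws a decoding exception here; as I said, I thought universal_newlines=True already gave me decoded data.

What is going on here, and what can I do to fix it?

Here is the full traceback: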

Exception in thread Thread-5:
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/threading.py", line 920, in _bootstrap_inner
    self.run()
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/threading.py", line 868, in run
    self._target(*self._args, **self._kwargs)
  File "/Users/lzrd/my_process.py", line 61, in standardOutHandler
    for line in iter(standardOut.readline, ''):
  File "/Users/lzrd/Envs/my_env/bin/../lib/python3.4/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 483: ordinal not in range(128)


Answer 1

jfs*_*jfs 5

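If you use universal_newlines=True, the byte stream is decoded into Unicode using the locale.getpreferredencoding(False) character encoding, which should be utf-8 on your system (check the LANG, LC_CTYPE, and LC_ALL environment variables).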
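A quick way to run that check is to print the preferred encoding together with the three locale variables from inside the failing script. This is only a minimal sketch; the helper name `describe_text_encoding` is made up for illustration.

```python
import locale
import os


def describe_text_encoding():
    """Return the encoding used for text-mode pipes and the locale env vars."""
    # These are the variables the answer suggests checking.
    locale_vars = {name: os.environ.get(name)
                   for name in ("LC_ALL", "LC_CTYPE", "LANG")}
    return locale.getpreferredencoding(False), locale_vars


encoding, locale_vars = describe_text_encoding()
print("preferred encoding:", encoding)
for name, value in locale_vars.items():
    # None means the variable is not set in this process.
    print(name, "=", value)
```

If the reported encoding is an ASCII variant and all three variables are unset, the script is probably being launched from an environment without locale settings.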

If the exception persists, try your code with an empty loop body:

for line in standardOut: #NOTE: no need to use iter() idiom here on Python 3
    pass


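If you still get the exception with the empty loop, it might be a bug in Python, provided that locale.getpreferredencoding(False) does not report ascii when you check it right next to the Popen() call. It is important to run that check in exactly the same environment as the failing code.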
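To see how much the environment can matter, one option is to ask a fresh interpreter, started with different locale variables, which encoding it would pick. The sketch below is illustrative only: the helper `preferred_encoding_with_env` is a hypothetical name, and the results depend on the platform and Python version (Python 3.7 and later may coerce the C locale to UTF-8).

```python
import locale
import os
import subprocess
import sys


def preferred_encoding_with_env(**overrides):
    """Report locale.getpreferredencoding(False) from a new interpreter that
    runs with the current environment plus the given overrides."""
    env = dict(os.environ, **overrides)
    code = "import locale; print(locale.getpreferredencoding(False))"
    out = subprocess.check_output([sys.executable, "-c", code], env=env)
    # Encoding names are plain ASCII, so decoding the output this way is safe.
    return out.decode("ascii").strip()


here = locale.getpreferredencoding(False)
inherited = preferred_encoding_with_env()
c_locale = preferred_encoding_with_env(LC_ALL="C")

print("this process:       ", here)
print("child, same env:    ", inherited)
print("child, LC_ALL=C:    ", c_locale)
```

If the values differ, the same script can behave differently depending on how it is launched (terminal, IDE, cron job, and so on), which is why the check belongs in the environment where the crash occurs.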

I would understand it if the UnicodeDecodeError mentioned utf-8 instead of ascii. In that case, you could try decoding the stream manually:

#!/usr/bin/env python3
import io
import locale
from subprocess import Popen, PIPE

# Read the child's stdout as bytes and decode it explicitly,
# instead of relying on universal_newlines=True.
with Popen(['command', 'arg 1'], stdout=PIPE, bufsize=1) as p:
    for line in io.TextIOWrapper(p.stdout,
                                 encoding=locale.getpreferredencoding(False),
                                 errors='strict'):
        print(line, end='')


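You can experiment with the encoding and errors parameters here, e.g. set encoding='ascii' or, for debugging, pass an error handler that replaces bytes unsupported by the given character encoding. Note that errors='namereplace', which produces \N{...} escape sequences, only applies when encoding; for decoding, use errors='replace' or, on Python 3.5 and later, errors='backslashreplace', which produces \xNN escape sequences.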
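The effect of these parameters can be tried without a subprocess by wrapping an in-memory byte stream. The sketch below is only an illustration: the sample bytes and the helper `decode_lines` are made up, and the sample is chosen so that it contains the byte 0xe2 from the traceback.

```python
import io

# UTF-8 bytes for a line with a right single quotation mark (U+2019),
# which encodes as 0xe2 0x80 0x99.
raw = "it\u2019s\n".encode("utf-8")


def decode_lines(data, encoding, errors):
    """Decode bytes line by line the same way the manual TextIOWrapper does."""
    stream = io.TextIOWrapper(io.BytesIO(data), encoding=encoding, errors=errors)
    return list(stream)


# Correct encoding: the text round-trips.
as_utf8 = decode_lines(raw, "utf-8", "strict")

# Wrong encoding, lenient handler: each undecodable byte becomes U+FFFD.
as_ascii_replaced = decode_lines(raw, "ascii", "replace")

# Wrong encoding, strict handler: the same kind of error as in the question.
error = None
try:
    decode_lines(raw, "ascii", "strict")
except UnicodeDecodeError as exc:
    error = exc

print(as_utf8)
print(as_ascii_replaced)
print(error)
```

On Python 3.6 and later, Popen also accepts encoding and errors arguments directly, so the manual wrapper is not needed there.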
