I wrote a simple Python script to translate Chinese punctuation into English punctuation.
import codecs, sys

def trcn():
    tr = lambda x: x.translate(str.maketrans("""?????????????????“”‘’????…—×""", """,.!?;:,()[][][][]""''<>~$^-*"""))
    out = codecs.getwriter('utf-8')(sys.stdout)
    for line in sys.stdin:
        out.write(tr(line))

if __name__ == '__main__':
    if not len(sys.argv) == 1:
        print("usage:\n\t{0} STDIN STDOUT".format(sys.argv[0]))
        sys.exit(-1)
    trcn()
    sys.exit(0)
But something goes wrong with Unicode, and I can't get past it. The error message:
Traceback (most recent call last):
  File "trcn.py", line 13, in <module>
    trcn()
  File "trcn.py", line 7, in trcn
    out.write(tr(line))
  File "C:\Python31\Lib\codecs.py", line 356, in write
    self.stream.write(data)
TypeError: must be str, not bytes
After that, I tested out.write() in IDLE and in the console. They produced different results, and I don't know why.
In IDLE:
Python 3.1.2 (r312:79149, Mar 21 2010, 00:41:52) [MSC v.1500 32 bit (Intel)] on win32
Type "copyright", "credits" or "license()" for more information.
>>> import sys,codecs
>>> out = codecs.getwriter('utf-8')(sys.stdout)
>>> out.write('hello')
hello
>>>
In the console:
Python 3.1.2 (r312:79149, Mar 21 2010, 00:41:52) [MSC v.1500 32 bit (Intel)] on
win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys,codecs
>>> out = codecs.getwriter('utf-8')(sys.stdout)
>>> out.write('hello')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python31\Lib\codecs.py", line 356, in write
    self.stream.write(data)
TypeError: must be str, not bytes
>>>
Platform: Windows XP EN
The encoder produces bytes, not str, so the encoded output has to be written to the underlying binary stream, sys.stdout.buffer:
out = codecs.getwriter('utf-8')(sys.stdout.buffer)
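I'm not entirely sure why your code behaves differently in IDLE and in the console, but the above may help. Perhaps IDLE's sys.stdout actually expects bytes rather than characters (hopefully it also has a .buffer that expects bytes).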
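To make the bytes-versus-str point concrete, here is a minimal sketch. It uses in-memory streams in place of the real stdout so it can be run anywhere: io.StringIO stands in for a text-mode sys.stdout and io.BytesIO for sys.stdout.buffer. The function name translate_stream and the six-character punctuation table are assumptions made for illustration only; they are not the table from the question.

```python
import codecs
import io

# Illustrative subset only (an assumption); not the full table from the question.
PUNCT_TABLE = str.maketrans("，。！？；：", ",.!?;:")


def translate_stream(lines, binary_out):
    """Translate punctuation in each line and write it to binary_out as UTF-8."""
    # The StreamWriter encodes str to bytes, so binary_out must accept bytes.
    writer = codecs.getwriter("utf-8")(binary_out)
    for line in lines:
        writer.write(line.translate(PUNCT_TABLE))


# A text-mode stream rejects the bytes the writer hands it.
rejected = False
try:
    codecs.getwriter("utf-8")(io.StringIO()).write("hello")
except TypeError:
    rejected = True

# A binary stream (like sys.stdout.buffer) accepts them.
demo = io.BytesIO()
translate_stream(["你好，世界！\n"], demo)
result = demo.getvalue().decode("utf-8")
```

In an actual script the call would presumably be translate_stream(sys.stdin, sys.stdout.buffer), after importing sys.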