How do I remove extended ASCII characters using Python?

Question

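While trying to fix up a PML (Palm Markup Language) file, I found that my test file appears to contain non-ASCII characters, which causes MakeBook to complain. The solution would be to strip all the non-ASCII characters out of the PML.

So, in attempting to fix this in Python, I have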

import unicodedata, fileinput

for line in fileinput.input():
    print unicodedata.normalize('NFKD', line).encode('ascii','ignore')

However, this fails with an error saying that line must be "unicode, not str". Here is a fragment of the file.

\B1a\B \tintense, disordered and often destructive rage†.†.†.\t

I'm not sure how to properly pass the line in for processing at this point.

Answer 1

Try print line.decode('iso-8859-1').encode('ascii', 'ignore'); that should be much closer to what you want.
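As a rough sketch of the same idea wrapped in a function: decode the bytes first, then encode to ASCII and let the 'ignore' handler drop whatever does not fit. The helper name to_ascii and its fold_accents flag are made up for illustration, and ISO-8859-1 is only an assumed encoding for the input. The optional NFKD step is the normalization from the question, which keeps the base letter of an accented character instead of dropping it entirely.

```python
import unicodedata

def to_ascii(raw, encoding='iso-8859-1', fold_accents=False):
    """Decode a byte string, then drop everything outside ASCII.

    `to_ascii` and `fold_accents` are hypothetical names; the default
    encoding is an assumption about the input file.
    """
    text = raw.decode(encoding)  # byte string -> unicode text
    if fold_accents:
        # NFKD splits an accented letter into base letter + combining mark,
        # so the base letter survives the ASCII encode below.
        text = unicodedata.normalize('NFKD', text)
    # 'ignore' silently discards any character that is not ASCII.
    return text.encode('ascii', 'ignore')
```

With this sketch, to_ascii(b'caf\xe9') drops the accented letter, while to_ascii(b'caf\xe9', fold_accents=True) keeps a plain "e" in its place. The result is a byte string in both Python 2 and Python 3.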

Answer 2

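When you read from a file in Python, you get byte strings, which are called "str" in Python 2.x and earlier. You need to convert them to the "unicode" type using the decode method. For example: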

line = line.decode('latin1')

Replace 'latin1' with the correct encoding.
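Putting that together for a whole file might look something like the following. This is only a sketch: clean_file is a hypothetical helper, the paths are whatever you pass in, and 'latin1' is an assumed encoding. The file is opened in binary mode so that each line is a byte string that can be decoded explicitly.

```python
import io

def clean_file(src_path, dst_path, encoding='latin1'):
    """Copy src_path to dst_path, dropping all non-ASCII characters.

    `clean_file` is a hypothetical helper; `encoding` is an assumption
    about how the source file was written.
    """
    with io.open(src_path, 'rb') as src, io.open(dst_path, 'wb') as dst:
        for raw_line in src:
            text = raw_line.decode(encoding)           # bytes -> unicode
            dst.write(text.encode('ascii', 'ignore'))  # drop non-ASCII
```

Note that 'latin1' maps every possible byte to some character, so decoding with it never raises an error even when it is the wrong guess. That does not matter much here because everything non-ASCII is thrown away anyway, but a multi-byte encoding such as UTF-8 would need the right codec name to decode cleanly.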