Reading a file that contains \x0a in Python without getting \\x0a

Question

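I have XML files that contain hexadecimal escape sequences such as \x0a. I want to convert them to the corresponding Unicode characters, such as \n in Python.

Whenever I try to read the file, the backslash character gets escaped.

For example, if the content of my file is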

get EtqLt5fwmRBE\x0a

then after reading the file, the repr() of the string shows

get EtqLt5fwmRBE\\x0a

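But what I want is to convert \x0a to \n.

\x0a is not the only escape sequence in the file; there are other characters as well. For example, the repr() of one line in the file is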

\\x7c12\\x7c5\\x7c\\x0a

The expected output for the above is

|12|5|

Answer 1

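You can run the text through the string_escape codec (Python 2 only, produces byte strings) or the unicode_escape codec (Python 2 and 3, produces Unicode strings).

How you apply these depends on your Python version (2 or 3) and on whether the input is a byte string (str in Python 2, bytes in Python 3) or a Unicode string (unicode in Python 2, str in Python 3).

In Python 2, whether you have a byte string or a unicode string, just call decode():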

fixed = yourstring.decode('unicode_escape')

In Python 3, use bytestring.decode(...) if you have bytes. If you have a str, encode it to Latin-1 first (unicode_escape decodes any non-ASCII bytes as Latin-1!):

fixed = yourstring.encode('latin1').decode('unicode_escape')
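
The two Python 3 cases can be folded into one small helper. This is only a sketch: the name unescape is made up for illustration, and it assumes the str input contains nothing outside Latin-1 (otherwise encode('latin1') raises UnicodeEncodeError).

```python
def unescape(text):
    """Turn literal escape sequences such as \\x0a into the real characters (Python 3)."""
    if isinstance(text, bytes):
        # bytes can be decoded directly with the unicode_escape codec
        return text.decode('unicode_escape')
    # str has no decode(); go through Latin-1 bytes first
    # (assumption: every character is within U+0000..U+00FF)
    return text.encode('latin1').decode('unicode_escape')


print(repr(unescape(b'\\x7c12\\x7c5\\x7c\\x0a')))  # '|12|5|\n'
print(repr(unescape('get EtqLt5fwmRBE\\x0a')))     # 'get EtqLt5fwmRBE\n'
```

Checking the type once keeps the calling code the same regardless of how the file was opened.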

Demo in Python 2:

>>> '\\x7c12\\x7c5\\x7c\\x0a'.decode('unicode_escape')
u'|12|5|\n'
>>> u'\\x7c12\\x7c5\\x7c\\x0a'.decode('unicode_escape')
u'|12|5|\n'

In Python 3:

>>> b'\\x7c12\\x7c5\\x7c\\x0a'.decode('unicode_escape')
'|12|5|\n'
>>> '\\x7c12\\x7c5\\x7c\\x0a'.encode('latin1').decode('unicode_escape')
'|12|5|\n'
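
Applied to an actual file, one possible approach in Python 3 is to open it in binary mode so that each line is already bytes and can be decoded directly. The sketch below is illustrative only: read_unescaped is a hypothetical helper name, the temporary file stands in for the real input, and it assumes the file holds only ASCII text plus backslash escapes (unicode_escape would interpret raw UTF-8 bytes as Latin-1).

```python
import os
import tempfile


def read_unescaped(path):
    """Read a file line by line and decode literal escapes like \\x0a (Python 3)."""
    with open(path, 'rb') as handle:
        # strip the real line terminator, then decode the escape sequences
        return [line.rstrip(b'\r\n').decode('unicode_escape') for line in handle]


# Build a throwaway sample file holding the two lines from the question.
with tempfile.NamedTemporaryFile('wb', suffix='.txt', delete=False) as tmp:
    tmp.write(b'get EtqLt5fwmRBE\\x0a\n')
    tmp.write(b'\\x7c12\\x7c5\\x7c\\x0a\n')
    sample_path = tmp.name

lines = read_unescaped(sample_path)
os.remove(sample_path)
print(lines)  # ['get EtqLt5fwmRBE\n', '|12|5|\n']
```

Reading in binary mode avoids the extra encode('latin1') round trip that a text-mode str would need.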