Python 3.4: str: AttributeError: 'str' object has no attribute 'decode'

Question

Rom*_*omu 7 encoding bytestring python-3.x

I have a piece of code in a function that replaces badly encoded foreign characters in a string:

odbc_str = "String from an old database with weird mixed encodings"
s = str(bytes(odbc_str.strip(), 'cp1252'))
s = s.replace('\\x82', 'é')
s = s.replace('\\x8a', 'è')
# (...)
print(s)
# b"String from an old database with weird mixed encodings"

I need a "real" string here, not bytes. But when I try to decode it, I get an exception:

odbc_str = "String from an old database with weird mixed encodings"
s = str(bytes(odbc_str.strip(), 'cp1252'))
s = s.replace('\\x82', 'é')
s = s.replace('\\x8a', 'è')
# (...)
print(s.decode("utf-8"))
# AttributeError: 'str' object has no attribute 'decode'

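Do you know why s is bytes here?
Why can't I decode it to a real string?
Do you know a clean way to do this? (For now I fall back to s[2:][:-1]. It works but it is very ugly, and I would like to understand this behavior.)

Thanks in advance!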

EDIT:

pypyodbc in Python 3 uses all-unicode by default. That confused me. On connect, you can tell it to use ANSI.

con_odbc = pypyodbc.connect("DSN=GP", False, False, 0, False)

Then I can decode the returned values as cp850, which is the database's original code page.

str(odbc_str, "cp850", "replace")

There is no longer any need to replace each special character manually. Many thanks to pepr.

Answer 1

pep*_*epr 4

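The printed b"String from an old database with weird mixed encodings" is not a representation of the string's content. It is the value of the string's content, because you did not pass the encoding argument to str() (see the documentation: https://docs.python.org/3.4/library/stdtypes.html#str):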

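If neither encoding nor errors is given, str(object) returns object.__str__(), which is the "informal" or nicely printable string representation of object. For string objects, this is the string itself. If object does not have a __str__() method, then str() falls back to returning repr(object).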

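That is what happened in your case. The b and the quote character that follows it are actually two characters that are part of the string's content. You can also try: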

s1 = 'String from an old database with weird mixed encodings'
print(type(s1), repr(s1))
by = bytes(s1, 'cp1252')
print(type(by), repr(by))
s2 = str(by)
print(type(s2), repr(s2))

It prints:

<class 'str'> 'String from an old database with weird mixed encodings'
<class 'bytes'> b'String from an old database with weird mixed encodings'
<class 'str'> "b'String from an old database with weird mixed encodings'"

That is why s[2:][:-1] works for you.
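A cleaner alternative to slicing off the b'...' wrapper is to decode the bytes object rather than call str() on it without an encoding. A minimal sketch, assuming the cp1252 codec from the question; the variable names and the shortened sample text are only illustrative:

```python
raw = bytes('String from an old database', 'cp1252')

wrapped = str(raw)             # no encoding given: yields the printable representation
decoded = str(raw, 'cp1252')   # encoding given: the bytes are actually decoded
same = raw.decode('cp1252')    # equivalent spelling of the line above

print(repr(wrapped))
print(repr(decoded))
print(decoded == same)
```

With an encoding argument, str() behaves like bytes.decode(), so there is no wrapper to strip afterwards.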

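If you think about it more, then (in my opinion) either you want to get bytes or bytearray from the database (if possible) and fix the bytes (see bytes.translate: https://docs.python.org/3.4/library/stdtypes.html?highlight=translate#bytes.translate), or you successfully get the string (and were lucky that no exception was raised while constructing it) and you want to replace the wrong characters with the correct ones (see also str.translate(): https://docs.python.org/3.4/library/stdtypes.html?highlight=translate#str.translate).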
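Both variants might look roughly like this. The two-entry tables are an assumption for illustration only: they take the 0x82 and 0x8a bytes from the question to mean é and è, and the sample text is made up. A real table would have to cover every affected character.

```python
# Variant 1: fix the raw bytes first, then decode them.
# Assumption: 0x82 should be é and 0x8a should be è, re-expressed in cp1252.
byte_table = bytes.maketrans(b'\x82\x8a', 'éè'.encode('cp1252'))
raw = b'caf\x82 cr\x8ame'
fixed_from_bytes = raw.translate(byte_table).decode('cp1252')

# Variant 2: the string already exists, with the wrong characters in it.
# cp1252 decodes 0x82 as U+201A and 0x8a as U+0160, so those get replaced.
char_table = str.maketrans({'\u201a': 'é', '\u0160': 'è'})
wrong = raw.decode('cp1252')
fixed_from_str = wrong.translate(char_table)

print(fixed_from_bytes, fixed_from_str)
```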

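Possibly the ODBC layer internally uses the wrong encoding. (That is, the content of the database may be correct, but it is misinterpreted by ODBC, and you cannot tell ODBC what the correct encoding is.) In that case you want to encode the string back to bytes using that wrong encoding, and then decode the bytes using the right encoding.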
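The round trip can be sketched like this, assuming (as the question's edit suggests) that the data is stored as cp850 and was wrongly decoded as cp1252. The sample text and variable names are hypothetical:

```python
# Assumption: the database stores cp850, but the driver decoded it as cp1252.
stored = 'café crème'.encode('cp850')   # bytes as they would sit in the database
misread = stored.decode('cp1252')       # the garbled string the program receives

# Undo the wrong decoding, then apply the right one.
repaired = misread.encode('cp1252').decode('cp850')

print(repr(misread))
print(repaired)
```

This only works if the wrong decoding lost no information. If some bytes could not be mapped by the wrong codec, the original bytes cannot be recovered this way.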
