Trying to guess a file's encoding with chardet

Edu*_*ira 3 python encoding character-encoding python-3.x chardet

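I am writing a program that processes CSV files. These files can come in different encodings, so I am trying to add a step that uses chardet to guess the encoding of the file the user wants to open.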


I am trying to use the following code:

rawdata = open('file.csv', "r").read()
result = chardet.detect(rawdata)

But I get the following exception:

/usr/lib/python3.5/site-packages/chardet/__init__.py in detect(aBuf)
     23     if ((version_info < (3, 0) and isinstance(aBuf, unicode)) or
     24             (version_info >= (3, 0) and not isinstance(aBuf, bytes))):
---> 25         raise ValueError('Expected a bytes object, not a unicode object')
     26 
     27     from . import universaldetector

ValueError: Expected a bytes object, not a unicode object

I also tried:

result = chardet.detect(bytes(rawdata))

But I got:

TypeError                                 Traceback (most recent call last)
<ipython-input-47-1137b0adb486> in <module>()
----> 1 result = chardet.detect(bytes(rawdata))

TypeError: string argument without an encoding

Here is part of the file I am trying to open:

rawdata

'þFILEPATHþ\x14þidþ\x14þdocidþ\x14þBEGBATESþ\x14þENDBATESþ\x14þBEGATTACHIDþ\x14þENDATTACHIDþ\x14þCUSTODIANþ\x14þRECIPIENTþ\x14þFROMþ\x14þCCþ\x14þBCCþ\x14þDATESENTþ\x14þTIMESENTþ\x14þSUBJECTþ\x14þDATERCVDþ\x14þTIMERCVDþ\x14þMESSAGEIDþ\x14þPARENTIDþ\x14þCREATEDATEþ\x14þCREATETIMEþ\x14þMODDATEþ\x14þMODTIMEþ\x14þLASTACCDATEþ\x14þLASTACCTIMEþ\x14þFILESIZEþ\x14þNATIVELINKþ\x14þMD5HASHþ\x14þSHA1HASHþ\x14þFILENAMEþ\x14þFILEEXTENSþ\x14þTEXTPATH2þ\x14þPSTNAMEþ\x14þMSGFILETYPþ\x14þMIMETYPþ\x14þISNISTþ\x14þFILESIZEþ\x14þHASATTACHþ\x14þATTRIBUTESþ\x14þPRIORITYþ\x14þSENSITIVITYþ\x14þIMPORTANCEþ\x14þISPRIVATEþ\x14þBUSYSTATþ\x14þMSGFILETYPþ\x14þMSGFLAGSþ\x14þKEYWORDSþ\x14þCATEGORIESþ\x14þMSGFILETYPþ\x14þAUTHORþ\x14þATTACHLISTþ\x14þFROMDOMAINþ\x14þTODOMAINþ\x14þMTGWHEREþ\x14þMTGWHENþ\x14þMGTSTARTDATEþ\x14þMTGSTARTTIMEþ\x14þMTGENDDATEþ\x14þMTGENDTIMEþ\x14þREMINDDATEþ\x14þREMINDTIMEþ

sor*_*rin 5

How about reading the data as binary first?

rawdata = open('file.csv', "rb").read()
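For context: in Python 3, opening a file with "r" decodes it and returns a str, while chardet.detect() wants the undecoded bytes. Calling bytes(rawdata) on a str does not help either, because it needs exactly the encoding you are trying to find out. Here is a minimal sketch of the whole round trip, assuming chardet is installed; the function name `read_with_guessed_encoding` and the `fallback` parameter are made up for illustration and are not part of chardet.

```python
import chardet


def read_with_guessed_encoding(path, fallback="utf-8"):
    """Read a file as bytes, guess its encoding with chardet, and decode it.

    The function name and the `fallback` parameter are hypothetical,
    introduced only for this sketch.
    """
    # Binary mode returns bytes, which is what chardet.detect() expects.
    with open(path, "rb") as f:
        rawdata = f.read()

    # detect() returns a dict with 'encoding' and 'confidence' keys;
    # 'encoding' can be None when chardet cannot make a guess.
    result = chardet.detect(rawdata)
    encoding = result["encoding"] or fallback

    # errors="replace" keeps a wrong guess from raising UnicodeDecodeError.
    text = rawdata.decode(encoding, errors="replace")
    return text, result


if __name__ == "__main__":
    text, result = read_with_guessed_encoding("file.csv")
    print(result["encoding"], result["confidence"])
```

Keep in mind the result is only a guess, so it is probably worth checking `result["confidence"]` before trusting it, especially for short files.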