UTF-8 in Python logging, how?

Ted*_*uba 44 python unicode logging

I'm trying to log a UTF-8 encoded string to a file using Python's logging package. As a toy example:

import logging

def logging_test():
    handler = logging.FileHandler("/home/ted/logfile.txt", "w",
                                  encoding = "UTF-8")
    formatter = logging.Formatter("%(message)s")
    handler.setFormatter(formatter)
    root_logger = logging.getLogger()
    root_logger.addHandler(handler)
    root_logger.setLevel(logging.INFO)

    # This is an o with a hat on it.
    byte_string = '\xc3\xb4'
    unicode_string = unicode("\xc3\xb4", "utf-8")

    print "printed unicode object: %s" % unicode_string

    # Explode
    root_logger.info(unicode_string)

if __name__ == "__main__":
    logging_test()

This blows up with a UnicodeDecodeError on the logging.info() call.

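At a lower level, Python's logging package uses the codecs package to open the log file, passing in "UTF-8" as the encoding. That's all well and good, but it then tries to write byte strings to the file instead of unicode objects, which explodes. Essentially, Python is doing this: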

file_handler.write(unicode_string.encode("UTF-8"))

when it should be doing this:

file_handler.write(unicode_string)

Is this a bug in Python, or am I taking crazy pills? FWIW, this is a stock Python 2.6 installation.
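For comparison, here is a minimal sketch of the same toy example under Python 3 (an assumption: the question itself targets Python 2.6). In Python 3 every `str` is Unicode and `FileHandler` opens the file in text mode with the given encoding, so the record is encoded exactly once. The temporary path stands in for `/home/ted/logfile.txt`, and the logger name `utf8_demo` is hypothetical.

```python
import logging
import os
import tempfile

def logging_test(path):
    # FileHandler opens the file in text mode and encodes every record as UTF-8
    handler = logging.FileHandler(path, "w", encoding="utf-8")
    handler.setFormatter(logging.Formatter("%(message)s"))
    logger = logging.getLogger("utf8_demo")  # hypothetical logger name
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    try:
        # This is an o with a hat on it (U+00F4), decoded from its UTF-8 bytes.
        unicode_string = b"\xc3\xb4".decode("utf-8")
        logger.info(unicode_string)  # no UnicodeDecodeError in Python 3
    finally:
        # Detach and close the handler so the file is flushed
        logger.removeHandler(handler)
        handler.close()

if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "logfile.txt")
    logging_test(path)
    with open(path, "rb") as f:
        print(f.read())  # on Linux: b'\xc3\xb4\n'
```

The raw bytes on disk are the two-byte UTF-8 encoding of ô followed by the handler's newline terminator.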

war*_*iuc 28

Code like:

raise Exception(u'?')

produces:

  File "/usr/lib/python2.7/logging/__init__.py", line 467, in format
    s = self._fmt % record.__dict__
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-3: ordinal not in range(128)

This happens because the format string is a byte string, while some of the format string arguments are unicode strings containing non-ASCII characters:

>>> "%(message)s" % {'message': Exception(u'\u0449')}
*** UnicodeEncodeError: 'ascii' codec can't encode character u'\u0449' in position 0: ordinal not in range(128)

Making the format string unicode fixes the problem:

>>> u"%(message)s" % {'message': Exception(u'\u0449')}
u'\u0449'

So, in your logging configuration, make all format strings unicode:

'formatters': {
    'simple': {
        'format': u'%(asctime)-s %(levelname)s [%(name)s]: %(message)s',
        'datefmt': '%Y-%m-%d %H:%M:%S',
    },
 ...

and patch the default logging formatter to use a unicode format string:

logging._defaultFormatter = logging.Formatter(u"%(message)s")
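In Python 3 (an assumption here, since this answer targets Python 2.7) every `str` is Unicode, so the byte-string/unicode mix behind the `UnicodeEncodeError` above cannot arise and the `u""` prefix is redundant. A minimal sketch with a plain `logging.Formatter` and a hand-built record whose message is an exception carrying a non-ASCII character:

```python
import logging

# No u"" prefix needed: in Python 3 this format string is already Unicode
formatter = logging.Formatter("%(levelname)s %(message)s")

# Build a record whose message is an exception with a non-ASCII character
record = logging.makeLogRecord({
    "levelname": "ERROR",
    "msg": Exception("\u0449"),
})

formatted = formatter.format(record)
print(ascii(formatted))  # prints 'ERROR \u0449'; ascii() keeps console output ASCII-safe
```

`Formatter.format` calls `record.getMessage()`, which applies `str()` to the exception, so the non-ASCII text flows through unchanged.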

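  • So what about Python 3.5? Shouldn't all strings be unicode by default? (6 upvotes)
  • @JanuszSkonieczny My code is `import locale; if locale.getpreferredencoding().upper()!='UTF-8':locale.setlocale(locale.LC_ALL,'en_US.UTF-8')` (4 upvotes)
  • Yes, I was doing it in a Docker container. I solved it by setting a bunch of environment variables tied to the OS encoding. For anyone hitting the same problem here, see http://stackoverflow.com/a/27931669/260480. (2 upvotes)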
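The comments above point at the process locale rather than at logging itself. A small diagnostic sketch, using standard-library calls only, that reports the encodings Python 3 falls back to when none is given explicitly; which environment variables to set (for example `LANG`, `LC_ALL` or `PYTHONIOENCODING`) depends on your container and is not prescribed here.

```python
import locale
import sys

def encoding_report():
    """Collect the encodings Python falls back to when none is given explicitly."""
    return {
        # What open(), and so FileHandler, falls back to when encoding=None
        "preferred": locale.getpreferredencoding(False),
        "filesystem": sys.getfilesystemencoding(),
        # sys.stdout can be None (e.g. under pythonw), hence getattr
        "stdout": getattr(sys.stdout, "encoding", None),
        # 1 when UTF-8 mode (PEP 540, Python 3.7+) is active
        "utf8_mode": sys.flags.utf8_mode,
    }

if __name__ == "__main__":
    for name, value in encoding_report().items():
        print(f"{name}: {value}")
```

If `preferred` reports something like `ANSI_X3.4-1968` (i.e. ASCII), any `FileHandler` without an explicit `encoding` will fail on non-ASCII text.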

Vin*_*jip 16

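Check that you have the latest Python 2.6: some Unicode bugs were found and fixed after 2.6 was released. For example, on my Ubuntu Jaunty system I ran your script as copied and pasted, only removing the '/home/ted/' prefix from the log file name. The result (copied and pasted from a terminal window):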

vinay@eta-jaunty:~/projects/scratch$ python --version
Python 2.6.2
vinay@eta-jaunty:~/projects/scratch$ python utest.py 
printed unicode object: ô
vinay@eta-jaunty:~/projects/scratch$ cat logfile.txt 
ô
vinay@eta-jaunty:~/projects/scratch$ 

On a Windows box:

C:\temp>python --version
Python 2.6.2

C:\temp>python utest.py
printed unicode object: ô

And the contents of the file:

[image: contents of logfile.txt]

This might also explain why Lennart Regebro couldn't reproduce it either.

  • Yes, it was: the fix went in between 2.6.1 and 2.6.2, in revision 69448 (http://svn.python.org/view?view=rev&change=69448), so you need to upgrade to a later version. (3 upvotes)

Chr*_*ris 8

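I had a similar problem running Django on Python 3: my logger died whenever it encountered umlauts (äöüß), but was otherwise fine. I looked through a lot of search results and found nothing that worked. I tried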

import locale
if locale.getpreferredencoding().upper() != 'UTF-8':
    locale.setlocale(locale.LC_ALL, 'en_US.UTF-8')

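which I got from the comment above. It didn't work. Looking at the current locale gave me some crazy ANSI thing, which turned out to be basically just "ASCII". That sent me off in completely the wrong direction.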

Changing the logging format strings to Unicode did not help. Adding a magic encoding comment at the top of the script did not help. Setting the charset on the sender's message (the text came from an HTTP request) did not help.

What DID work was setting the encoding on the file handler to UTF-8 in settings.py. Because I had nothing set, the encoding defaulted to None, which means the file is opened with the locale's preferred encoding; in my case that ended up being ASCII (or, as I like to think of it: ASS-KEY).

    'handlers': {
        'file': {
            'level': 'DEBUG',
            'class': 'logging.handlers.TimedRotatingFileHandler',
            'encoding': 'UTF-8', # <-- That was missing.
            ....
        },
    },
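A self-contained sketch of the same fix outside Django: a dict shaped like the `'handlers'` entry above, passed to `logging.config.dictConfig` (the function Django hands its `LOGGING` setting to by default). The log path, the logger name `umlaut_demo` and the `when='midnight'` schedule are hypothetical choices; the line that matters is `'encoding': 'UTF-8'`, which `dictConfig` forwards to the handler's constructor.

```python
import logging
import logging.config
import os
import tempfile

def configure(log_path):
    logging.config.dictConfig({
        "version": 1,
        "disable_existing_loggers": False,
        "handlers": {
            "file": {
                "level": "DEBUG",
                "class": "logging.handlers.TimedRotatingFileHandler",
                "filename": log_path,
                "when": "midnight",  # hypothetical rotation schedule
                "encoding": "UTF-8",  # without this, the locale's encoding is used
            },
        },
        "loggers": {
            # hypothetical logger name
            "umlaut_demo": {"handlers": ["file"], "level": "DEBUG"},
        },
    })

if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "app.log")
    configure(path)
    logging.getLogger("umlaut_demo").info("äöüß")
    logging.shutdown()  # flush and close the file handler
    with open(path, encoding="utf-8") as f:
        print(ascii(f.read()))
```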


Eph*_*hie 5

I'm a bit late, but I just came across this post, which made it very easy to set up logging in UTF-8.

Here is the link to the post

or the code here:

import logging

root_logger = logging.getLogger()
root_logger.setLevel(logging.DEBUG) # or whatever
handler = logging.FileHandler('test.log', 'w', 'utf-8') # or whatever; the third argument is the encoding
formatter = logging.Formatter('%(name)s %(message)s') # or whatever
handler.setFormatter(formatter) # Pass handler as a parameter, not assign
root_logger.addHandler(handler)
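On Python 3.9 and later (an assumption about your version), `logging.basicConfig` accepts an `encoding` argument directly, which collapses the handler setup above into one call. The file name and format string are placeholders, in the spirit of the "or whatever" comments in the snippet.

```python
import logging

# encoding= was added to basicConfig in Python 3.9; force=True (3.8+)
# removes and closes any handlers already attached to the root logger.
logging.basicConfig(
    filename="test.log",
    filemode="w",
    encoding="utf-8",
    level=logging.DEBUG,
    format="%(name)s %(message)s",
    force=True,
)

logging.getLogger(__name__).info("\u00f4 \u00e4\u00f6\u00fc\u00df \u0449")
```

On older versions `basicConfig` rejects the unknown `encoding` keyword with a `ValueError`, so the explicit `FileHandler` form above is the portable choice.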