Ted*_*uba 44 python unicode logging
I'm trying to log a UTF-8 encoded string to a file using Python's logging package. As a toy example:
import logging
def logging_test():
    handler = logging.FileHandler("/home/ted/logfile.txt", "w",
                                  encoding = "UTF-8")
    formatter = logging.Formatter("%(message)s")
    handler.setFormatter(formatter)
    root_logger = logging.getLogger()
    root_logger.addHandler(handler)
    root_logger.setLevel(logging.INFO)
    # This is an o with a hat on it.
    byte_string = '\xc3\xb4'
    unicode_string = unicode("\xc3\xb4", "utf-8")
    print "printed unicode object: %s" % unicode_string
    # Explode
    root_logger.info(unicode_string)
if __name__ == "__main__":
    logging_test()
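This blows up with a UnicodeDecodeError on the logging.info() call.
At a lower level, Python's logging package uses the codecs package to open the log file, passing in the "UTF-8" argument as the encoding. That's all well and good, but it then tries to write byte strings to the file instead of unicode objects, which explodes. Essentially, Python is doing this: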
file_handler.write(unicode_string.encode("UTF-8"))
When it should be doing this:
file_handler.write(unicode_string)
Is this a bug in Python, or am I taking crazy pills? FWIW, this is a stock Python 2.6 installation.
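For readers on Python 3, here is a minimal sketch of one common way a UnicodeDecodeError shows up in Python 2: when a byte string is asked to encode itself, Python 2 first decodes it implicitly with the ASCII codec. The helper name `implicit_ascii_decode` is made up for illustration, and this only shows the general mechanism; it is not a diagnosis of the exact failure in the script above.

```python
def implicit_ascii_decode(byte_string):
    """Mimic the implicit ASCII decode that Python 2 applies to a byte
    string before re-encoding it. Returns the error text, or None."""
    try:
        byte_string.decode("ascii")
    except UnicodeDecodeError as exc:
        return str(exc)
    return None


byte_string = b"\xc3\xb4"                     # UTF-8 bytes for the o with a hat
unicode_string = byte_string.decode("utf-8")  # the single character U+00F4

# ascii() keeps the output printable even on an ASCII-only terminal.
print(ascii(unicode_string))
print(implicit_ascii_decode(byte_string))
```

Decoding with the right codec gives one character, while the ASCII decode fails on the very first byte (0xc3).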
war*_*iuc 28
Having code like:
raise Exception(u'?')
causes:
  File "/usr/lib/python2.7/logging/__init__.py", line 467, in format
    s = self._fmt % record.__dict__
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-3: ordinal not in range(128)
This happens because the format string is a byte string, while some of the format string arguments are unicode strings containing non-ASCII characters:
>>> "%(message)s" % {'message': Exception(u'\u0449')}
*** UnicodeEncodeError: 'ascii' codec can't encode character u'\u0449' in position 0: ordinal not in range(128)
Making the format string unicode fixes the issue:
>>> u"%(message)s" % {'message': Exception(u'\u0449')}
u'\u0449'
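As a rough Python 3 comparison (where every str is already Unicode), the same %-formatting just works, and the ASCII failure only appears if you explicitly force the text through the ASCII codec, which is roughly what Python 2 does behind the scenes with a byte-string format. The helper names `format_message` and `ascii_encode_error` are hypothetical, for illustration only.

```python
def format_message(fmt, exc):
    """Apply a %-style format string to a dict holding an exception."""
    return fmt % {"message": exc}


def ascii_encode_error(text):
    """Return the error text raised when encoding text as ASCII, or None."""
    try:
        text.encode("ascii")
    except UnicodeEncodeError as exc:
        return str(exc)
    return None


formatted = format_message("%(message)s", Exception("\u0449"))

# ascii() keeps the output printable even on an ASCII-only terminal.
print(ascii(formatted))
print(ascii_encode_error("\u0449"))
```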
So, in your logging configuration, make all format strings unicode:
'formatters': {
    'simple': {
        'format': u'%(asctime)-s %(levelname)s [%(name)s]: %(message)s',
        'datefmt': '%Y-%m-%d %H:%M:%S',
    },
    ...
And patch the default logging formatter to use a unicode format string:
logging._defaultFormatter = logging.Formatter(u"%(message)s")
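To check a format string without touching any real handlers, one option is to run a hand-built record through a Formatter directly. This is only a sketch written for Python 3 (where the `u` prefix is accepted but has no effect); the record fields `demo`, `example.py` and the message text are made up for illustration.

```python
import logging

# A unicode format string, in the spirit of the configuration above.
formatter = logging.Formatter(u"%(levelname)s [%(name)s]: %(message)s")

# Build a record by hand; all field values here are illustrative.
record = logging.LogRecord(
    name="demo",
    level=logging.INFO,
    pathname="example.py",
    lineno=1,
    msg="caught: %s",
    args=(Exception("\u0449"),),
    exc_info=None,
)

formatted_line = formatter.format(record)

# ascii() keeps the output printable even on an ASCII-only terminal.
print(ascii(formatted_line))
```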
Vin*_*jip 16
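Check that you have the latest Python 2.6 - some Unicode bugs were found and fixed since 2.6 was released. For example, on my Ubuntu Jaunty system, I ran your script copied and pasted, removing only the '/home/ted/' prefix from the log file name. Result (copied and pasted from a terminal window):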
vinay@eta-jaunty:~/projects/scratch$ python --version
Python 2.6.2
vinay@eta-jaunty:~/projects/scratch$ python utest.py
printed unicode object: ô
vinay@eta-jaunty:~/projects/scratch$ cat logfile.txt
ô
vinay@eta-jaunty:~/projects/scratch$
On a Windows box:
C:\temp>python --version
Python 2.6.2
C:\temp>python utest.py
printed unicode object: ô
And the contents of the file:

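This might also explain why Lennart Regebro couldn't reproduce it either.
I had a similar problem running Django on Python 3: my logger died whenever it encountered umlauts (äöüß), but was otherwise fine. I looked through a lot of results and found nothing that worked. I tried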
import locale
if locale.getpreferredencoding().upper() != 'UTF-8':
    locale.setlocale(locale.LC_ALL, 'en_US.UTF-8')
which I got from a comment above. It did not work. Looking at the current locale gave me some crazy ANSI thing, which turned out to basically just mean "ASCII". That sent me in totally the wrong direction.
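If you want to see what your own environment reports, a small check along these lines may help. It is a sketch for Python 3, assuming the usual behavior that a text file opened without an encoding argument uses a platform-dependent default; the helper name `default_text_encoding` is made up.

```python
import locale
import os
import tempfile


def default_text_encoding():
    """Return the encoding open() picks when no encoding is passed."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "w") as f:
            return f.encoding
    finally:
        os.remove(path)


print("locale preferred encoding:", locale.getpreferredencoding(False))
print("open() default encoding:", default_text_encoding())
```

If either line reports something ASCII-like (for example ANSI_X3.4-1968), file handlers created without an explicit encoding are likely to choke on non-ASCII text.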
Changing the logging format strings to Unicode did not help. Setting a magic encoding comment at the beginning of the script did not help. Setting the charset on the sender's message (the text came from an HTTP request) did not help.
What DID work was setting the encoding on the file handler to UTF-8 in settings.py. Because I had nothing set, the encoding defaulted to None, which apparently ends up being ASCII on my system (or, as I like to think of it, ASS-KEY):
'handlers': {
    'file': {
        'level': 'DEBUG',
        'class': 'logging.handlers.TimedRotatingFileHandler',
        'encoding': 'UTF-8',  # <-- That was missing.
        ....
    },
},
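For anyone who wants to try this outside Django, here is a self-contained sketch using `logging.config.dictConfig` on Python 3. Everything beyond the `level`, `class` and `encoding` keys is my own assumption for the sake of a runnable example (the logger name `umlaut_demo`, the temp file, `when: midnight`, the formatter); it does not reconstruct the settings elided above.

```python
import logging
import logging.config
import os
import tempfile


def build_config(log_path):
    """Build a dictConfig dict with a UTF-8 file handler (illustrative values)."""
    return {
        "version": 1,
        "disable_existing_loggers": False,
        "formatters": {
            "simple": {"format": "%(levelname)s [%(name)s]: %(message)s"},
        },
        "handlers": {
            "file": {
                "level": "DEBUG",
                "class": "logging.handlers.TimedRotatingFileHandler",
                "filename": log_path,
                "when": "midnight",
                "encoding": "UTF-8",  # the setting that was missing
                "formatter": "simple",
            },
        },
        "loggers": {
            "umlaut_demo": {
                "handlers": ["file"],
                "level": "DEBUG",
                "propagate": False,
            },
        },
    }


def write_and_read_back(message):
    """Log one message through the configured handler, return the raw bytes."""
    log_path = os.path.join(tempfile.mkdtemp(), "demo.log")
    logging.config.dictConfig(build_config(log_path))
    logger = logging.getLogger("umlaut_demo")
    logger.info(message)
    # Close the handler so the file is flushed before reading it back.
    for handler in list(logger.handlers):
        logger.removeHandler(handler)
        handler.close()
    with open(log_path, "rb") as f:
        return f.read()


raw_bytes = write_and_read_back("\u00e4\u00f6\u00fc\u00df")  # the four umlaut characters
print(raw_bytes)
```

Reading the file back in binary mode shows the UTF-8 bytes directly, without depending on the terminal's encoding.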
I'm a bit late, but I just came across this post, which let me set up logging in UTF-8 very easily.
Or here is the code:
import logging
root_logger = logging.getLogger()
root_logger.setLevel(logging.DEBUG)  # or whatever
handler = logging.FileHandler('test.log', 'w', 'utf-8')  # or whatever
formatter = logging.Formatter('%(name)s %(message)s')  # or whatever
handler.setFormatter(formatter)  # Pass handler as a parameter, not assign
root_logger.addHandler(handler)
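To confirm that the handler really writes UTF-8, a quick round trip like the one below can help. It is a Python 3 sketch that uses a named logger (`utf8_demo`, a made-up name) and a temp directory instead of the root logger and `test.log`, so it does not disturb any existing logging setup.

```python
import logging
import os
import tempfile


def log_to_utf8_file(message, log_path):
    """Log one message via a UTF-8 FileHandler and return the file's raw bytes."""
    logger = logging.getLogger("utf8_demo")
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep the demo away from root handlers
    handler = logging.FileHandler(log_path, mode="w", encoding="utf-8")
    handler.setFormatter(logging.Formatter("%(name)s %(message)s"))
    logger.addHandler(handler)
    try:
        logger.info(message)
    finally:
        # Detach and close so the file is flushed before reading it back.
        logger.removeHandler(handler)
        handler.close()
    with open(log_path, "rb") as f:
        return f.read()


log_dir = tempfile.mkdtemp()
raw = log_to_utf8_file("\u00f4", os.path.join(log_dir, "test.log"))
print(raw)
```

The o with a hat from the original question should appear in the file as the two bytes `\xc3\xb4`.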