Nor*_*sUp 1323 python performance logging string-formatting
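Python 2.6 introduced the str.format() method, whose syntax differs slightly from the existing % operator. Which is better, and in which situations?
The following uses each method and produces the same result, so what is the difference?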
#!/usr/bin/python
sub1 = "python string!"
sub2 = "an arg"
a = "i am a %s" % sub1
b = "i am a {0}".format(sub1)
c = "with %(kwarg)s!" % {'kwarg':sub2}
d = "with {kwarg}!".format(kwarg=sub2)
print a # "i am a python string!"
print b # "i am a python string!"
print c # "with an arg!"
print d # "with an arg!"
Furthermore, when does string formatting occur in Python? For example, if my logging level is set to HIGH, will I still take a hit for performing the following % operation? And if so, is there a way to avoid this?
log.debug("some debug info: %s" % some_info)
Cla*_*diu 940
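To answer your first question... .format just seems more sophisticated in many ways. An annoying thing about % is also how it can take either a single variable or a tuple. You'd think the following would always work: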
"hi there %s" % name
yet, if name happens to be (1, 2, 3), it will throw a TypeError. To guarantee that it always prints, you'd need to do
"hi there %s" % (name,) # supply the single argument as a single-item tuple
which is just ugly. .format doesn't have those issues. Also, in the second example you gave, the .format version looks much cleaner.
Why would you not use it?
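To make the tuple pitfall concrete, here is a minimal Python 3 sketch. The helper names greet_percent and greet_format are made up for illustration and are not part of any library.

```python
# Hypothetical helpers illustrating the tuple pitfall described above.
def greet_percent(name):
    # Wrapping in a one-item tuple stops % from unpacking `name` itself.
    return "hi there %s" % (name,)


def greet_format(name):
    # .format treats `name` as a single argument whatever its type.
    return "hi there {}".format(name)


name = (1, 2, 3)

try:
    "hi there %s" % name  # the tuple is read as three separate arguments
    error = None
except TypeError as exc:
    error = exc

print(type(error).__name__)  # TypeError
print(greet_percent(name))   # hi there (1, 2, 3)
print(greet_format(name))    # hi there (1, 2, 3)
```

The one-item tuple is only needed when the value's type is not known in advance; .format sidesteps the question entirely.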
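To answer your second question, string formatting happens at the same time as any other operation: when the string formatting expression is evaluated. Python is not a lazy language, so it evaluates expressions before calling functions. In your log.debug example, the expression "some debug info: %s" % some_info will first evaluate to, e.g., "some debug info: roflcopters are active", and then that string will be passed to log.debug().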
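A small Python 3 sketch can show this eager evaluation. Noisy and ignore are hypothetical stand-ins invented for the demonstration: Noisy counts how often it is converted to a string, and ignore plays the role of a logging call whose level is disabled.

```python
class Noisy:
    """Hypothetical helper that counts how often it is converted to a string."""

    def __init__(self):
        self.str_calls = 0

    def __str__(self):
        self.str_calls += 1
        return "roflcopters are active"


def ignore(message):
    """Stand-in for a logging call that throws its message away."""
    return None


some_info = Noisy()

# The % expression is evaluated before ignore() is even called,
# so the string conversion happens although the result is discarded.
ignore("some debug info: %s" % some_info)

print(some_info.str_calls)  # 1
```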
eyq*_*uem 303
Something that the modulo operator (%) can't do, afaik:
tu = (12,45,22222,103,6)
print '{0} {2} {1} {2} {3} {2} {4} {2}'.format(*tu)
result
12 22222 45 22222 103 22222 6 22222
Very useful.
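For comparison, a hedged Python 3 sketch: as far as I know, % has no way to refer to positional arguments by index, but it can repeat a value when the arguments come from a mapping. The dictionary keys below are invented for the example.

```python
tu = (12, 45, 22222, 103, 6)

# .format can reuse a positional argument by its index.
by_index = '{0} {2} {1} {2} {3} {2} {4} {2}'.format(*tu)

# % can repeat a value only through named keys in a mapping.
values = {'a': 12, 'b': 45, 'sep': 22222}  # hypothetical key names
by_name = '%(a)s %(sep)s %(b)s %(sep)s' % values

print(by_index)  # 12 22222 45 22222 103 22222 6 22222
print(by_name)   # 12 22222 45 22222
```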
Another point: format(), being a function, can be used as an argument to other functions:
li = [12,45,78,784,2,69,1254,4785,984]
print map('the number is {}'.format,li)
print
from datetime import datetime,timedelta
once_upon_a_time = datetime(2010, 7, 1, 12, 0, 0)
delta = timedelta(days=13, hours=8, minutes=20)
gen =(once_upon_a_time +x*delta for x in xrange(20))
print '\n'.join(map('{:%Y-%m-%d %H:%M:%S}'.format, gen))
Results in:
['the number is 12', 'the number is 45', 'the number is 78', 'the number is 784', 'the number is 2', 'the number is 69', 'the number is 1254', 'the number is 4785', 'the number is 984']
2010-07-01 12:00:00
2010-07-14 20:20:00
2010-07-28 04:40:00
2010-08-10 13:00:00
2010-08-23 21:20:00
2010-09-06 05:40:00
2010-09-19 14:00:00
2010-10-02 22:20:00
2010-10-16 06:40:00
2010-10-29 15:00:00
2010-11-11 23:20:00
2010-11-25 07:40:00
2010-12-08 16:00:00
2010-12-22 00:20:00
2011-01-04 08:40:00
2011-01-17 17:00:00
2011-01-31 01:20:00
2011-02-13 09:40:00
2011-02-26 18:00:00
2011-03-12 02:20:00
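The snippet above is Python 2 (print statement, xrange, and map returning a list). A rough Python 3 equivalent, shortened to three timestamps, might look like this. The names labels, moments and stamps are mine.

```python
from datetime import datetime, timedelta

li = [12, 45, 78, 784, 2, 69, 1254, 4785, 984]

# In Python 3, map() returns an iterator, so materialize it to see the list.
labels = list(map('the number is {}'.format, li))
print(labels[:2])

once_upon_a_time = datetime(2010, 7, 1, 12, 0, 0)
delta = timedelta(days=13, hours=8, minutes=20)

# range() replaces xrange(); only the first three steps are generated here.
moments = (once_upon_a_time + x * delta for x in range(3))
stamps = list(map('{:%Y-%m-%d %H:%M:%S}'.format, moments))
print('\n'.join(stamps))
```

The bound method '...'.format is an ordinary callable, which is why it can be handed straight to map().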
Woo*_*ble 142
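Assuming you're using Python's logging module, you can pass the string formatting arguments as arguments to the .debug() method rather than doing the formatting yourself: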
log.debug("some debug info: %s", some_info)
This avoids doing the formatting unless the logger actually logs something.
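Here is a minimal Python 3 sketch of the difference, assuming a logger whose level filters out DEBUG messages. The logger name "format_demo" and the Noisy class are invented for the demonstration.

```python
import logging


class Noisy:
    """Hypothetical helper that counts how often it is converted to a string."""

    def __init__(self):
        self.str_calls = 0

    def __str__(self):
        self.str_calls += 1
        return "roflcopters are active"


log = logging.getLogger("format_demo")  # hypothetical logger name
log.setLevel(logging.WARNING)           # DEBUG messages are filtered out
log.addHandler(logging.NullHandler())

eager = Noisy()
lazy = Noisy()

log.debug("some debug info: %s" % eager)  # formatted before the call
log.debug("some debug info: %s", lazy)    # formatting left to the logger

print(eager.str_calls, lazy.str_calls)  # 1 0
```

Because the DEBUG level is disabled, the logger returns before it ever merges the message with its arguments, so the second object is never converted.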
Col*_*nic 116
As of Python 3.6 (2016) you can use f-strings to substitute variables:
>>> origin = "London"
>>> destination = "Paris"
>>> f"from {origin} to {destination}"
'from London to Paris'
Note the f" prefix. If you try this in Python 3.5 or earlier, you'll get a SyntaxError.
See https://docs.python.org/3.6/reference/lexical_analysis.html#f-strings
rsl*_*lnx 53
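But please be careful: just now I discovered one issue when trying to replace all % with .format in existing code: '{}'.format(unicode_string) will try to encode unicode_string and will probably fail.
Just look at this Python interactive session log: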
Python 2.7.2 (default, Aug 27 2012, 19:52:55)
[GCC 4.1.2 20080704 (Red Hat 4.1.2-48)] on linux2
; s='й'
; u=u'й'
; s
'\xd0\xb9'
; u
u'\u0439'
Here, s is just a string (called a 'byte array' in Python 3) and u is a Unicode string (called a 'string' in Python 3). Formatting each of them with %:
; '%s' % s
'\xd0\xb9'
; '%s' % u
u'\u0439'
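As the session above shows, when you give a Unicode object as a parameter to the % operator, it produces a Unicode string even if the original string wasn't Unicode. Now the same with .format: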
; '{}'.format(s)
'\xd0\xb9'
; '{}'.format(u)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'latin-1' codec can't encode character u'\u0439' in position 0: ordinal not in range(256)
So the .format method raises UnicodeEncodeError in that case. With a Unicode format string instead:
; u'{}'.format(s)
u'\xd0\xb9'
; u'{}'.format(u)
u'\u0439'
It works fine with a Unicode argument only if the original string was Unicode as well,
; '{}'.format(u'i')
'i'
or if the argument can be converted to a plain string (the so-called 'byte array').
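The session above is specific to Python 2. As a hedged aside, here is a Python 3 sketch of the same values, where str is always Unicode and the encoding step no longer applies. The variable names mirror the session, and the escape '\u0439' stands for the Cyrillic letter used there.

```python
u = '\u0439'           # str: Unicode text (the Cyrillic letter from the session)
s = u.encode('utf-8')  # bytes: b'\xd0\xb9'

from_format = '{}'.format(u)  # no implicit encoding takes place
from_percent = '%s' % u

# Formatting a bytes object inserts its repr, not the decoded text.
bytes_formatted = '{}'.format(s)

# Decode explicitly to get the text back.
decoded = '{}'.format(s.decode('utf-8'))

print(ascii(from_format), ascii(from_percent))
print(bytes_formatted)
print(decoded == u)
```

So in Python 3 the UnicodeEncodeError from the session does not occur, but mixing bytes and str silently produces a b'...' repr unless you decode first.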
mat*_*asg 35
Yet another advantage of .format (which I don't see in the answers): it can take object properties.
In [12]: class A(object):
   ....:     def __init__(self, x, y):
   ....:         self.x = x
   ....:         self.y = y
   ....:
In [13]: a = A(2,3)
In [14]: 'x is {0.x}, y is {0.y}'.format(a)
Out[14]: 'x is 2, y is 3'
Or, as a keyword argument:
In [15]: 'x is {a.x}, y is {a.y}'.format(a=a)
Out[15]: 'x is 2, y is 3'
This is not possible with % as far as I can tell.
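A short Python 3 sketch of the field syntax involved: besides attributes, replacement fields can also index into mappings and sequences. The sample data (point, pair) is invented. The last line shows the nearest % equivalent I am aware of, which needs a mapping built from the object first.

```python
class A:
    def __init__(self, x, y):
        self.x = x
        self.y = y


a = A(2, 3)
point = {'lat': 51.5, 'lon': -0.1}  # hypothetical sample data
pair = ('left', 'right')            # hypothetical sample data

by_attr = 'x is {0.x}, y is {0.y}'.format(a)
by_key = 'lat={p[lat]}, lon={p[lon]}'.format(p=point)  # keys are written unquoted
by_index = 'first={t[0]}, second={t[1]}'.format(t=pair)

# With %, attribute lookup is not available; vars(a) exposes the
# instance attributes as a dict so named keys can be used instead.
by_percent = 'x is %(x)s, y is %(y)s' % vars(a)

print(by_attr)     # x is 2, y is 3
print(by_key)      # lat=51.5, lon=-0.1
print(by_index)    # first=left, second=right
print(by_percent)  # x is 2, y is 3
```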
bal*_*alu 31
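As I discovered today, the old way of formatting strings via % doesn't support Decimal, Python's module for decimal fixed point and floating point arithmetic, out of the box.
Example (using Python 3.3.5):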
#!/usr/bin/env python3
from decimal import *
getcontext().prec = 50
d = Decimal('3.12375239e-24') # no magic number, I rather produced it by banging my head on my keyboard
print('%.50f' % d)
print('{0:.50f}'.format(d))
Output:
0.00000000000000000000000312375239000000009907464850
0.00000000000000000000000312375239000000000000000000
There surely might be workarounds, but you still might consider using the format() method right away.
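The difference seems to come from % converting the Decimal to a binary float for the f conversion, while format() hands the spec to Decimal itself. A minimal sketch that checks this by round-tripping both results follows. The names old_style and new_style are mine, and localcontext is used only so the precision change stays local.

```python
from decimal import Decimal, localcontext

d = Decimal('3.12375239e-24')

with localcontext() as ctx:
    ctx.prec = 50
    old_style = '%.50f' % d           # d is converted to a binary float first
    new_style = '{0:.50f}'.format(d)  # Decimal formats its own exact digits

print(old_style)
print(new_style)

# Parsing the strings back shows which one preserved the value exactly.
print(Decimal(old_style) == d)  # False
print(Decimal(new_style) == d)  # True
```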
lcl*_*ltj 29
% gives better performance than format in my tests.
Test code:
Python 2.7.2:
import timeit
print 'format:', timeit.timeit("'{}{}{}'.format(1, 1.23, 'hello')")
print '%:', timeit.timeit("'%s%s%s' % (1, 1.23, 'hello')")
Result:
> format: 0.470329046249
> %: 0.357107877731
Python 3.5.2
import timeit
print('format:', timeit.timeit("'{}{}{}'.format(1, 1.23, 'hello')"))
print('%:', timeit.timeit("'%s%s%s' % (1, 1.23, 'hello')"))
Result:
> format: 0.5864730989560485
> %: 0.013593495357781649
It looks like in Python 2 the difference is small, whereas in Python 3, % is much faster than format.
Thanks @Chris Cogdon for the sample code.
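One caveat worth checking before relying on these numbers: every operand in the % expression is a literal, so CPython 3 may well evaluate it once at compile time, in which case the benchmark measures little more than loading a constant. The sketch below is my own variation, not the original author's test. It binds the values to names in the timeit setup so that each statement has to format on every iteration. It needs Python 3.6+ because of the f-string case.

```python
import timeit

# Values are bound to names in setup, so nothing can be precomputed.
setup = "a, b, c = 1, 1.23, 'hello'"
runs = 100000  # kept small so the sketch finishes quickly

percent_time = timeit.timeit("'%s%s%s' % (a, b, c)", setup=setup, number=runs)
format_time = timeit.timeit("'{}{}{}'.format(a, b, c)", setup=setup, number=runs)
fstring_time = timeit.timeit("f'{a}{b}{c}'", setup=setup, number=runs)

print('%:', percent_time)
print('format:', format_time)
print('f-string:', fstring_time)
```

Timings vary by interpreter version and machine, so no expected figures are given here.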
Anonymous 17
If your Python version is >= 3.6, the f-string formatted literal is your new friend.
It is simpler, cleaner, and performs better.
In [1]: params=['Hello', 'adam', 42]
In [2]: %timeit "%s %s, the answer to everything is %d."%(params[0],params[1],params[2])
448 ns ± 1.48 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [3]: %timeit "{} {}, the answer to everything is {}.".format(*params)
449 ns ± 1.42 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [4]: %timeit f"{params[0]} {params[1]}, the answer to everything is {params[2]}."
12.7 ns ± 0.0129 ns per loop (mean ± std. dev. of 7 runs, 100000000 loops each)
Dav*_*ers 15
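As a side note, you do not have to take a performance hit to use new-style formatting with logging. You can pass any object that implements the __str__ magic method to logging.debug, logging.info, etc. When the logging module has decided that it must emit your message object (whatever it is), it calls str(message_object) before doing so. So you could do something like this: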
import logging

class NewStyleLogMessage(object):
    def __init__(self, message, *args, **kwargs):
        self.message = message
        self.args = args
        self.kwargs = kwargs

    def __str__(self):
        args = (i() if callable(i) else i for i in self.args)
        kwargs = dict((k, v() if callable(v) else v) for k, v in self.kwargs.items())
        return self.message.format(*args, **kwargs)

N = NewStyleLogMessage

# Neither one of these messages are formatted (or calculated) until they're
# needed

# Emits "Lazily formatted log entry: 123 foo" in log
logging.debug(N('Lazily formatted log entry: {0} {keyword}', 123, keyword='foo'))

def expensive_func():
    # Do something that takes a long time...
    return 'foo'

# Emits "Expensive log entry: foo" in log
logging.debug(N('Expensive log entry: {keyword}', keyword=expensive_func))
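This is all described in the Python 3 documentation (https://docs.python.org/3/howto/logging-cookbook.html#formatting-styles). However, it will work with Python 2.6 as well (https://docs.python.org/2.6/library/logging.html#using-arbitrary-objects-as-messages).
One of the advantages of using this technique, other than the fact that it's formatting-style agnostic, is that it allows for lazy values, e.g. the function expensive_func above. This provides a more elegant alternative to the advice given in the Python docs here: https://docs.python.org/2.6/library/logging.html#optimization.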
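For reference, the advice in that optimization section boils down to guarding the call with isEnabledFor. A minimal Python 3 sketch follows. The logger name "lazy_demo" and the calls list are invented so the effect can be observed.

```python
import logging

calls = []  # records each invocation of the expensive function


def expensive_func():
    # Stand-in for something that takes a long time.
    calls.append(1)
    return 'foo'


logger = logging.getLogger("lazy_demo")  # hypothetical logger name
logger.setLevel(logging.INFO)            # DEBUG is disabled
logger.addHandler(logging.NullHandler())

# The guard skips both the expensive call and the formatting.
if logger.isEnabledFor(logging.DEBUG):
    logger.debug('Expensive log entry: %s', expensive_func())

print(len(calls))  # 0
```

The guard works, but it has to be repeated at every call site, which is what the message-object approach above avoids.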
Jor*_*tao 10
One situation where % may help is when you are formatting regex expressions. For example,
'{type_names} [a-z]{2}'.format(type_names='triangle|square')
raises IndexError, because {2} is read as a reference to the positional argument at index 2. In this situation, you can use:
'%(type_names)s [a-z]{2}' % {'type_names': 'triangle|square'}
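This avoids writing the regex as '{type_names} [a-z]{{2}}'. It can be useful when you have two regexes, where one is used alone without formatting, but the concatenation of both is formatted.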
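The three variants can be compared side by side. This is a small Python 3 sketch; the variable names and the sample input 'square ab' are mine.

```python
import re

type_names = 'triangle|square'

# .format reads {2} as "positional argument 2", which does not exist.
try:
    '{type_names} [a-z]{2}'.format(type_names=type_names)
    format_error = None
except IndexError as exc:
    format_error = exc

# % leaves the regex quantifier alone.
with_percent = '%(type_names)s [a-z]{2}' % {'type_names': type_names}

# .format works once the literal braces are doubled.
with_escaped = '{type_names} [a-z]{{2}}'.format(type_names=type_names)

print(type(format_error).__name__)   # IndexError
print(with_percent)                  # triangle|square [a-z]{2}
print(with_percent == with_escaped)  # True
print(bool(re.fullmatch(with_percent, 'square ab')))  # True
```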
I would add that since version 3.6, we can use f-strings like the following
foo = "john"
bar = "smith"
print(f"My name is {foo} {bar}")
Which gives
My name is john smith
Everything is converted to strings
mylist = ["foo", "bar"]
print(f"mylist = {mylist}")
Result:
mylist = ['foo', 'bar']
You can call functions inside the braces, as with the other formatting methods
import time
print(f'Hello, here is the date : {time.strftime("%d/%m/%Y")}')
Giving, for example
Hello, here is the date : 16/04/2018
Anonymous 6
Python 3.6.7 comparison:
#!/usr/bin/env python
import timeit

def time_it(fn):
    """
    Measure time of execution of a function
    """
    def wrapper(*args, **kwargs):
        t0 = timeit.default_timer()
        fn(*args, **kwargs)
        t1 = timeit.default_timer()
        print("{0:.10f} seconds".format(t1 - t0))
    return wrapper

@time_it
def new_new_format(s):
    print("new_new_format:", f"{s[0]} {s[1]} {s[2]} {s[3]} {s[4]}")

@time_it
def new_format(s):
    print("new_format:", "{0} {1} {2} {3} {4}".format(*s))

@time_it
def old_format(s):
    print("old_format:", "%s %s %s %s %s" % s)

def main():
    samples = (("uno", "dos", "tres", "cuatro", "cinco"), (1,2,3,4,5), (1.1, 2.1, 3.1, 4.1, 5.1), ("uno", 2, 3.14, "cuatro", 5.5),)
    for s in samples:
        new_new_format(s)
        new_format(s)
        old_format(s)
        print("-----")

if __name__ == '__main__':
    main()
Output:
new_new_format: uno dos tres cuatro cinco
0.0000170280 seconds
new_format: uno dos tres cuatro cinco
0.0000046750 seconds
old_format: uno dos tres cuatro cinco
0.0000034820 seconds
-----
new_new_format: 1 2 3 4 5
0.0000043980 seconds
new_format: 1 2 3 4 5
0.0000062590 seconds
old_format: 1 2 3 4 5
0.0000041730 seconds
-----
new_new_format: 1.1 2.1 3.1 4.1 5.1
0.0000092650 seconds
new_format: 1.1 2.1 3.1 4.1 5.1
0.0000055340 seconds
old_format: 1.1 2.1 3.1 4.1 5.1
0.0000052130 seconds
-----
new_new_format: uno 2 3.14 cuatro 5.5
0.0000053380 seconds
new_format: uno 2 3.14 cuatro 5.5
0.0000047570 seconds
old_format: uno 2 3.14 cuatro 5.5
0.0000045320 seconds
-----
For Python version >= 3.6 (see PEP 498)
s1='albha'
s2='beta'
f'{s1}{s2:>10}'
#output
'albha      beta'
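The >10 part is a format spec that right-aligns the value in a field ten characters wide. As a small sketch, the same padding can be written in all three styles discussed in this thread. The variable names other than s1 and s2 are mine.

```python
s1 = 'albha'
s2 = 'beta'

with_fstring = f'{s1}{s2:>10}'             # Python 3.6+
with_format = '{}{:>10}'.format(s1, s2)    # same spec inside a replacement field
with_percent = '%s%10s' % (s1, s2)         # % right-aligns by default for a width

print(repr(with_fstring))
print(with_fstring == with_format == with_percent)  # True
```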