pcd*_*mmy 5 python stringio cstringio
I wrote a small benchmark in which I compare different string-concatenation methods for ZOCache.
So it looks like tempfile.TemporaryFile is faster than anything else:
$ python src/ZOCache/tmp_benchmark.py
3.00407409668e-05 TemporaryFile
0.385630846024 SpooledTemporaryFile
0.299962997437 BufferedRandom
0.0849719047546 io.StringIO
0.113346099854 concat
The benchmark code I've been using:
#!/usr/bin/python
from __future__ import print_function
import io
import timeit
import tempfile
class Error(Exception):
    pass
def bench_temporaryfile():
    with tempfile.TemporaryFile(bufsize=10*1024*1024) as out:
        for i in range(0, 100):
            out.write(b"Value = ")
            out.write(bytes(i))
            out.write(b" ")
        # Get string.
        out.seek(0)
        contents = out.read()
        out.close()
        # Test first letter.
        if contents[0:5] != b"Value":
            raise Error
def bench_spooledtemporaryfile():
    with tempfile.SpooledTemporaryFile(max_size=10*1024*1024) as out:
        for i in range(0, 100):
            out.write(b"Value = ")
            out.write(bytes(i))
            out.write(b" ")
        # Get string.
        out.seek(0)
        contents = out.read()
        out.close()
        # Test first letter.
        if contents[0:5] != b"Value":
            raise Error
def bench_BufferedRandom():
    # 1. BufferedRandom
    with io.open('out.bin', mode='w+b') as fp:
        with io.BufferedRandom(fp, buffer_size=10*1024*1024) as out:
            for i in range(0, 100):
                out.write(b"Value = ")
                out.write(bytes(i))
                out.write(b" ")
            # Get string.
            out.seek(0)
            contents = out.read()
            # Test first letter.
            if contents[0:5] != b'Value':
                raise Error
def bench_stringIO():
    # 1. Use StringIO.
    out = io.StringIO()
    for i in range(0, 100):
        out.write(u"Value = ")
        out.write(unicode(i))
        out.write(u" ")
    # Get string.
    contents = out.getvalue()
    out.close()
    # Test first letter.
    if contents[0] != 'V':
        raise Error
def bench_concat():
    # 2. Use string appends.
    data = ""
    for i in range(0, 100):
        data += u"Value = "
        data += unicode(i)
        data += u" "
    # Test first letter.
    if data[0] != u'V':
        raise Error
if __name__ == '__main__':
    print(str(timeit.timeit('bench_temporaryfile()', setup="from __main__ import bench_temporaryfile", number=1000)) + " TemporaryFile")
    print(str(timeit.timeit('bench_spooledtemporaryfile()', setup="from __main__ import bench_spooledtemporaryfile", number=1000)) + " SpooledTemporaryFile")
    print(str(timeit.timeit('bench_BufferedRandom()', setup="from __main__ import bench_BufferedRandom", number=1000)) + " BufferedRandom")
    print(str(timeit.timeit("bench_stringIO()", setup="from __main__ import bench_stringIO", number=1000)) + " io.StringIO")
    print(str(timeit.timeit("bench_concat()", setup="from __main__ import bench_concat", number=1000)) + " concat")
Edit: Python 3.4.3 + io.BytesIO
python3 ./src/ZOCache/tmp_benchmark.py
2.689500024644076e-05 TemporaryFile
0.30429405899985795 SpooledTemporaryFile
0.348170792000019 BufferedRandom
0.0764778530001422 io.BytesIO
0.05162201000030109 concat
New source with io.BytesIO:
#!/usr/bin/python3
from __future__ import print_function
import io
import timeit
import tempfile
class Error(Exception):
    pass
def bench_temporaryfile():
    with tempfile.TemporaryFile() as out:
        for i in range(0, 100):
            out.write(b"Value = ")
            out.write(bytes(str(i), 'utf-8'))
            out.write(b" ")
        # Get string.
        out.seek(0)
        contents = out.read()
        out.close()
        # Test first letter.
        if contents[0:5] != b"Value":
            raise Error
def bench_spooledtemporaryfile():
    with tempfile.SpooledTemporaryFile(max_size=10*1024*1024) as out:
        for i in range(0, 100):
            out.write(b"Value = ")
            out.write(bytes(str(i), 'utf-8'))
            out.write(b" ")
        # Get string.
        out.seek(0)
        contents = out.read()
        out.close()
        # Test first letter.
        if contents[0:5] != b"Value":
            raise Error
def bench_BufferedRandom():
    # 1. BufferedRandom
    with io.open('out.bin', mode='w+b') as fp:
        with io.BufferedRandom(fp, buffer_size=10*1024*1024) as out:
            for i in range(0, 100):
                out.write(b"Value = ")
                out.write(bytes(i))
                out.write(b" ")
            # Get string.
            out.seek(0)
            contents = out.read()
            # Test first letter.
            if contents[0:5] != b'Value':
                raise Error
def bench_BytesIO():
    # 1. Use BytesIO.
    out = io.BytesIO()
    for i in range(0, 100):
        out.write(b"Value = ")
        out.write(bytes(str(i), 'utf-8'))
        out.write(b" ")
    # Get string.
    contents = out.getvalue()
    out.close()
    # Test first letter.
    if contents[0:5] != b'Value':
        raise Error
def bench_concat():
    # 2. Use string appends.
    data = ""
    for i in range(0, 100):
        data += "Value = "
        data += str(i)
        data += " "
    # Test first letter.
    if data[0] != 'V':
        raise Error
if __name__ == '__main__':
    print(str(timeit.timeit('bench_temporaryfile()', setup="from __main__ import bench_temporaryfile", number=1000)) + " TemporaryFile")
    print(str(timeit.timeit('bench_spooledtemporaryfile()', setup="from __main__ import bench_spooledtemporaryfile", number=1000)) + " SpooledTemporaryFile")
    print(str(timeit.timeit('bench_BufferedRandom()', setup="from __main__ import bench_BufferedRandom", number=1000)) + " BufferedRandom")
    print(str(timeit.timeit("bench_BytesIO()", setup="from __main__ import bench_BytesIO", number=1000)) + " io.BytesIO")
    print(str(timeit.timeit("bench_concat()", setup="from __main__ import bench_concat", number=1000)) + " concat")
Is this true on every platform? If so, why?
Edit: results with the fixed benchmark (and fixed code):
0.2675984420002351 TemporaryFile
0.28104681999866443 SpooledTemporaryFile
0.3555715570000757 BufferedRandom
0.10379689100045653 io.BytesIO
0.05650951399911719 concat
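Your biggest problem: as tdelaney pointed out, you never actually ran the TemporaryFile test; you omitted the parens in the timeit snippet (only for that test; the others did actually run). So you were timing how long it takes to look up the name bench_temporaryfile, without ever actually calling it. Change: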
print(str(timeit.timeit('bench_temporaryfile', setup="from __main__ import bench_temporaryfile", number=1000)) + " TemporaryFile")
to:
print(str(timeit.timeit('bench_temporaryfile()', setup="from __main__ import bench_temporaryfile", number=1000)) + " TemporaryFile")
(adding the parens so it becomes a call) to fix it.
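To see the difference concretely, here is a minimal Python 3 sketch (not from the original benchmark) that times a hypothetical stand-in function `bench` which only counts how often it runs. It passes `globals=globals()` to `timeit.timeit` (available since Python 3.5) instead of the `from __main__ import ...` setup string, so it also works when not run as a script.

```python
import timeit

calls = 0

def bench():
    """Hypothetical stand-in for bench_temporaryfile(); it only counts calls."""
    global calls
    calls += 1

def count_runs(stmt, number=1000):
    """Reset the counter, time `stmt`, and report how many times bench() ran."""
    global calls
    calls = 0
    timeit.timeit(stmt, globals=globals(), number=number)
    return calls

print(count_runs("bench"))    # 0: only the name is looked up, bench() never runs
print(count_runs("bench()"))  # 1000: the function is really called each time
```

A timing of a few microseconds for 1000 "runs" of a function that does file I/O is a strong hint that the statement never called it.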
A few other issues:
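io.StringIO is fundamentally different from your other test cases. Specifically, all the other types you're testing work in binary mode, reading and writing str, and avoid line-ending conversions. io.StringIO uses Python 3-style strings (unicode in Python 2), which your test acknowledges by using different literals and converting to unicode instead of bytes. This adds a lot of encoding and decoding overhead, and uses more memory (unicode uses 2-4x the memory of str for the same data, which means more allocator overhead, more copying overhead, and so on).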
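A small Python 3 sketch of why the two kinds of buffer are not interchangeable: io.StringIO only accepts text, io.BytesIO only accepts bytes, and moving between them requires an explicit encode step. The variable names here are illustrative only.

```python
import io

text_buf = io.StringIO()
bin_buf = io.BytesIO()

text_buf.write(u"Value = 1 ")   # StringIO accepts text (str on Python 3)
bin_buf.write(b"Value = 1 ")    # BytesIO accepts bytes-like data

mixed_error = None
try:
    bin_buf.write(u"Value = 1 ")  # text into a binary stream is rejected
except TypeError as exc:
    mixed_error = exc

# Getting bytes out of the text buffer needs an explicit (and not free) encode.
encoded = text_buf.getvalue().encode("utf-8")
print(type(text_buf.getvalue()), type(bin_buf.getvalue()), mixed_error)
```

So a benchmark that writes text to one buffer and bytes to the others is measuring different work, not just different containers.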
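The other major difference is that you're setting a truly enormous bufsize on the TemporaryFile; very few system calls need to be made, and most writes just append to contiguous memory in the buffer. By contrast, io.StringIO stores the individual values written and only joins them together when you ask for them with getvalue().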
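The effect of a large buffer can be illustrated without touching the disk. This Python 3 sketch assumes a hypothetical `CountingRaw` class (an `io.RawIOBase` subclass) that stands in for an OS file and counts raw `write()` calls as a rough proxy for system calls; `io.BufferedWriter` then sits in front of it, much like the buffered layer in front of a real file.

```python
import io

class CountingRaw(io.RawIOBase):
    """Hypothetical raw sink standing in for a file descriptor.

    Each call to write() here corresponds roughly to one write(2) syscall.
    """
    def __init__(self):
        super().__init__()
        self.calls = 0
        self.data = bytearray()

    def writable(self):
        return True

    def write(self, b):
        self.calls += 1
        self.data += b      # copy out of the memoryview we were handed
        return len(b)

def raw_writes(buffer_size):
    """Do the benchmark's 300 small writes; return (raw write calls, bytes written)."""
    raw = CountingRaw()
    out = io.BufferedWriter(raw, buffer_size=buffer_size)
    for i in range(100):
        out.write(b"Value = ")
        out.write(str(i).encode("ascii"))
        out.write(b" ")
    out.flush()
    return raw.calls, bytes(raw.data)

print(raw_writes(10 * 1024 * 1024)[0])  # everything fits: one raw write at flush
print(raw_writes(64)[0])                # tiny buffer: many raw writes
```

With a 10 MiB buffer all 300 writes are plain memory copies until the final flush, which is the situation the benchmark's explicit bufsize creates.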
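Finally, you think you're being forward compatible by using the bytes constructor, but you're not; in Python 2, bytes is an alias for str, so bytes(10) returns '10', but in Python 3, bytes is a completely different thing, and passing it an integer returns a zero-initialized bytes object of that size: bytes(10) returns b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'.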
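A quick check of that difference, plus two ways to get the decimal digits of an integer as bytes. The helper name `int_to_ascii_bytes` is hypothetical; `str(i).encode("ascii")` works on Python 2 and 3, while `b"%d" % i` needs Python 3.5 or later on the Python 3 side.

```python
def int_to_ascii_bytes(i):
    """Hypothetical helper: the decimal text of i as bytes, on Python 2 and 3."""
    return str(i).encode("ascii")

print(repr(bytes(10)))               # Python 3: ten zero bytes, not the digits
print(repr(int_to_ascii_bytes(10)))  # the digits "10" as bytes
print(repr(b"%d" % 10))              # bytes %-formatting (Python 3.5+)
```

Note that the Python 3 bench_BufferedRandom above still calls bytes(i), so it writes i zero bytes per iteration rather than the number.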
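If you want a fair test case, at the very least switch to cStringIO.StringIO or io.BytesIO instead of io.StringIO, and write bytes uniformly. Normally you wouldn't explicitly set the buffer size for TemporaryFile and the like yourself, so you might consider removing that.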
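One possible shape for such a fairer benchmark, as a Python 3 sketch rather than a definitive harness: every variant writes the same bytes payload through a shared `fill()` helper (a name introduced here for illustration), TemporaryFile keeps its default buffering, each variant's output is checked against the others, and the callables are passed directly to `timeit.timeit` so no missing-parens mistake is possible.

```python
import io
import tempfile
import timeit

def fill(out):
    """Write the same byte payload to any binary file-like object."""
    for i in range(100):
        out.write(b"Value = ")
        out.write(str(i).encode("ascii"))
        out.write(b" ")

def bench_bytesio():
    out = io.BytesIO()
    fill(out)
    return out.getvalue()

def bench_temporaryfile():
    # Default buffering: no explicit buffer size.
    with tempfile.TemporaryFile() as out:
        fill(out)
        out.seek(0)
        return out.read()

def bench_spooled():
    with tempfile.SpooledTemporaryFile(max_size=10 * 1024 * 1024) as out:
        fill(out)
        out.seek(0)
        return out.read()

def bench_concat():
    # Concatenate bytes, not text, so it does the same work as the others.
    data = b""
    for i in range(100):
        data += b"Value = "
        data += str(i).encode("ascii")
        data += b" "
    return data

BENCHES = {
    "TemporaryFile": bench_temporaryfile,
    "SpooledTemporaryFile": bench_spooled,
    "io.BytesIO": bench_bytesio,
    "concat": bench_concat,
}

if __name__ == "__main__":
    expected = bench_concat()
    for name, fn in BENCHES.items():
        assert fn() == expected, name  # every variant must produce identical bytes
        # Passing the callable itself means timeit really calls it each time.
        print("%.6f %s" % (timeit.timeit(fn, number=1000), name))
```

The absolute numbers will differ between machines; the point of the structure is that every row measures the same work.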
In my own tests with Python 2.7.10 on Linux x64, using ipython's %timeit magic, the ranking was:
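1. io.BytesIO: ~48 μs per loop
2. io.StringIO: ~54 μs per loop (so the unicode overhead didn't add much)
3. cStringIO.StringIO: ~83 μs per loop
4. TemporaryFile: ~2.8 ms per loop (note the units; a ms is 1000x longer than a μs)

And that's without going back to the default buffer size (I kept the explicit bufsize from your test). I suspect the behavior of TemporaryFile will vary a lot more (depending on the OS and how temporary files are handled; some systems might just keep them in memory, others might store them in /tmp, but of course /tmp might just be a RAM disk anyway).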
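Something tells me you may have a setup where TemporaryFile is basically a plain in-memory buffer that never touches the filesystem, while mine may end up hitting persistent storage (if only briefly). What happens in memory is predictable, but once you involve the filesystem (which TemporaryFile may, depending on the OS, kernel settings, etc.), behavior varies a lot between systems.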
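To check which case applies on a given machine, a first step is to see where TemporaryFile puts its files and what filesystem backs that directory. This Python 3 sketch uses `tempfile.gettempdir()` (which honors TMPDIR and friends) and, as an assumption that only holds on Linux, reads `/proc/mounts` to find the filesystem type; `temp_dir_filesystem` is a hypothetical helper name, and it ignores details such as escaped spaces in mount points.

```python
import os
import tempfile

def temp_dir_filesystem(path=None):
    """Return (directory, fstype) for the directory tempfile uses by default.

    fstype is read from /proc/mounts, so it is only filled in on Linux;
    on other systems (or if /proc is unavailable) it is None.
    """
    path = os.path.realpath(path or tempfile.gettempdir())
    best_mount, fstype = "", None
    try:
        with open("/proc/mounts") as mounts:
            for line in mounts:
                fields = line.split()
                if len(fields) < 3:
                    continue
                mount_point, fs = fields[1], fields[2]
                prefix = mount_point.rstrip("/") + "/"
                # Keep the longest mount point that contains the temp directory.
                if (path == mount_point or path.startswith(prefix)) and len(mount_point) > len(best_mount):
                    best_mount, fstype = mount_point, fs
    except OSError:
        pass
    return path, fstype

if __name__ == "__main__":
    directory, fstype = temp_dir_filesystem()
    # On Linux, "tmpfs" means the temp directory is RAM-backed on this machine.
    print(directory, fstype)
```

Two machines reporting, say, tmpfs versus a disk-backed filesystem here would be one plausible explanation for very different TemporaryFile timings.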