sho*_*app 5 python performance
For a very large string (spanning multiple lines), is it faster to use Python's built-in string search, or to split the large string (perhaps on \n) and iteratively search the smaller strings?

For example, for a very large string:
for l in get_mother_of_all_strings().split('\n'):
    if 'target' in l:
        return True
return False
or
return 'target' in get_mother_of_all_strings()
Gar*_*Jax 13
Almost certainly the second. I don't think searching in one big string versus several small ones makes much difference by itself. The shorter lines let you skip some characters, but the split operation has its own costs (searching for \n, creating the different strings, building the list), and the loop runs in Python.

The string `__contains__` method is implemented in C, and so is noticeably faster.
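That C-level search is exactly what the `in` operator dispatches to; a minimal illustration of the equivalence:

```python
text = "the quick brown fox"

# The `in` operator on strings calls str.__contains__, which runs
# the substring search in C rather than in a Python-level loop.
assert ('quick' in text) == text.__contains__('quick')   # both True
assert ('lazy' in text) == text.__contains__('lazy')     # both False
```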
Also consider that the second method aborts as soon as the first match is found, while the first one splits the entire string before it even starts searching inside it.
This is quickly demonstrated with a simple benchmark:
import timeit
prepare = """
with open('bible.txt') as fh:
text = fh.read()
"""
presplit_prepare = """
with open('bible.txt') as fh:
text = fh.read()
lines = text.split('\\n')
"""
longsearch = """
'hello' in text
"""
splitsearch = """
for line in text.split('\\n'):
if 'hello' in line:
break
"""
presplitsearch = """
for line in lines:
if 'hello' in line:
break
"""
benchmark = timeit.Timer(longsearch, prepare)
print("IN on big string takes:", benchmark.timeit(1000), "seconds")
benchmark = timeit.Timer(splitsearch, prepare)
print("IN on splitted string takes:", benchmark.timeit(1000), "seconds")
benchmark = timeit.Timer(presplitsearch, presplit_prepare)
print("IN on pre-splitted string takes:", benchmark.timeit(1000), "seconds")
The results are:
IN on big string takes: 4.27126097679 seconds
IN on splitted string takes: 35.9622690678 seconds
IN on pre-splitted string takes: 11.815297842 seconds
The bible.txt file is actually the Bible; I found it here: http://patriot.net/~bmcgin/kjvpage.html (the text version).
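If you genuinely need line-level matching but want the early exit of the second approach without paying for a full split up front, one option is to iterate lines lazily. A sketch (`contains_in_any_line` is a hypothetical helper name; `io.StringIO` yields one line at a time, so no list of all lines is ever built):

```python
import io

def contains_in_any_line(text, target):
    # io.StringIO iterates the string line by line without
    # materializing a list, and any() stops at the first match.
    return any(target in line for line in io.StringIO(text))

# Example: match found on the second line, remaining lines never touched.
print(contains_in_any_line("foo\nhello world\nbar", "hello"))  # True
print(contains_in_any_line("foo\nbar", "hello"))               # False
```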