String performance - Python 2.7 vs Python 3.4 under Windows 10 vs. Ubuntu

Question

Max*_*ers 7 python performance python-2.7 python-3.4

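Use case
A simple function that checks whether a specific string occurs in another string at a position that is a multiple of 3 (see here for a real-world example: finding stop codons in a DNA sequence).

Functions
sliding_window: takes a slice of length 3 and compares it to the search string; if they differ, it moves forward 3 characters.
incremental_start: searches for the search string; if the position found is not a multiple of 3, it searches again starting after that position.

Please note: the example data is chosen only to make sure each function has to go through the complete string; performance is similar with real or random data.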

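Results

Python 2.7: on Windows 10, replacing the initial sliding_window function with incremental_start improved per-call performance by a factor of ~39. On Ubuntu the improvement was slightly smaller, ~34x, ~37x and ~18x (VM, AWS, native), but still in the same range.
Python 3.4: sliding_window became slower than on Python 2.7 (1.8x on Windows, 1.4x to 1.5x on the Ubuntus), while incremental_start slowed down on the Ubuntus by a factor of 4, 5 and 1.7 (VM, AWS, native) and hardly changed on Windows.
Windows vs Ubuntu
Python 2.7: the virtualised Ubuntus needed less time for both functions (~20-30%); the native Ubuntu was about 25% slower for incremental_start and 40% faster for sliding_window.
Python 3: the sliding_window function needed less time to complete (~50%), while incremental_start became slower by a factor of ~2-3.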

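Questions

What causes the performance differences between Python 2 and Python 3 on Linux vs. Windows?
How can such behaviour be anticipated, and how can the code be adjusted for optimal performance?

Code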

import timeit

text = 'ATG' * 10**6
word = 'TAG'

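def sliding_window(text, word):
    # Compare every in-frame 3-character slice with the search word.
    for pos in range(0, len(text), 3):
        if text[pos:pos + 3] == word:
            return False  # word found at a multiple of 3
    return True  # no in-frame occurrence

def incremental_start(text, word):
    # Let str.find jump from hit to hit and test each hit's position.
    # Note: the first search starts at index 1, so a hit at index 0 is not seen.
    start = 0
    while start != -1:
        start = text.find(word, start + 1)
        if start % 3 == 0:  # -1 % 3 == 2 in Python, so "not found" never matches
            return False
    return True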

#sliding window
time = timeit.Timer(lambda: sliding_window(text, word), setup='from __main__ import text, word').timeit(number=10)
print('%3.3f' % time)

#incremental start
time = timeit.Timer(lambda: incremental_start(text, word), setup='from __main__ import text, word').timeit(number=500)
print('%3.3f' % time)
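
Because the benchmark data only ever exercises the no-match path, it can help to confirm on small inputs that both functions return the same answer. The sketch below repeats the two functions from the question and adds `incremental_start_checked`, a hypothetical variant that is not part of the original post: it begins its search at index 0, so it also reports a match at the very start of the text. The test strings are made up for illustration.

```python
def sliding_window(text, word):
    for pos in range(0, len(text), 3):
        if text[pos:pos + 3] == word:
            return False
    return True

def incremental_start(text, word):
    start = 0
    while start != -1:
        start = text.find(word, start + 1)
        if start % 3 == 0:
            return False
    return True

def incremental_start_checked(text, word):
    # Hypothetical variant (assumption, not from the post): search from index 0.
    start = text.find(word)
    while start != -1:
        if start % 3 == 0:
            return False
        start = text.find(word, start + 1)
    return True

# Illustrative inputs: no hit, in-frame hit, out-of-frame hit, hit at index 0.
cases = ['ATGATGATG', 'ATGTAGATG', 'ATTAGGATG', 'TAGATGATG']
results = [
    (t,
     sliding_window(t, 'TAG'),
     incremental_start(t, 'TAG'),
     incremental_start_checked(t, 'TAG'))
    for t in cases
]
for row in results:
    print(row)
```

The three functions agree on the first three inputs; only the last one, where the word sits at index 0, separates `incremental_start` from the other two. With the benchmark text `'ATG' * 10**6` this edge case never occurs, so it does not affect the timings.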


Tables

Time on Ubuntu as a percentage of the time on Windows 10
                     VM     AWS    Native
Python2.7-Increment  79%    73%    126%
Python2.7-Sliding    70%    70%    60%
Python3.4-Increment  307%   346%   201%
Python3.4-Sliding    54%    59%    48%

Time on Python 3.4 as a percentage of the time on Python 2.7
            Windows    VM    AWS    Native
Increment   105%       409%  501%   168%
Sliding     184%       143%  155%   147%

Absolute times in seconds (Sliding: 10 runs, Increment: 500 runs)
                 Win10   VM      AWS     Native
Py2.7-Increment  1.759   1.391   1.279   2.215
Py2.7-Sliding    1.361   0.955   0.958   0.823

Py3.4-Increment  1.853   5.692   6.406   3.722
Py3.4-Sliding    2.507   1.365   1.482   1.214
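
The two functions were timed with different repetition counts (`number=10` for `sliding_window`, `number=500` for `incremental_start`), so the absolute times have to be divided by the number of runs before the functions can be compared. The following sketch reproduces that arithmetic from the absolute times listed above; the dictionary layout and the `speedup` helper are illustrative names, not part of the original post.

```python
# Total seconds from the absolute-times table.
# Column order: Win10, Ubuntu VM, AWS, native Ubuntu.
PLATFORMS = ['Win10', 'VM', 'AWS', 'Native']
TOTALS = {
    'Py2.7': {'increment': [1.759, 1.391, 1.279, 2.215],
              'sliding':   [1.361, 0.955, 0.958, 0.823]},
    'Py3.4': {'increment': [1.853, 5.692, 6.406, 3.722],
              'sliding':   [2.507, 1.365, 1.482, 1.214]},
}
# Repetition counts passed to timeit in the benchmark script.
RUNS = {'sliding': 10, 'increment': 500}

def speedup(version):
    """Per-call time of sliding_window divided by that of incremental_start."""
    factors = {}
    for i, platform_name in enumerate(PLATFORMS):
        sliding = TOTALS[version]['sliding'][i] / RUNS['sliding']
        increment = TOTALS[version]['increment'][i] / RUNS['increment']
        factors[platform_name] = sliding / increment
    return factors

for version in ('Py2.7', 'Py3.4'):
    print(version, {k: round(v, 1) for k, v in speedup(version).items()})
```

For Python 2.7 this gives factors close to the ~39x, ~34x, ~37x and ~18x quoted in the results. For Python 3.4 on the Ubuntu machines the advantage of `incremental_start` shrinks noticeably, which is the drop the question asks about.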


Details
Windows 10: native Windows, 32-bit Python 3.4.3 or 2.7.9, i5-2500, 16GB RAM
Ubuntu virtual machine: 14.04, running on the Windows host, 64-bit Python 3.4.3, Python 2.7.6, 4 cores, 4GB RAM
AWS: 14.04, AWS micro instance, 64-bit Python 3.4.3, Python 2.7.6
Native Ubuntu: 14.04, 64-bit Python 3.4.3, Python 2.7.6, i5-2500, 16GB RAM [same machine as used for Win10]

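Update

As suggested by Ingaz, using xrange and bytes improved performance slightly, but there is still a massive performance drop on Ubuntu with Python 3.4. The culprit seems to be find, which is much slower when Ubuntu and Py3.4 are combined (the same holds for Py3.5 compiled from source). This seems to depend on the Linux flavour: on Debian, Py2.7 and Py3.4 performed identically; on RedHat, Py2.7 was considerably faster than Py3.4.
For a better comparison, Py3.4 is now used in 64-bit on both Windows 10 and Ubuntu. Py27 is still used on Win10.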

import timeit, sys

if sys.version_info >= (3,0):
    from builtins import range as xrange

def sliding_window(text, word):
    for pos in range(0, len(text), 3):
        if text[pos:pos + 3] == word:
            return False
    return True

def xsliding_window(text, word):
    for pos in xrange(0, len(text), 3):
        if text[pos:pos + 3] == word:
            return False
    return True

def incremental_start(text, word):
    start = 0
    while start != -1:
        start = text.find(word, start + 1)
        if start % 3 == 0:
            return False
    return True

text = 'aaa' * 10**6
word = 'aaA'
byte_text = b'aaa' * 10**6
byte_word = b'aaA'

time = timeit.Timer(lambda: sliding_window(text, word), setup='from __main__ import text, word').timeit(number=10)
print('Sliding, regular:      %3.3f' % time)

time = timeit.Timer(lambda: incremental_start(text, word), setup='from __main__ import text, word').timeit(number=500)
print('Incremental, regular:  %3.3f' % time)

time = timeit.Timer(lambda: sliding_window(byte_text, byte_word), setup='from __main__ import byte_text, byte_word').timeit(number=10)
print('Sliding, byte string:  %3.3f' % time)

time = timeit.Timer(lambda: incremental_start(byte_text, byte_word), setup='from __main__ import byte_text, byte_word').timeit(number=500)
print('Incremental, bytes:    %3.3f' % time)

time = timeit.Timer(lambda: xsliding_window(byte_text, byte_word), setup='from __main__ import byte_text, byte_word').timeit(number=10)
print('Sliding, xrange&bytes: %3.3f' % time)

time = timeit.Timer(lambda: text.find(word), setup='from __main__ import text, word').timeit(number=1000)
print('simple find in string: %3.3f' % time)


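Test                    Win10-py27  Win10-py35  VM-py27  VM-py34
Sliding, regular:       1.440       2.674       0.993    1.368
Incremental, regular:   1.864       1.425       1.436    5.711
Sliding, byte string:   1.439       2.388       1.048    1.219
Incremental, bytes:     1.887       1.405       1.429    5.750
Sliding, xrange&bytes:  1.332       2.356       0.772    1.224
simple find in string:  3.756       2.811       2.818    11.361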


Answer 1

Ale*_* Yu 5

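Although you are measuring the speed of the same code, the structures in your code are different.

A. range in 2.7 is type 'list'; range in 3.4 is class 'range'.

B. 'ATG' * 10**6 is a byte string in 2.7 and a unicode string in 3.4.

You can try to produce more comparable results if you: a) use xrange in the 2.7 variant, b) use a bytes string in both examples (b'ATG'), or a unicode string in both examples.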
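
A quick way to see both differences on a given interpreter is to print the types involved. This is a minimal sketch that runs unchanged on Python 2.7 and 3.x; the function name `describe_runtime_types` is only illustrative.

```python
import sys

def describe_runtime_types():
    """Report how this interpreter represents range() and a plain string literal."""
    sample = 'ATG' * 2
    return {
        'python': '%d.%d' % sys.version_info[:2],
        # 'list' on Python 2.7, 'range' on Python 3.x
        'range_type': type(range(3)).__name__,
        # On 2.7, bytes is an alias of str, so this is True;
        # on 3.x a plain literal is a unicode str, so this is False.
        'literal_is_bytes': isinstance(sample, bytes),
    }

print(describe_runtime_types())
```

If `range_type` is `list`, the loop in `sliding_window` builds the whole index list up front; if `literal_is_bytes` differs between the two interpreters, the benchmark is comparing a byte-string search with a unicode-string search.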

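Update

I suspect that the difference in performance stems from two main factors: a) 32-bit vs 64-bit, b) the C compiler.

So I ran the following tests:

ActiveState Python 2.7.10 32-bit
ActiveState Python 2.7.10 64-bit
Official distribution Python 2.7.11 32-bit
Official distribution Python 2.7.11 64-bit
Python 2.7.6 64-bit on Ubuntu on Windows 10
pypy-5.1.1-win32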
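
When comparing builds like these, it helps to record the word size and compiler of each interpreter next to its timings. The sketch below uses only the standard library; the function name `interpreter_build_info` and the choice of fields are assumptions made for illustration, not something the answer itself used.

```python
import platform
import struct
import sys

def interpreter_build_info():
    """Collect the build properties the answer suspects of affecting speed."""
    return {
        'implementation': platform.python_implementation(),  # e.g. CPython, PyPy
        'version': platform.python_version(),
        'pointer_bits': struct.calcsize('P') * 8,             # 32 or 64
        'compiler': platform.python_compiler(),               # e.g. a GCC or MSC string
        'platform': sys.platform,
    }

for key, value in sorted(interpreter_build_info().items()):
    print('%-15s %s' % (key, value))
```

Running this once per interpreter before the benchmark makes it clear which column of a results table belongs to which build.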

What I expected

I expected that:

the 64-bit versions would be slower
ActiveState would be a bit faster
PyPy would be faster
Ubuntu on Windows 10 - ???

Results

Test                    as32b   as64b   off32b   off64b  ubw64b  pypy5.1.1
Sliding, regular:       1.232   1.230   1.281    1.136   0.951   0.099  
Incremental, regular:   1.744   1.690   2.219    1.647   1.472   2.772
Sliding, byte string:   1.223   1.207   1.280    1.127   0.926   0.101
Incremental, bytes:     1.720   1.701   2.206    1.646   1.568   2.774
Sliding, xrange&bytes:  1.117   1.102   1.162    0.962   0.779   0.109
simple find in string:  3.443   3.412   4.607    3.300   2.487   0.289


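And the winner on Windows 10 is ... Ubuntu Python compiled with GCC 4.8.2 for Linux!

This result was completely unexpected for me.

32 vs 64: turned out to be irrelevant.

PyPy: megafast as always, except when it is not.

I can't explain these results; the OP's question turned out to be less simple than it seemed.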

Archived:	9 years, 4 months ago
Views:	1196
Last activity:	8 years, 1 month ago