Python str.translate VS str.replace

Nar*_*asK 8 python

Why is replace about 1.5 times faster than translate in Python?

In [188]: s = '1 a  2'

In [189]: s.replace(' ','')
Out[189]: '1a2'

In [190]: s.translate(None,' ')
Out[190]: '1a2'

In [191]: %timeit s.replace(' ','')
1000000 loops, best of 3: 399 ns per loop

In [192]: %timeit s.translate(None,' ')
1000000 loops, best of 3: 614 ns per loop

小智 15

Assuming Python 2.7 (I had to flip a coin, since the version wasn't stated), we can find the source code for string.translate and string.replace in string.py:

>>> import inspect
>>> import string
>>> inspect.getsourcefile(string.translate)
'/usr/local/Cellar/python/2.7.9/Frameworks/Python.framework/Versions/2.7/lib/python2.7/string.py'
>>> inspect.getsourcefile(string.replace)
'/usr/local/Cellar/python/2.7.9/Frameworks/Python.framework/Versions/2.7/lib/python2.7/string.py'
>>>

Oh, we can't, as string.py starts with:

"""A collection of string operations (most are no longer used).

Warning: most of the code you see here isn't normally used nowadays.
Beginning with Python 1.6, many of these functions are implemented as
methods on the standard string object.

I upvoted you because you started down the path of profiling, so let's continue down that thread:

from cProfile import run
from string import ascii_letters

s = '1 a  2'

def _replace():
    for x in range(5000000):
        s.replace(' ', '')

def _translate():
    for x in range(5000000):    
        s.translate(None, ' ')

Replace:

run("_replace()")
         5000004 function calls in 2.059 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.976    0.976    2.059    2.059 <ipython-input-3-9253b3223cde>:8(_replace)
        1    0.000    0.000    2.059    2.059 <string>:1(<module>)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
  5000000    1.033    0.000    1.033    0.000 {method 'replace' of 'str' objects}
        1    0.050    0.050    0.050    0.050 {range}

And translate:

run("_translate()")

         5000004 function calls in 1.785 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.977    0.977    1.785    1.785 <ipython-input-3-9253b3223cde>:12(_translate)
        1    0.000    0.000    1.785    1.785 <string>:1(<module>)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
  5000000    0.756    0.000    0.756    0.000 {method 'translate' of 'str' objects}
        1    0.052    0.052    0.052    0.052 {range}

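The number of function calls is the same in both runs. More function calls don't necessarily mean a slower run, but it's typically a good place to look. What's fun is that translate actually ran faster on my machine than replace! Chalk that up to the fun of not testing changes in isolation. It doesn't matter here, because all we care about is being able to tell why there could be a difference.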

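In any case, we now at least know that there may be a performance difference, and that it shows up when evaluating the string object's methods (see tottime). The translate docstring suggests that a translation table is in play, while the replace docstring only mentions replacing an old substring with a new one.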
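
The difference the two docstrings describe can be seen side by side. This is a minimal sketch that assumes Python 3, where `str.translate` accepts only a table, so the two-argument Python 2 call used above is not available on `str` (`bytes.translate` still accepts a delete argument). The name `delete_space_table` is made up for illustration.

```python
import timeit

s = '1 a  2'

# replace: swap one substring for another
assert s.replace(' ', '') == '1a2'

# translate: look every character up in a table; mapping to None deletes it
delete_space_table = str.maketrans('', '', ' ')
assert s.translate(delete_space_table) == '1a2'

# bytes.translate keeps the Python 2 style (table, delete) call shape
b = b'1 a  2'
assert b.translate(None, b' ') == b'1a2'

# Timings depend on the machine and the Python version, so none are quoted here
t_replace = timeit.timeit(lambda: s.replace(' ', ''), number=100000)
t_translate = timeit.timeit(lambda: s.translate(delete_space_table), number=100000)
print('replace: %.4fs  translate: %.4fs' % (t_replace, t_translate))
```

The table is built once outside the timed calls, so the measurement covers only the lookup work that translate does per call.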

Let's turn to our old buddy dis for hints:

from dis import dis

Replace:

def dis_replace():
    '1 a  2'.replace(' ', '')

dis(dis_replace)

  3           0 LOAD_CONST               1 ('1 a  2')
              3 LOAD_ATTR                0 (replace)
              6 LOAD_CONST               2 (' ')
              9 LOAD_CONST               3 ('')
             12 CALL_FUNCTION            2
             15 POP_TOP             
             16 LOAD_CONST               0 (None)
             19 RETURN_VALUE        

And translate, which ran faster for me:

def dis_translate():
    '1 a  2'.translate(None, ' ')
dis(dis_translate)    


  2           0 LOAD_CONST               1 ('1 a  2')
              3 LOAD_ATTR                0 (translate)
              6 LOAD_CONST               0 (None)
              9 LOAD_CONST               2 (' ')
             12 CALL_FUNCTION            2
             15 POP_TOP             
             16 LOAD_CONST               0 (None)
             19 RETURN_VALUE        
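
The two listings above can also be compared programmatically rather than by eye. This is a rough sketch that assumes Python 3 and uses `bytes` so that both calls take two arguments, as in the Python 2 example. The helper `opnames` is a hypothetical name, and the opcodes themselves differ between Python versions.

```python
import dis

def dis_replace():
    b'1 a  2'.replace(b' ', b'')

def dis_translate():
    b'1 a  2'.translate(None, b' ')

def opnames(func):
    # Keep only the instruction names, dropping constants and attribute names
    return [ins.opname for ins in dis.get_instructions(func)]

# True when the two functions differ only in their operands
same_shape = opnames(dis_replace) == opnames(dis_translate)
print(same_shape)
```

If the instruction sequences match, the bytecode cannot explain a timing difference, and whatever differs must live inside the C implementations of the two methods.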

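Unfortunately, the two look identical to dis, which means we should start looking at the C source for strings (found by going to the Python source code for the version of Python I'm using): https://hg.python.org/cpython/file/a887ce8611d2/Objects/stringobject.c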

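Here is the source for translate.
If you go through the comments, you can see that there are multiple replace function definitions, chosen according to the lengths of the inputs.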

Our options for substring replacement are:

replace_substring_in_place:

/* len(self)>=1, len(from)==len(to)>=2, maxcount>=1 */
Py_LOCAL(PyStringObject *)
replace_substring_in_place(PyStringObject *self,

replace_substring:

/* len(self)>=1, len(from)>=2, len(to)>=2, maxcount>=1 */
Py_LOCAL(PyStringObject *)
replace_substring(PyStringObject *self,

replace_delete_single_character:

/* Special case for deleting a single character */
/* len(self)>=1, len(from)==1, to="", maxcount>=1 */
Py_LOCAL(PyStringObject *)
replace_delete_single_character(PyStringObject *self,
                                char from_c, Py_ssize_t maxcount)

'1 a  2'.replace(' ', '') has len(self) == 6 and replaces a single character with an empty string, which makes it a replace_delete_single_character case.

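You can look at the function bodies yourself, but the answer is: "the C function body of replace_delete_single_character runs faster than that of string_translate for this specific input."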
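
To make the difference concrete, here is a rough Python model of the two strategies. It is not the C code. The function names `delete_single_char` and `translate_delete` are hypothetical, and the sketch assumes that the single-character path can jump from one occurrence to the next and copy the runs in between, while the translate path sets up a 256-entry table and consults it for every byte.

```python
def delete_single_char(data: bytes, ch: int) -> bytes:
    """Model of a special-cased single-character delete."""
    needle = bytes([ch])
    out = bytearray()
    start = 0
    while True:
        # bytes.find scans for the next occurrence, much like memchr in C
        hit = data.find(needle, start)
        if hit == -1:
            out += data[start:]   # copy the tail in one chunk
            return bytes(out)
        out += data[start:hit]    # copy the run before the match
        start = hit + 1


def translate_delete(data: bytes, delete: bytes) -> bytes:
    """Model of a table-driven translate with a set of bytes to delete."""
    # Build a 256-entry table up front, regardless of the input length
    keep = [True] * 256
    for c in delete:
        keep[c] = False
    # Then consult the table once per input byte
    return bytes(c for c in data if keep[c])


s = b'1 a  2'
print(delete_single_char(s, ord(' ')))  # b'1a2'
print(translate_delete(s, b' '))        # b'1a2'
```

For a six-byte input, the fixed cost of preparing the table is large relative to the work of scanning the string, which is one plausible reading of why the special-cased delete wins on this input.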

Thanks for asking this question.

  • @NarūnasK Please re-read my answer. I used "dis" incorrectly and have cleaned up the answer to make it more coherent. (2 upvotes)

Jor*_*ley 5

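translate will probably be faster as N and M increase, where N is the number of unique character replacement mappings and M is the length of the string being translated.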

import random
import string
import timeit
import re

def do_translation(N,M):
    trans_map = random.sample(string.ascii_lowercase,N),random.sample(string.ascii_lowercase,N)
    trans_tab = string.maketrans(*map("".join,trans_map))
    s = "".join(random.choice(string.ascii_lowercase) for _ in range(M))
    return s.translate(trans_tab)

def do_resub(N,M):
    trans_map = random.sample(string.ascii_lowercase,N),random.sample(string.ascii_lowercase,N)
    trans_tab = dict(zip(*trans_map))
    s = "".join(random.choice(string.ascii_lowercase) for _ in range(M))
    return re.sub("([%s])"%("".join(trans_map[0]),),lambda m:trans_tab.get(m.group(0),m.group(0)),s)

def do_replace(N,M):
    trans_map = random.sample(string.ascii_lowercase,N),random.sample(string.ascii_lowercase,N)
    s = "".join(random.choice(string.ascii_lowercase) for _ in range(M))
    for k,v in zip(*trans_map):
       s = s.replace(k,v)
    return s


data = {}
for i in range(2,20,2):
    for j in range(10,200,10):
        data[(i,j)] = {
            "translate":timeit.timeit("do_translation(%s,%s)"%(i,j),"from __main__ import do_translation,string,random",number=100),
            "re.sub":timeit.timeit("do_resub(%s,%s)"%(i,j),"from __main__ import do_resub,re,random",number=100),
            "replace":timeit.timeit("do_replace(%s,%s)"%(i,j),"from __main__ import do_replace,random",number=100)}

print data

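This will show you several different timings, including cases where translate can be faster. (I thought about adding some plots here, but I've already put more time into this question than I really should have :P)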
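
The benchmark above targets Python 2 (`string.maketrans`, the `print` statement). A minimal Python 3 sketch of the same comparison might look like the following. The helper names `make_inputs`, `with_translate` and `with_replace` are hypothetical. Unlike the original, it maps lowercase letters to uppercase ones so that chained replace calls cannot re-replace an earlier result, and it builds the inputs outside the timed region.

```python
import random
import string
import timeit


def make_inputs(n, m, seed=0):
    rng = random.Random(seed)
    # n distinct lowercase sources mapped to n distinct uppercase targets
    sources = rng.sample(string.ascii_lowercase, n)
    targets = rng.sample(string.ascii_uppercase, n)
    text = ''.join(rng.choice(string.ascii_lowercase) for _ in range(m))
    return sources, targets, text


def with_translate(sources, targets, text):
    # One pass over the text, one table lookup per character
    table = str.maketrans(''.join(sources), ''.join(targets))
    return text.translate(table)


def with_replace(sources, targets, text):
    # One full pass over the text per mapping
    for old, new in zip(sources, targets):
        text = text.replace(old, new)
    return text


sources, targets, text = make_inputs(n=10, m=200)
assert with_translate(sources, targets, text) == with_replace(sources, targets, text)

for name, func in (('translate', with_translate), ('replace', with_replace)):
    elapsed = timeit.timeit(lambda: func(sources, targets, text), number=1000)
    print('%s: %.4fs' % (name, elapsed))
```

Varying `n` and `m` in `make_inputs` gives the same kind of grid as the original loop; the actual numbers depend on the machine and the Python version.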