Python re.split() vs split()

Question

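In my optimization tasks, I found that the built-in split() method is about 40% faster than the re.split() equivalent.

A dummy benchmark (easy to copy and paste):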

import random
import re
import time

def random_string(length):
    """Return a random string of the given length over the letters A, B, C."""
    letters = "ABC"
    return "".join(random.choice(letters) for _ in range(length))

r = random_string(2000000)
pattern = re.compile(r"A")

# Time the precompiled regex split
start = time.time()
pattern.split(r)
print("with re.split : ", time.time() - start)

# Time the built-in str.split on the same input
start = time.time()
r.split("A")
print("with built-in split : ", time.time() - start)


Why is there such a difference?

Answer 1

Nul*_*ion 20

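re.split is expected to be slower, because using regular expressions incurs some overhead.

Of course, if you are splitting on a constant string, there is no point in using re.split().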
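As a rough illustration of that point, the sketch below checks that both calls return the same list for a literal separator and then times them with timeit. The sample string and the repetition count are arbitrary choices made for this example, and the measured times will vary by machine and Python version.

```python
import re
import timeit

# Hypothetical sample data, chosen only for illustration
text = "ABCA" * 1000

pattern = re.compile("A")

# For a literal separator, both approaches produce the same list
same_result = text.split("A") == pattern.split(text)

# Time each approach over the same number of repetitions
t_str = timeit.timeit(lambda: text.split("A"), number=200)
t_re = timeit.timeit(lambda: pattern.split(text), number=200)

print("same result:", same_result)
print(f"str.split: {t_str:.4f}s   re.split: {t_re:.4f}s")
```

Since the results are identical here, the regex version only adds cost.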

Answer 2

the*_*olf 9

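When in doubt, check the source code. You can see that Python's s.split() is optimized for whitespace and is inlined. But s.split() only works with fixed delimiters.

In exchange for some speed, regex-based splitting with re.split is far more flexible.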
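To make the whitespace case concrete, here is a small sketch of the two modes of str.split(). The sample string is made up for this example.

```python
# Hypothetical sample string with mixed whitespace
s = "  one  two\tthree\n"

# No argument: split on runs of any whitespace, never yielding empty fields
no_arg = s.split()

# Fixed delimiter: split on every single space, keeping empty fields
fixed = s.split(" ")

print(no_arg)  # ['one', 'two', 'three']
print(fixed)   # ['', '', 'one', '', 'two\tthree\n']
```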

>>> re.split(r':+', "One:two::t h r e e:::fourth field")
['One', 'two', 't h r e e', 'fourth field']
>>> "One:two::t h r e e:::fourth field".split(':')
['One', 'two', '', 't h r e e', '', '', 'fourth field']
# would require an additional step to find the empty fields...
>>> re.split(r'[:\d]+', "One:two:2:t h r e e:3::fourth field")
['One', 'two', 't h r e e', 'fourth field']
# try that without a regex split in an understandable way...

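For comparison, here is one possible way to get the same results without a regex. It is only a sketch: filtering out empty fields and using str.translate to map digits to colons are choices made for this example, not something the answer prescribes.

```python
import re

line = "One:two::t h r e e:::fourth field"

# The "additional step": drop the empty fields left by repeated colons
fields = [f for f in line.split(":") if f]
print(fields)  # ['One', 'two', 't h r e e', 'fourth field']

line2 = "One:two:2:t h r e e:3::fourth field"

# Treat digits as separators too by first mapping each digit to a colon
digits_to_colon = str.maketrans("0123456789", ":" * 10)
fields2 = [f for f in line2.translate(digits_to_colon).split(":") if f]
print(fields2)  # ['One', 'two', 't h r e e', 'fourth field']

# Both agree with the regex versions for these inputs
print(fields == re.split(r":+", line))
print(fields2 == re.split(r"[:\d]+", line2))
```

The filter also drops empty fields at the start or end of the string, which re.split keeps when the input begins or ends with a separator, so the two approaches are not interchangeable for every input.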

What should be surprising is that re.split() is only 29% slower (or, equivalently, that s.split() is only 40% faster).
