chr*_*ant 10 python string performance
def format_title(title):
    return ''.join(map(lambda x: x if (x.isupper() or x.islower()) else '_', title.strip()))
Is there anything faster?
Joh*_*ooy 20
A faster way is to use str.translate().
It is roughly 50 times faster than your approach.
# You only need to do this once
>>> title_trans=''.join(chr(c) if chr(c).isupper() or chr(c).islower() else '_' for c in range(256))
>>> "abcde1234!@%^".translate(title_trans)
'abcde________'
# Using map+lambda
$ python -m timeit '"".join(map(lambda x: x if (x.isupper() or x.islower()) else "_", "abcd1234!@#$".strip()))'
10000 loops, best of 3: 21.9 usec per loop
# Using str.translate
$ python -m timeit -s 'titletrans="".join(chr(c) if chr(c).isupper() or chr(c).islower() else "_" for c in range(256))' '"abcd1234!@#$".translate(titletrans)'
1000000 loops, best of 3: 0.422 usec per loop
# Here is regex for a comparison
$ python -m timeit -s 'import re;transre=re.compile("[\W\d]+")' 'transre.sub("_","abcd1234!@#$")'
100000 loops, best of 3: 3.17 usec per loop
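The session above relies on Python 2, where str.translate accepts a 256-character table string. As a minimal sketch, assuming Python 3 (where str.translate takes a mapping from code points to replacements), the same idea might look like this. The names TITLE_TRANS and format_title_translate are illustrative and not part of the original answer.

```python
# Python 3 sketch (assumption: the original answer targets Python 2).
# Build the table once: map every code point below 256 that is not a
# cased letter to "_". Code points missing from the dict are left as-is.
TITLE_TRANS = {
    c: "_"
    for c in range(256)
    if not (chr(c).isupper() or chr(c).islower())
}


def format_title_translate(title):
    """Replace non-letters (within the first 256 code points) with '_'."""
    return title.strip().translate(TITLE_TRANS)


print(format_title_translate("abcde1234!@%^"))  # abcde________
```

Because the dict only covers code points below 256, characters above that range pass through unchanged in this sketch.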
Here is a version for unicode:
# coding: UTF-8
def format_title_unicode_translate(title):
    return title.translate(title_unicode_trans)

class TitleUnicodeTranslate(dict):
    def __missing__(self,item):
        uni = unichr(item)
        res = u"_"
        if uni.isupper() or uni.islower():
            res = uni
        self[item] = res
        return res

title_unicode_trans=TitleUnicodeTranslate()
print format_title_unicode_translate(u"Metallica Μεταλλικα")
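Note that Greek letters count as uppercase and lowercase, so they are not replaced. If you do want them replaced, just change the condition to: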
if item<256 and (uni.isupper() or uni.islower()):
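For readers on Python 3, where unichr no longer exists and every str is Unicode, a hedged port of the lazy table might look like the following. The class name TitleTranslate and the latin1_only flag are assumptions introduced for illustration; the flag corresponds to the item<256 condition.

```python
class TitleTranslate(dict):
    """Lazily filled translation table for str.translate (Python 3 sketch)."""

    def __init__(self, latin1_only=False):
        super().__init__()
        # Hypothetical flag: when True, only cased letters below code
        # point 256 are kept, mirroring the `item<256` condition.
        self.latin1_only = latin1_only

    def __missing__(self, code_point):
        ch = chr(code_point)
        keep = ch.isupper() or ch.islower()
        if self.latin1_only:
            keep = keep and code_point < 256
        res = ch if keep else "_"
        self[code_point] = res  # cache so each code point is computed once
        return res


any_letters = TitleTranslate()
latin1_letters = TitleTranslate(latin1_only=True)

sample = "Metallica \u039c\u03b5\u03c4\u03b1\u03bb\u03bb\u03b9\u03ba\u03b1"  # illustrative input with Greek letters
print(sample.translate(any_letters))
print(sample.translate(latin1_letters))
```

The table starts empty and fills itself on first use, so only the code points that actually occur in the input are ever computed.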
Tim*_*ker 17
import re
title = re.sub(r"[\W\d]", "_", title.strip())
This should be faster.
If you want to replace a run of adjacent non-letters with a single underscore, use
title = re.sub(r"[\W\d]+", "_", title.strip())
instead, which is even faster.
I just ran a timing comparison:
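To make the difference between the two patterns concrete, here is a small sketch. The sample string and the variable names are made up for illustration. Raw strings are used so that \W and \d reach the regex engine without Python treating them as string escapes.

```python
import re

# Precompile both patterns; raw strings keep the backslashes intact.
each_char = re.compile(r"[\W\d]")   # one "_" per non-letter
each_run = re.compile(r"[\W\d]+")   # one "_" per run of non-letters

title = "  ab 12!cd  ".strip()      # illustrative input

per_char = each_char.sub("_", title)
per_run = each_run.sub("_", title)

print(per_char)  # ab____cd
print(per_run)   # ab_cd
```

One caveat worth checking against your data: \w also matches letters that have no case (for example CJK characters), which the isupper()/islower() test in the question would replace, so the two approaches are not identical for every input.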
C:\>python -m timeit -n 100 -s "data=open('test.txt').read().strip()" "''.join(map(lambda x: x if (x.isupper() or x.islower()) else '_', data))"
100 loops, best of 3: 4.51 msec per loop
C:\>python -m timeit -n 100 -s "import re; regex=re.compile('[\W\d]+'); data=open('test.txt').read().strip()" "title=regex.sub('_',data)"
100 loops, best of 3: 2.35 msec per loop
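The same comparison can be run from a script with the timeit module. This is only a sketch, assuming Python 3: the contents of test.txt are not shown in the answer, so a made-up string stands in for it, and the function names are illustrative. Absolute numbers will differ from those above.

```python
import re
import timeit

# Illustrative data; stands in for the unseen contents of test.txt.
data = ("Some title 123 !@# " * 200).strip()

regex = re.compile(r"[\W\d]+")


def with_map(text):
    """The map+lambda approach from the question."""
    return "".join(
        map(lambda x: x if (x.isupper() or x.islower()) else "_", text)
    )


def with_regex(text):
    """The precompiled regex approach."""
    return regex.sub("_", text)


results = {}
for func in (with_map, with_regex):
    # number=100 mirrors the -n 100 option; keep the best of 3 repeats.
    best = min(timeit.repeat(lambda: func(data), number=100, repeat=3))
    results[func.__name__] = best
    print(f"{func.__name__}: {best / 100 * 1e3:.3f} msec per loop")
```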
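The regex approach also works for Unicode strings (under Python 3, \W matches any character that is not a Unicode word character; under Python 2, you have to additionally set the UNICODE flag for this).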
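A short sketch of that Unicode behaviour, assuming Python 3: \W is Unicode-aware by default, and the re.ASCII flag restricts it to ASCII. The sample string and variable names are made up for illustration.

```python
import re

text = "na\u00efve caf\u00e9 2024"  # "naïve café 2024", illustrative input

# Python 3 default: \W is Unicode-aware, so accented letters are kept.
unicode_result = re.sub(r"[\W\d]+", "_", text)

# re.ASCII limits \w and \W to ASCII, so accented letters are replaced too.
ascii_result = re.sub(r"[\W\d]+", "_", text, flags=re.ASCII)

print(unicode_result)  # naïve_café_
print(ascii_result)    # na_ve_caf_
```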