If I have a string

    "this  is   a    string"

how can I shorten it so that there is only a single space between words instead of several? (The number of spaces is random.)

    "this is a string"
Nic*_*tin 13
You can use string.split() and " ".join(list) to do this in a reasonably Pythonic way. There may be more efficient algorithms, but they won't look as nice.
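As a minimal sketch of that idea (the helper name collapse_whitespace is made up for illustration and is not from the answer): calling split() with no argument splits on any run of whitespace and drops empty pieces, so joining the pieces with a single space collapses the runs and also trims both ends.

```python
def collapse_whitespace(text):
    """Collapse every run of whitespace into one space and trim the ends."""
    # split() with no argument treats any run of spaces, tabs or newlines
    # as one separator and never yields empty strings.
    words = text.split()
    return " ".join(words)


print(collapse_whitespace("this  is   a    string"))  # this is a string
```

Note that this also converts tabs and newlines to spaces, which may or may not be what you want.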
Incidentally, this is much faster than using a regex, at least on the example string:

    # Python 2 (xrange, print statement)
    import re
    import timeit

    s = "this  is   a    string"

    def do_regex():
        for x in xrange(100000):
            a = re.sub(r'\s+', ' ', s)

    def do_join():
        for x in xrange(100000):
            a = " ".join(s.split())

    if __name__ == '__main__':
        t1 = timeit.Timer(do_regex).timeit(number=5)
        print "Regex: ", t1
        t2 = timeit.Timer(do_join).timeit(number=5)
        print "Join: ", t2

    $ python revsjoin.py
    Regex:  2.70868492126
    Join:  0.333452224731

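Compiling the regex does improve performance, but only if you call sub on the compiled pattern object rather than passing the compiled pattern to re.sub as an argument: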

    def do_regex_compile():
        pattern = re.compile(r'\s+')
        for x in xrange(100000):
            # Don't do this
            # a = re.sub(pattern, ' ', s)
            a = pattern.sub(' ', s)

    $ python revsjoin.py
    Regex:  2.72924399376
    Compiled Regex:  1.5852200985
    Join:  0.33763718605

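The benchmark above is written for Python 2. For anyone on Python 3, here is a rough sketch of the same three-way comparison. The function names and the sample string are my own choices, not the answerer's, and the timings will differ by machine and interpreter version, so no numbers are quoted.

```python
import re
import timeit

s = "this  is   a    string"
pattern = re.compile(r"\s+")  # compiled once, outside the timed functions


def with_regex():
    # Module-level re.sub: looks the pattern up in re's cache on every call.
    return re.sub(r"\s+", " ", s)


def with_compiled():
    # Calls sub directly on the precompiled pattern object.
    return pattern.sub(" ", s)


def with_join():
    # Split on whitespace runs, then rejoin with single spaces.
    return " ".join(s.split())


if __name__ == "__main__":
    for name, func in [("Regex", with_regex),
                       ("Compiled Regex", with_compiled),
                       ("Join", with_join)]:
        seconds = timeit.timeit(func, number=100000)
        print(f"{name}: {seconds:.3f}")
```

All three functions return the same result for this input; only the running time should differ.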

    import re
    re.sub(r'\s+', ' ', 'this  is   a    string')

You can precompile and store the pattern for better performance:

    MULT_SPACES = re.compile(r'\s+')
    MULT_SPACES.sub(' ', 'this  is   a    string')

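One thing worth knowing when choosing between the two answers: they are not quite equivalent at the edges. The sketch below (the sample text and variable names are assumptions of mine) shows that the \s+ substitution leaves a single space where leading or trailing whitespace was, while split/join removes it, and that a pattern of ' +' touches only literal spaces.

```python
import re

text = "  this  is\ta   string\n"

# \s+ collapses every whitespace run, including the ones at both ends,
# so the result still starts and ends with one space.
regex_result = re.sub(r"\s+", " ", text)
print(repr(regex_result))   # ' this is a string '

# Adding strip() makes the regex version match split/join.
print(repr(regex_result.strip()))   # 'this is a string'

# split/join drops leading and trailing whitespace on its own.
join_result = " ".join(text.split())
print(repr(join_result))    # 'this is a string'

# ' +' collapses only runs of the space character; tabs and newlines stay.
spaces_only = re.sub(r" +", " ", text)
print(repr(spaces_only))    # ' this is\ta string\n'
```

If the input may contain tabs or newlines that must be preserved, the ' +' variant is probably the one you want.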