python algorithm performance time-complexity
I'm currently working through some coding problems on leetcode.com, and I'm having trouble accepting the difference in computation time between two algorithms that produce the same result.
The problem: given an array of strings words, find the maximum value of length(word[i]) * length(word[j]) where the two words do not share any common letters. You may assume each word contains only lowercase letters. If no such pair of words exists, return 0.
At a minimum, one needs to compare every combination of string pairs in order to compute the answer. I suspect a bitwise approach would be faster/more efficient, but I'm choosing to set that aside for this question.
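For reference, the bitwise variant alluded to above can be sketched as follows. This is an assumption about what was meant, not code from the question: each word is encoded as a 26-bit mask of the letters it contains, and two words share no letters exactly when the AND of their masks is 0.

```python
# Hypothetical sketch of the bitwise approach mentioned above: encode
# each word as a 26-bit mask of its letters; two words are letter-
# disjoint exactly when their masks AND to zero.
def max_product_bitmask(words):
    masks = [(sum(1 << (ord(c) - ord('a')) for c in set(w)), len(w))
             for w in words]
    best = 0
    for i, (mask_i, len_i) in enumerate(masks):
        for mask_j, len_j in masks[i + 1:]:
            if mask_i & mask_j == 0:  # no shared letters
                best = max(best, len_i * len_j)
    return best

print(max_product_bitmask(["abcw", "baz", "foo", "bar", "xtfn", "abcdef"]))  # 16
```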
You can find the list of test-case words I'm using at the bottom of this post.
Algorithm 1:
curr_max=0
for i in range(0,len(words)):
    for j in range(0,len(words)):
        if i<j:
            curr_word=set(words[i])
            other_word=words[j]
            for char in curr_word:
                if char in other_word:
                    break
            else:
                curr_max=max(curr_max,len(words[i])*len(other_word))
print(curr_max)
This algorithm is essentially a nested for loop that compares every possible combination of words; without the `if i < j` test it would also double-count pairs and compare words with themselves.
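The pair enumeration that the `i < j` guard implements is exactly what itertools.combinations provides; a sketch of the same algorithm written that way (for clarity, not as a speed claim):

```python
# Same pairwise comparison expressed with itertools.combinations;
# set(a).isdisjoint(b) is True when the two words share no letters.
from itertools import combinations

def max_product(words):
    curr_max = 0
    for a, b in combinations(words, 2):
        if set(a).isdisjoint(b):
            curr_max = max(curr_max, len(a) * len(b))
    return curr_max
```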
Algorithm 2:
curr_max = 0
while words:
    curr_word = set(words[0])
    curr_len = len(words[0])
    words = words[1:]
    for word in words:
        for char in curr_word:
            if char in word:
                break
        else:
            curr_max = max(curr_max, curr_len*len(word))
print(curr_max)
This algorithm essentially compares the first word in the list against all the others, then removes it from the list and repeats the process. It takes about 0.02 seconds, roughly 4x faster. I'm having trouble understanding why the difference is so large. Thanks in advance for any help!
The input I'm using to compare the times is this:
words=["eedfdddadbcc","acdadecebafaebaec","dfde","ececbefe","bbafebbeccbddddbd","eafeffddbbbf","cbd","abadeaddfbcfbadb","ffdcacaebbeaa","fbadcfeede","bdcefdbfec","bbadfccdfebefd","dbefdfabededb","cbbccdecfbbe","abaeeecdbecebafedbfee","fbefbdfc","fffafb","bfadbdefbfedbddff","cbccbfdadbfe","bacaafecbbfaae","fcbffdbefcfbccd","fefaadfaaafdfbdaff","ecabaff","ccbdefcdcfac","bbbfafbffcefbc","edecefa","bcdfbcebabae","aaefecfcbbccfaeeaf","beaabaaeaebef","adfac","acedfdabccebc","efbfbef","bfccadbcbcfcdabfa","ffcaddbcf","dfae","ccadeeebeaabddebcadec","babaa","ebbdbaabddfdddad","fafaddbaebdaa","eeeeddadedfe","effbca","abcddfa","cbadcfeffeaaeecbbfe","ceaabcfaaaefeeadf","acecadddde","ece","dc","bfafdefbbdafacdcfb","fbdcad","dbaaffcdbcbdea","baaee","bebed","beaedceceaa","eacbcfdbcefbaddffcac","acddaedacfeedffad","efebff","efcbf","cdfffffaacfacafb","adacaceea","fceeffededbcfbfaaf","eafaeffcbfde","debadcddbdbabefdbe","ef","eeeeabfbaabddaecb","eeadcdcdaacaabe","ebcbffdefafdcbcebec","eb","adedefbaabfcefbea","ddceabfddaaefcea","ddffb","fdadfac","de","cbcdcbddcdabeb","ccffeeafbbbf","ccba","dab","bbdbeefdbef","cbec","ffcbefdbfdacdbdbf","adfad","ceacdcbfbdbbaebbd","ecfeaefff","ddbbdaefddeebd","eeeee","abdadc","eafecdbdef","aeedaeeaebaaeecdd","dddeebcbdea","bcaadedacb","ebdeadddcafa","ecbdbcbfccbdffaef","fddcffbfffa","accbdcfcedeabeab","cfbbefbddcaeecfbfacc","efffdaacbafeecdad","aaadfa","efeccbabdefaf","defebaddaafdcd","ecebcaacdaccaddcfeee","fdbbfecaffeafaa","bafccdbea","caa","deedefdeccead","bbfbfeaeddacfacea","daaefbbcbcdbfbfdda","aceed","cfeadadadbcff","eaefcdca","cefebdbaafeabdbdeaafd","abec","aeececad","cfeabcbaeaebdbcaada","aac","ebabffeb","fa","cf","dcebedefc","dbaedceecf","ffebaedafccceb","faefbeacaddefbe","eeadbfabfbbbfaeffaeea","affdecaca","ccfdcbdefcdfaddbbeaed","bc","feafaaabaceade","bebdfbbad","eeacaefaddacac","fff","aeddcd","ccffbabbdfc","ecddbcdeecdfbbb","debdbcdcafdcd","cfaebaeddbbd","efdada","becdccaeffeedcadbdd","feedbacc","cbbeebcdad","bfdcbdfdbcceadded","cfdbfdddafadad
ddcba","bcedeaeeaac","ffdcfccffdfffaebf","afffceaaadbbedfdd","faaeebdfbfefddebed","eedafbddeeaaadcdeaccc","eeceadafa","ebcfaccabea","eebdbbedcaedcbdcfaba","ecfcadaebacbdfdccebe","cbbabdadaee","cfeea","dec","cfedcbaabbaef","aacdabcbf","dfdbacadbebeedcd","bccccfdcdfe","cfcacdbcdccddcadce","dafafeccfaccaadeabbf","eaffaaffefccde","bbfbddccfda","fbdbbbbfbe","eafbcafbdbead","edbcdcefdc","fe","aafdcabce","ddafedceddcdcbfbcafe","dabcafbcfafeeadbbbef","beeaacd","cadeabebdbcfbbdfe","ecfefbfbbfa","fedacafcc","bcdcefecbcebaeeccdbd","fefde","cafba","bdabeaabbdbbbccecebda","dfeaadbeaaeefdfbed","dbaecde","cfbdfffbdeeeeb","fc","decadcacfaabca","cebbdff","badabbddcfed","fcce","fedfadefcf","acfccfbfcda","debfc","bebafeaeffe","ceaefbbcefacbbacb","cebbaeb","cadedfdafecdfb","bfefdfbaceddfcbade","cefeefaeddafbbdcade","faceadcefbffadb","cfbacafae","dfbfadfdccedbcbeaae","dbbccdddaf","ebbcbcebdddcedcfdcfaa","ccedffbcdbaedfaeb","ccfeaceaaaaeee","faade","afaaacaecbffdbadcbcd","cebfbbefbbdabbbffea","cdaadba","bbefdcacaaadbbbdedec","adabfbebdb","fcfefadcbadaacbdcfdbb","adddadebfc","fb","ecfebaacbdabece","dabacfdecfe","eeeecc","eabbe","fcdffababd","aafdbbcfdecbccca","efebaaadfecccecaa","cffefdbf","bcbdd","eaaccdcfdbbbcf"]
Update from the OP to help people help me with this! I now generate the words with:
words = [''.join([choice('abcdefghijklmnopqrstuvwxyz') for _ in range(randrange(2, 22))]) for _ in range(250)]
I'm timing the first algorithm like this:
import time

t1=time.time()
curr_max=0
for i in range(0,len(words)):
    for j in range(0,i):
        curr_word=set(words[i])
        other_word=words[j]
        for char in curr_word:
            if char in other_word:
                break
        else:
            curr_max=max(curr_max,len(words[i])*len(other_word))
print(curr_max)
t0=time.time()
print(t0-t1)
The results I see are in the 0.1-second range.
The second algorithm I'm timing:
import time

t1=time.time()
curr_max = 0
while words:
    curr_word = set(words[0])
    curr_len = len(words[0])
    words = words[1:]
    for word in words:
        for char in curr_word:
            if char in word:
                break
        else:
            curr_max = max(curr_max, curr_len*len(word))
print(curr_max)
t0=time.time()
print(t0-t1)
I see results in the 0.04-0.05 second range. Can anyone reproduce this?
The two algorithms only look like the same amount of work. Both re-create the itertools.combinations() functionality. However, the first recreates the set far more often, and performs n**2 extra i < j tests (for n words)!
You are creating "n choose 2" combinations, i.e. n! / (2! * (n - 2)!) (see Wikipedia), which is far lower than n**2:
>>> import math
>>> n = 250
>>> math.factorial(n) / (math.factorial(2) * math.factorial(n - 2))
31125.0
>>> n ** 2
62500
>>> n ** 2 - (math.factorial(n) / (math.factorial(2) * math.factorial(n - 2)))
31375.0
So for your specific case, algorithm #1 executes its loop more than twice as often as algorithm #2. As n grows, the ratio of n**2 to the number of combinations approaches 2, so it always does at least twice the work.
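That limiting ratio can be checked directly; a quick sketch (math.comb requires Python 3.8+, an assumption about the interpreter in use):

```python
# n**2 / C(n, 2) = 2n / (n - 1), which approaches 2 from above as n grows.
from math import comb

for n in (10, 100, 250, 10_000):
    print(n, n * n / comb(n, 2))
```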
Next, in algorithm #2 you create a set for words[0] only once per current word, but in algorithm #1 you do so on every iteration of the inner loop:
# algorithm #1
for ...  # current word loop
    for ...  # other word loop
        set(words[i])

# algorithm #2
while ...  # current word loop
    set(words[0])
    for ...  # other word loop
It is these differences that make it slower; creating "n choose 2" sets versus just n sets probably accounts for most of the performance cost here.
For a proper comparison you should use the timeit module, which repeats the test many times, makes sure the most accurate clock is used to measure the time taken, and disables the Python garbage collector (so it can't interfere). I've included a random word list, and since the destructive algorithm (your #2) consumes its input, I had to clone the list each time; I compensate for this by subtracting the time taken by the same number of bare list copies.
The script I ran:
from random import choice, randrange
from timeit import timeit

def naive_loop(words):
    curr_max=0
    for i in range(0,len(words)):
        for j in range(0,len(words)):
            if i<j:
                curr_word=set(words[i])
                other_word=words[j]
                for char in curr_word:
                    if char in other_word:
                        break
                else:
                    curr_max=max(curr_max,len(words[i])*len(other_word))
    return curr_max

def destructive_loop(words):
    curr_max = 0
    while words:
        curr_word = set(words[0])
        curr_len = len(words[0])
        words = words[1:]
        for word in words:
            for char in curr_word:
                if char in word:
                    break
            else:
                curr_max = max(curr_max, curr_len*len(word))
    return curr_max

def reduced_set_calls_loop(words):
    curr_max=0
    for i in range(0,len(words)):
        curr_word=set(words[i])
        for j in range(0,len(words)):
            if i<j:
                other_word=words[j]
                for char in curr_word:
                    if char in other_word:
                        break
                else:
                    curr_max=max(curr_max,len(words[i])*len(other_word))
    return curr_max

words = [''.join([choice('abcdef') for _ in range(randrange(2, 22))]) for _ in range(250)]
number = 100
print('Naive:', timeit('naive_loop(words)', 'from __main__ import naive_loop, words', number=number))
print('Destructive:',  # don't include time to copy a list
      timeit('destructive_loop(words[:])', 'from __main__ import destructive_loop, words', number=number) -
      timeit('words[:]', 'from __main__ import naive_loop, words', number=number))
print('Reduced set calls:', timeit('reduced_set_calls_loop(words)', 'from __main__ import reduced_set_calls_loop, words', number=number))
The results:
Naive: 1.8516130640055053
Destructive: 0.3646556100138696
Reduced set calls: 0.5927464940032223
So moving the set() call out of the inner loop, as reduced_set_calls_loop() does, already improves the first version considerably. Reducing the number of iterations by replacing the if i < j test with a for j in range(i): loop narrows the gap further:
>>> def reduced_iteration_loop(words):
...     curr_max=0
...     for i in range(0,len(words)):
...         curr_word=set(words[i])
...         for j in range(i):
...             other_word=words[j]
...             for char in curr_word:
...                 if char in other_word:
...                     break
...             else:
...                 curr_max=max(curr_max,len(words[i])*len(other_word))
...     return curr_max
...
>>> print('Reduced iteration:', timeit('reduced_iteration_loop(words)', 'from __main__ import reduced_iteration_loop, words', number=number))
Reduced iteration: 0.44450017900089733
To my surprise, your destructive loop is faster than using itertools.combinations():
>>> from itertools import combinations
>>> def destructive_loop_empty(words):
...     while words:
...         curr_word, words = words[0], words[1:]
...         for word in words:
...             pass
...
>>> def empty_combinations(words):
...     for a, b in combinations(words, 2):
...         pass
...
>>> timeit('destructive_loop_empty(words[:])', 'from __main__ import destructive_loop_empty, words', number=1000)
0.324253979997593
>>> timeit('empty_combinations(words[:])', 'from __main__ import empty_combinations, words', number=1000)
0.5626872480061138
We can make your algorithm #2 faster still by using set disjointness tests rather than testing each character individually. Since we will be testing words repeatedly, it makes sense to pre-create the sets in a dictionary, acting as a cache we can draw from while testing.
Finally, we can make a non-destructive version by also storing the lengths in the dictionary and simply looping over its values (we destroy the dictionary instead of the input list):
def nondestructive_loop(words):
    curr_max = 0
    words = {w: (set(w), len(w)) for w in words}
    while words:
        curr_word, curr_word_length = words.popitem()[1]
        for other, other_length in words.values():
            if curr_word.isdisjoint(other):
                curr_max = max(curr_max, curr_word_length * other_length)
    return curr_max
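As a quick sanity check of the non-destructive version (restated here so the snippet runs on its own), using the well-known LeetCode example where "abcw" and "xtfn" share no letters and give 4 * 4 = 16:

```python
# Sanity check for the non-destructive approach above, restated so
# this snippet is self-contained and runnable.
def nondestructive_loop(words):
    curr_max = 0
    words = {w: (set(w), len(w)) for w in words}
    while words:
        curr_word, curr_word_length = words.popitem()[1]
        for other, other_length in words.values():
            if curr_word.isdisjoint(other):
                curr_max = max(curr_max, curr_word_length * other_length)
    return curr_max

print(nondestructive_loop(["abcw", "baz", "foo", "bar", "xtfn", "abcdef"]))  # 16
```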
This is the fastest I could make it:
>>> print('Nondestructive:', timeit('nondestructive_loop(words)', 'from __main__ import nondestructive_loop, words', number=number))
Nondestructive: 0.2944725830020616
That shaves off another 20%.
So, in summary, iterating directly over a list is faster than generating indices from range() and then indexing into the list. The difference is large enough that it's worth destroying the list (or dictionary)!
That's also why itertools.combinations() is slower here: it has to work with indices, because it must support combinations of more than 2 elements (which means you can't just remove items from the input sequence).
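For the 2-element special case, the index bookkeeping can be avoided entirely; a hypothetical index-free pair generator in the same destructive style (not part of the answer above, just an illustration of the point):

```python
# Index-free pair generation in the destructive style: repeatedly
# split off the head of the list and pair it with everything left.
def pairs(items):
    items = list(items)  # copy so the caller's sequence survives
    while items:
        first, items = items[0], items[1:]
        for other in items:
            yield first, other

print(list(pairs("abc")))  # [('a', 'b'), ('a', 'c'), ('b', 'c')]
```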