摘自编程珍珠第15.2节
可在此处查看C代码:http://www.cs.bell-labs.com/cm/cs/pearls/longdup.c
当我使用suffix-array在Python中实现它时:
example = open("iliad10.txt").read()
def comlen(p, q):
i = 0
for x in zip(p, q):
if x[0] == x[1]:
i += 1
else:
break
return i
suffix_list = []
example_len = len(example)
idx = list(range(example_len))
idx.sort(cmp = lambda a, b: cmp(example[a:], example[b:])) #VERY VERY SLOW
max_len = -1
for i in range(example_len - 1):
this_len = comlen(example[idx[i]:], example[idx[i+1]:])
print this_len
if this_len > max_len:
max_len = this_len
maxi = i
Run Code Online (Sandbox Code Playgroud)
我发现这idx.sort一步很慢.我认为它很慢,因为Python需要通过值而不是指针传递子串(如上面的C代码).
测试文件可以从这里 …