Cam*_*ler 5 python string algorithm comparison
I have two lists of strings and want to find all pairs of strings between them that share no common characters. For example:
list1 = ['abc', 'cde']
list2 = ['aij', 'xyz', 'abc']
desired_output = [('abc', 'xyz'), ('cde', 'aij'), ('cde', 'xyz')]
I need this to be as efficient as possible because I am processing lists containing millions of strings. At the moment my code follows the general pattern of comparing every string in list1 against every string in list2.
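This is O(n^2) and takes many hours to run. Does anyone have suggestions for speeding it up? Perhaps there is a way to take advantage of the characters in each string being sorted?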
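As a rough illustration of that brute-force pattern, here is a minimal sketch. It is not the original code: the function name `naive_disjoint_pairs` is hypothetical, and it assumes each pair is tested with a set-based disjointness check.

```python
def naive_disjoint_pairs(list1, list2):
    """Compare every string in list1 with every string in list2."""
    pairs = []
    for s1 in list1:
        chars1 = set(s1)  # build the character set once per outer string
        for s2 in list2:
            # isdisjoint is True when the two strings share no character
            if chars1.isdisjoint(s2):
                pairs.append((s1, s2))
    return pairs


list1 = ['abc', 'cde']
list2 = ['aij', 'xyz', 'abc']
print(naive_disjoint_pairs(list1, list2))
```

Every string in one list is checked against every string in the other, so the work grows with the product of the two list lengths.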
Thanks very much in advance!
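Here is another strategy, which focuses on reducing the set operations to bit operations and on grouping together words that represent the same set of letters: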
import collections
import string


def build_index(words):
    # Map each letter-set bitmask to the words that use exactly those letters.
    index = collections.defaultdict(list)
    for word in words:
        # Bit i is set when the i-th lowercase ASCII letter occurs in the word.
        chi = sum(1 << string.ascii_lowercase.index(letter) for letter in set(word))
        index[chi].append(word)
    return index


def disjoint_pairs(words1, words2):
    index1 = build_index(words1)
    index2 = build_index(words2)
    for chi1, group1 in index1.items():
        for chi2, group2 in index2.items():
            # A nonzero AND means the two letter sets share a letter.
            if chi1 & chi2:
                continue
            for word1 in group1:
                for word2 in group2:
                    yield word1, word2


print(list(disjoint_pairs(["abc", "cde"], ["aij", "xyz", "abc"])))
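To illustrate why the `chi1 & chi2` test works, here is a small sketch of the bitmask encoding on its own. The helper name `letter_mask` is hypothetical, and it assumes words contain only lowercase ASCII letters:

```python
def letter_mask(word):
    """Return an int whose bit i is set when the i-th lowercase letter occurs in word."""
    mask = 0
    for letter in word:
        mask |= 1 << (ord(letter) - ord('a'))  # assumes 'a' <= letter <= 'z'
    return mask


abc, cde, xyz = letter_mask('abc'), letter_mask('cde'), letter_mask('xyz')
print(bin(abc))                   # 0b111: bits 0, 1, 2 for a, b, c
print(bin(cde))                   # 0b11100: bits 2, 3, 4 for c, d, e
print(abc & cde != 0)             # True: both masks have bit 2 (the letter c)
print(abc & xyz == 0)             # True: no shared bit, so no shared letter
print(letter_mask('cab') == abc)  # True: same letters, same mask, same bucket
```

Words with the same letters collapse into a single mask, so the pairwise comparison runs over distinct letter sets rather than over individual words. How much that saves depends on how many words in the input share a letter set.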