Find all pairs of strings from two lists that share no common characters

Cam*_*ler 5 python string algorithm comparison

I have two lists of strings and want to find every pair of strings, one from each list, that shares no common characters. For example:

list1 = ['abc', 'cde']
list2 = ['aij', 'xyz', 'abc']

desired_output = [('abc', 'xyz'), ('cde', 'aij'), ('cde', 'xyz')]

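I need this to be as efficient as possible, because I am working with lists containing millions of strings. Currently, my code follows the general pattern of checking every string in one list against every string in the other.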
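
The exact code is not included in the question. A minimal sketch of that brute-force pattern, assuming a nested loop and Python's `set.isdisjoint` (the function name `disjoint_pairs_naive` is hypothetical):

```python
def disjoint_pairs_naive(list1, list2):
    # Hypothetical baseline: test every (a, b) combination,
    # which costs O(len(list1) * len(list2)) disjointness checks.
    sets2 = [(b, set(b)) for b in list2]  # build the inner sets only once
    result = []
    for a in list1:
        set_a = set(a)
        for b, set_b in sets2:
            if set_a.isdisjoint(set_b):
                result.append((a, b))
    return result


list1 = ['abc', 'cde']
list2 = ['aij', 'xyz', 'abc']
print(disjoint_pairs_naive(list1, list2))
# [('abc', 'xyz'), ('cde', 'aij'), ('cde', 'xyz')]
```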

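This is O(n^2) and takes many hours to run. Does anyone have suggestions for speeding it up? Perhaps there is a way to exploit the fact that the characters in each string are sorted?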

Thanks very much in advance!

Dav*_*tat 7

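Here is another strategy, which focuses on reducing the set operations to bit operations and on grouping together words that represent the same set of letters: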

import collections
import string


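def build_index(words):
    # Map each word to a 26-bit mask of its letters (bit i is set when the
    # i-th letter of a-z occurs) and group words that share the same mask.
    # Assumes words contain only lowercase a-z; other characters raise ValueError.
    index = collections.defaultdict(list)
    for word in words:
        chi = sum(1 << string.ascii_lowercase.index(letter) for letter in set(word))
        index[chi].append(word)
    return index


def disjoint_pairs(words1, words2):
    index1 = build_index(words1)
    index2 = build_index(words2)
    # Compare masks instead of individual words: two groups share no letters
    # exactly when their masks have no bit in common.
    for chi1, group1 in index1.items():
        for chi2, group2 in index2.items():
            if chi1 & chi2:
                continue
            # Every word in group1 pairs with every word in group2.
            for word1 in group1:
                for word2 in group2:
                    yield word1, word2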


print(list(disjoint_pairs(["abc", "cde"], ["aij", "xyz", "abc"])))
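
Two hedged variations on the answer above, both assumptions rather than part of the original. `letter_mask` (hypothetical) sets one bit per code point via `ord`, so input is not limited to a-z (the `string.ascii_lowercase.index` lookup raises `ValueError` for anything else). `count_disjoint_pairs` (hypothetical) multiplies group sizes instead of yielding each pair, which may matter when millions of strings produce more pairs than can be stored:

```python
from collections import Counter


def letter_mask(word):
    # Hypothetical variant: one bit per Unicode code point, so characters
    # outside a-z are accepted. Python ints grow as needed for large code points.
    mask = 0
    for ch in set(word):
        mask |= 1 << ord(ch)
    return mask


def count_disjoint_pairs(words1, words2):
    # Group words by mask and record how many words share each mask.
    counts1 = Counter(letter_mask(w) for w in words1)
    counts2 = Counter(letter_mask(w) for w in words2)
    total = 0
    for m1, n1 in counts1.items():
        for m2, n2 in counts2.items():
            if not m1 & m2:
                # Every word in one group pairs with every word in the other.
                total += n1 * n2
    return total


print(count_disjoint_pairs(['abc', 'cde'], ['aij', 'xyz', 'abc']))  # 3
```

Counting still loops over every pair of distinct masks, so, like the grouped generator, it helps most when many words share the same set of letters.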