你可以用给定单词的字母制作4个字母或更多的常用英文单词(每个字母只能使用一次)

Question

你可以用给定单词的字母制作4个字母或更多的常用英文单词(每个字母只能使用一次)

Bio*_*eek 7 python puzzle algorithm permutation

在块日历的背面,我发现了以下谜语:

你可以用"教科书"这个词的字母做出多少4个字母或更多的常用英文单词(每个字母只能使用一次).

我想出的第一个解决方案是:

from itertools import permutations

with open('/usr/share/dict/words') as f:
    words = f.readlines()

words = map(lambda x: x.strip(), words)

given_word = 'textbook'

found_words = []

ps = (permutations(given_word, i) for i in range(4, len(given_word)+1))

for p in ps:
    for word in map(''.join, p):
        if word in words and word != given_word:
            found_words.append(word)
print set(found_words)

Run Code Online (Sandbox Code Playgroud)

这给出了结果,set(['tote', 'oboe', 'text', 'boot', 'took', 'toot', 'book', 'toke', 'betook'])但在我的机器上花费了超过7分钟.

我的下一次迭代是:

with open('/usr/share/dict/words') as f:
    words = f.readlines()

words = map(lambda x: x.strip(), words)

given_word = 'textbook'

print [word for word in words if len(word) >= 4 and sorted(filter(lambda letter: letter in word, given_word)) == sorted(word) and word != given_word]

Run Code Online (Sandbox Code Playgroud)

几乎立即返回答案,但给出了答案: ['book', 'oboe', 'text', 'toot']

什么是这个问题的最快,最正确和最pythonic的解决方案？

(编辑:添加了我之前的排列解决方案及其不同的输出).

Answer 1

Voo*_*Voo 3

我想我应该分享这个稍微有趣的技巧，尽管它比其他技巧需要更多的代码，而且并不是真正的“Pythonic”。这将比其他解决方案需要更多的代码，但如果我看看其他解决方案所需的时间，应该会相当快。

我们正在进行一些预处理以加快计算速度。基本方法如下：我们为字母表中的每个字母分配一个素数。例如，A = 2，B = 3，依此类推。然后，我们为字母表中的每个单词计算一个哈希值，它只是单词中每个字符的素数表示的乘积。然后，我们将每个单词存储在由哈希索引的字典中。

现在，如果我们想找出哪些单词相当于 say，textbook我们只需计算该单词的相同哈希值并在字典中查找它即可。通常（比如在 C++ 中）我们必须担心溢出，但在 python 中，情况比这更简单：列表中具有相同索引的每个单词将包含完全相同的字符。

这是经过稍微优化的代码，在我们的例子中，我们只需要担心给定单词中也出现的字符，这意味着我们可以使用比其他方式小得多的素数表（明显的优化是只分配那些字符）在单词中出现一个值 - 无论如何它足够快所以我没有打扰，这样我们就可以只预处理一次并为几个单词做它）。素数算法经常很有用，所以无论如何你应该自己拥有一个；）

from collections import defaultdict
from itertools import permutations

PRIMES = list(gen_primes(256)) # some arbitrary prime generator

def get_dict(path):
    res = defaultdict(list)
    with open(path, "r") as file:
        for line in file.readlines():
            word = line.strip().upper()
            hash = compute_hash(word)
            res[hash].append(word)
    return res

def compute_hash(word):
    hash = 1
    for char in word:
        try:
            hash *= PRIMES[ord(char) - ord(' ')]
        except IndexError:
            # contains some character out of range - always 0 for our purposes
            return 0
    return hash

def get_result(path, given_word):
    words = get_dict(path)
    given_word = given_word.upper()
    result = set()
    powerset = lambda x: powerset(x[1:]) + [x[:1] + y for y in powerset(x[1:])] if x else [x]
    for word in (word for word in powerset(given_word) if len(word) >= 4):
        hash = compute_hash(word)
        for equiv in words[hash]:
            result.add(equiv)
    return result

if __name__ == '__main__':
    path = "dict.txt"
    given_word = "textbook"
    result = get_result(path, given_word)
    print(result)

Run Code Online (Sandbox Code Playgroud)

在我的 ubuntu 单词列表（98k 单词）上运行得相当快，但不是我所说的 pythonic，因为它基本上是 C++ 算法的端口。如果您想以这种方式比较多个单词，这很有用。

归档时间：	14 年，4 月前
查看次数：	2213 次
最近记录：	14 年，4 月前