Mer*_*emu 16 python algorithm permutation python-itertools multiset
I am trying to use itertools.permutations() to generate all permutations of a string and return only those that are members of a set of words.
import itertools

def permutations_in_dict(string, words):
    '''
    Parameters
    ----------
    string : {str}
    words : {set}

    Returns
    -------
    list : {list} of {str}

    Example
    -------
    >>> permutations_in_dict('act', {'cat', 'rat', 'dog', 'act'})
    ['act', 'cat']
    '''
My current solution works fine in the terminal, but somehow it fails the test case...
    return list(set([''.join(p) for p in itertools.permutations(string)]) & words)
Any help would be greatly appreciated.
Ray*_*ger 113
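The problem you are solving is best described as testing for anagram matches.
The traditional solution is to sort the target string, sort the candidate string, and test for equality.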
>>> def permutations_in_dict(string, words):
        target = sorted(string)
        return sorted(word for word in words if sorted(word) == target)

>>> permutations_in_dict('act', {'cat', 'rat', 'dog', 'act'})
['act', 'cat']
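If the same word set is queried many times, the sorted letters can double as a dictionary key so each word is sorted only once. This is a minimal sketch of that idea, not part of the original answer; build_anagram_index and anagrams_of are hypothetical helper names introduced for illustration.

```python
from collections import defaultdict

def build_anagram_index(words):
    """Group words by their sorted-letter signature (hypothetical helper)."""
    index = defaultdict(list)
    for word in words:
        index[''.join(sorted(word))].append(word)
    return index

def anagrams_of(string, index):
    """Return the indexed words that are anagrams of `string`, sorted."""
    # .get() avoids inserting a new key into the defaultdict on a miss.
    return sorted(index.get(''.join(sorted(string)), []))

index = build_anagram_index({'cat', 'rat', 'dog', 'act'})
print(anagrams_of('act', index))   # ['act', 'cat']
print(anagrams_of('tar', index))   # ['rat']
```

Building the index costs one sort per word up front. After that, each lookup sorts only the query string.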
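Another approach is to use collections.Counter() for a multiset equality test. This is algorithmically superior to the sort solution (O(n) versus O(n log n)) but tends to lose unless the strings are large (because of the cost of hashing all the characters).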
>>> from collections import Counter
>>> def permutations_in_dict(string, words):
        target = Counter(string)
        return sorted(word for word in words if Counter(word) == target)

>>> permutations_in_dict('act', {'cat', 'rat', 'dog', 'act'})
['act', 'cat']
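To see why a multiset is needed rather than a plain set, it may help to compare strings with repeated letters. The short sketch below uses made-up example strings; it is only meant to illustrate that Counter keeps letter counts while set discards them.

```python
from collections import Counter

# Counter compares letter counts, so repeated letters are respected.
print(Counter('act') == Counter('cat'))   # True
print(Counter('aab') == Counter('abb'))   # False: the counts differ

# A plain set only records which letters occur, not how often.
print(set('aab') == set('abb'))           # True
```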
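A unique anagram signature, or perfect hash, can be constructed by multiplying together the prime numbers corresponding to each possible character in a string.
The commutative property of multiplication guarantees that the hash value is invariant for any permutation of a single string. The uniqueness of the hash value is guaranteed by the fundamental theorem of arithmetic (also known as the unique prime factorization theorem).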
>>> from functools import reduce
>>> from operator import mul
>>> primes = [2, 3, 5, 7, 11]
>>> primes += [p for p in range(13, 1620) if all(pow(b, p-1, p) == 1 for b in (5, 11))]
>>> anagram_hash = lambda s: reduce(mul, (primes[ord(c)] for c in s))
>>> def permutations_in_dict(string, words):
        target = anagram_hash(string)
        return sorted(word for word in words if anagram_hash(word) == target)

>>> permutations_in_dict('act', {'cat', 'rat', 'dog', 'act'})
['act', 'cat']
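A small worked example may make the prime-product signature more concrete. The sketch below is a simplified variant, not the answer's code: it assumes the input is lowercase ASCII only, so 26 hard-coded primes suffice, and lowercase_anagram_hash is a hypothetical name. It also passes an initial value of 1 to reduce() so that an empty string does not raise an error.

```python
from functools import reduce
from operator import mul

# Assumption: lowercase ASCII input only, so the first 26 primes are enough.
PRIMES = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41,
          43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101]

def lowercase_anagram_hash(s):
    """Multiply the primes assigned to each letter ('a' -> 2, 'b' -> 3, ...)."""
    return reduce(mul, (PRIMES[ord(c) - ord('a')] for c in s), 1)

print(lowercase_anagram_hash('act'))  # 2 * 5 * 71 = 710
print(lowercase_anagram_hash('cat'))  # 5 * 2 * 71 = 710, same product in a different order
print(lowercase_anagram_hash('rat'))  # 61 * 2 * 71 = 8662
```

The products grow quickly with string length. Python integers are arbitrary precision, so the result stays exact, but the multiplications get slower as the numbers get larger.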
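Searching through the permutations of the target string with itertools.permutations() is reasonable when the string is small (generating permutations of an n-length string produces n factorial candidates).
The good news is that when n is small and the number of words is large, this approach runs very fast (because set membership testing is O(1)):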
>>> from itertools import permutations
>>> def permutations_in_dict(string, words):
        perms = set(map(''.join, permutations(string)))
        return sorted(word for word in words if word in perms)

>>> permutations_in_dict('act', {'cat', 'rat', 'dog', 'act'})
['act', 'cat']
As the OP surmised, the pure Python search loop can be sped up to C speed by using set.intersection():
>>> def permutations_in_dict(string, words):
        perms = set(map(''.join, permutations(string)))
        return sorted(words & perms)

>>> permutations_in_dict('act', {'cat', 'rat', 'dog', 'act'})
['act', 'cat']
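One detail worth noting: the & operator requires a set on both sides, whereas the set.intersection() method accepts any iterable. If the words might arrive as a list, a variant along these lines could be used; permutations_in_words is a hypothetical name chosen to avoid confusion with the function above.

```python
from itertools import permutations

def permutations_in_words(string, words):
    """Like the set-intersection version, but `words` may be any iterable."""
    perms = set(map(''.join, permutations(string)))
    # intersection() accepts lists, tuples, generators, etc.
    return sorted(perms.intersection(words))

print(permutations_in_words('act', ['cat', 'rat', 'dog', 'act']))  # ['act', 'cat']
```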
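Which solution is best depends on the length of the string and the number of words. Timings will show which is best for a particular problem.
Here are some comparative timings for the various approaches using two different string sizes: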
Timings with string_size=5 and words_size=1000000
-------------------------------------------------
0.01406 match_sort
0.06827 match_multiset
0.02167 match_perfect_hash
0.00224 match_permutations
0.00013 match_permutations_set
Timings with string_size=20 and words_size=1000000
--------------------------------------------------
2.19771 match_sort
8.38644 match_multiset
4.22723 match_perfect_hash
<takes "forever"> match_permutations
<takes "forever"> match_permutations_set
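The results show that for small strings, the fastest approach searches the permutations of the target string using set intersection.
For larger strings, the fastest approach is the traditional sort-and-compare solution.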
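The "takes forever" entries follow directly from factorial growth. As a rough illustration (not part of the original timings), the number of candidate permutations for the two string sizes can be computed with math.factorial:

```python
from itertools import permutations
from math import factorial

print(factorial(5))    # 120
print(factorial(20))   # 2432902008176640000

# Repeated letters produce duplicate tuples, which the set() call collapses.
print(len(list(permutations('aab'))))               # 6
print(len(set(map(''.join, permutations('aab')))))  # 3
```

With 5 characters, the 120 candidates are far fewer than the one million words. With 20 characters, the candidates outnumber the words by many orders of magnitude.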
Hope you found this little algorithmic study as interesting as I have.
FWIW, here is the test setup I used to run the comparative timings:
from collections import Counter
from functools import reduce                  # builtin on Python 2; this import works on 2.6+ and 3
from itertools import permutations
from string import ascii_letters as letters   # string.letters exists only on Python 2
from random import choice
from operator import mul
from time import time

def match_sort(string, words):
    target = sorted(string)
    return sorted(word for word in words if sorted(word) == target)

def match_multiset(string, words):
    target = Counter(string)
    return sorted(word for word in words if Counter(word) == target)

primes = [2, 3, 5, 7, 11]
primes += [p for p in range(13, 1620) if all(pow(b, p-1, p) == 1 for b in (5, 11))]
anagram_hash = lambda s: reduce(mul, (primes[ord(c)] for c in s))

def match_perfect_hash(string, words):
    target = anagram_hash(string)
    return sorted(word for word in words if anagram_hash(word) == target)

def match_permutations(string, words):
    perms = set(map(''.join, permutations(string)))
    return sorted(word for word in words if word in perms)

def match_permutations_set(string, words):
    perms = set(map(''.join, permutations(string)))
    return sorted(words & perms)

string_size = 5
words_size = 1000000

# Build random words over a small alphabet so that anagram matches are likely
population = letters[: string_size+2]
words = set()
for i in range(words_size):
    word = ''.join([choice(population) for _ in range(string_size)])
    words.add(word)
string = word                # Arbitrarily use the last word as the target

print('Timings with string_size=%d and words_size=%d' % (string_size, words_size))
for func in (match_sort, match_multiset, match_perfect_hash, match_permutations, match_permutations_set):
    start = time()
    func(string, words)
    end = time()
    print('%-10.5f %s' % (end - start, func.__name__))
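A single time() measurement can be noisy. As a possible refinement, not what the answer used, the standard timeit module can repeat each run and report the best time. The sketch below is scaled down to 10,000 words so it finishes quickly; best_time is a hypothetical helper and the seed is arbitrary.

```python
import random
import timeit
from string import ascii_lowercase

def match_sort(string, words):
    target = sorted(string)
    return sorted(word for word in words if sorted(word) == target)

def best_time(func, string, words, repeat=3):
    """Return the best of `repeat` single-run timings, in seconds."""
    return min(timeit.repeat(lambda: func(string, words), number=1, repeat=repeat))

random.seed(0)                       # fixed seed so the word set is reproducible
population = ascii_lowercase[:7]     # small alphabet, as in the setup above
words = {''.join(random.choice(population) for _ in range(5))
         for _ in range(10000)}
target = next(iter(words))           # arbitrary member of the set as the target

print('%-10.5f %s' % (best_time(match_sort, target, words), match_sort.__name__))
```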
ACh*_*ion 12
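You can simply use collections.Counter() to compare each of the words against the string, without creating all the permutations (whose number explodes with the length of the string):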
from collections import Counter

def permutations_in_dict(string, words):
    c = Counter(string)
    return [w for w in words if c == Counter(w)]

>>> permutations_in_dict('act', {'cat', 'rat', 'dog', 'act'})
['cat', 'act']
Note: sets are unordered, so if you need a specific order you may need to sort the result, e.g. return sorted(...)