Par*_*gue 8 python algorithm scrabble
我正在尝试创建一个函数best_tiles
,该函数接收您手中的瓷砖数量并返回一组瓷砖,它允许您生成最多数量的唯一英语有效单词,假设您只能使用每个瓷砖一次。
例如,您手中的这组牌(A, B, C)
可以拼出 CAB、BAC、AB 和 BA(所有这些都是英文单词),因此您可以用该组拼写 4 个独特的单词。使用(B, A, A)
,您可以拼出 5 个单词:ABA、BAA、AA、AB 和 BA。目标是找到一组字母,使您可以拼出最多数量的英语有效单词(无需替换)。
因此,如果 5 是 N = 3 的任意字母组合可以拼写的最大单词数,则运行best_tiles( n = 3 )
将返回B, A, A
.
我想知道如何有效地实现这一点?我目前的方法在字母数量方面根本不能很好地扩展。
我读了一个词表。在这种情况下,我在这里使用 enable.txt:https ://www.wordgamedictionary.com/enable/
import os
path = "enable.txt"
words = []
with open(path , encoding='utf8') as f:
for values in f:
words.append(list(values.strip().upper()))
Run Code Online (Sandbox Code Playgroud)
我创建了一个函数word_in_tiles
h/t smack89 ,它返回是否有可能在给定瓷砖集的情况下构造一个单词:
def word_in_tiles(word, tiles):
tiles_counter = collections.Counter(tiles)
return all(tiles_counter.get(ch, 0) == cnt for ch,cnt in
collections.Counter(word).items())
Run Code Online (Sandbox Code Playgroud)
然后我创建了一个函数get_all_words
,该函数生成一个列表,其中包含一个人可以从单词列表和拼贴集中拼出的所有可能单词。
def get_all_words(tile_set, words):
# words is a word list
return [i for i in words if word_in_tiles(i, tile_set)]
Run Code Online (Sandbox Code Playgroud)
确定哪个tileset是三个字母的“最佳”的极其天真的方法如下:
我首先创建一个给定长度的所有可能组合的列表。所以对于长度 3,我会这样做:
import string
import itertools
letters = string.ascii_lowercase
all_three_letter_combinations = list(itertools.combinations_with_replacement(letters, 3))
# Create a list of only words that are three letters are less
three_letter = [i for i in words if len(i) <= 3]
sequence_dict = dict()
for i in all_three_letter_combinations:
string_test = "".join(i).upper()
sequence_dict[i] = get_all_words(string_test, three_letter)
Run Code Online (Sandbox Code Playgroud)
然后删除没有长度的值并按结果的长度排序:
res = {k: v for k, v in sequence_dict.items() if len(v) >= 1}
def GetMaxLength(res):
return max((len(v), v, k) for k, v in res.items())[1:]
Run Code Online (Sandbox Code Playgroud)
GetMaxLength(res) 你知道,对于三个字母,产生最多英语有效词的图块集是T A E
可以产生以下词的['AE', 'AT', 'ATE', 'EAT', 'ET', 'ETA', 'TA', 'TAE', 'TEA']
我希望能够将其扩展到 N = 15。这样做的最佳程序是什么?
我想这已经足够好了!
这是我在 PyPy 下运行的代码的日志:
0:00:00.000232
E
0:00:00.001251
ER
0:00:00.048733
EAT
0:00:00.208744
ESAT
0:00:00.087425
ESATL
0:00:00.132049
ESARTP
0:00:00.380296
ESARTOP
0:00:01.409129
ESIARTLP
0:00:03.433526
ESIARNTLP
0:00:10.391252
ESIARNTOLP
0:00:25.651012
ESIARNTOLDP
0:00:56.642405
ESIARNTOLCDP
0:01:57.257293
ESIARNTOLCDUP
0:03:55.933906
ESIARNTOLCDUPM
0:07:17.146036
ESIARNTOLCDUPMG
0:10:14.844347
ESIARNTOLCDUPMGH
0:13:34.722600
ESIARNTOLCDEUPMGH
0:18:14.215019
ESIARNTOLCDEUPMGSH
0:22:47.129284
ESIARNTOLCDEUPMGSHB
0:27:56.859511
ESIARNTOLCDEUPMGSHBYK
0:46:20.448502
ESIARNTOLCDEUPMGSHBYAK
0:57:15.213635
ESIARNTOLCDEUPMGSHIBYAT
1:09:55.530180
ESIARNTOLCDEUPMGSHIBYATF
1:18:35.209599
ESIARNTOLCDEUPMGSHIBYATRF
1:21:54.095119
ESIARNTOLCDEUPMGSHIBYATRFV
1:20:16.978411
ESIARNTOLCDEUPMGSHIBYAOTRFV
1:14:24.253660
ESIARNTOLCDEUPMGSHIBYAONTRFV
1:00:37.405571
Run Code Online (Sandbox Code Playgroud)
关键的改进是这些。
这是代码。
import os
import datetime
path = "enable.txt"
words = []
with open(path) as f:
for values in f:
words.append(values.strip().upper())
key_count = {}
for word in words:
seen = {}
for letter in word:
if letter not in seen:
seen[letter] = 0
key = (letter, seen[letter])
if key not in key_count:
key_count[key] = 1
else:
key_count[key] += 1
seen[letter] += 1
KEYS = sorted(key_count.keys(), key=lambda key: -key_count[key])
#print(KEYS)
#print(len(KEYS))
KEY_POS = {}
for i in range(len(KEYS)):
KEY_POS[KEYS[i]] = i
# Now we will build a trie. Every node has a list of words, and a dictionary
# from the next letter farther in the trie.
# BUT TRICK:, we will map each word to a sequence of numbers, and those numbers
# will be indexes into KEYS. This allows us to use the fact that a second 'e' is
# unlikely, so we can deal with that efficiently.
class Trie:
def __init__(self, path):
self.words = []
self.dict = {}
self.min_pos = -1
self.max_pos = -1
self.words = []
self.count_words = 0
self.path = path
def add_word (self, word):
trie = self
poses = []
seen = {}
for letter in word:
if letter not in seen:
seen[letter] = 0
key = (letter, seen[letter])
poses.append(KEY_POS[(key)])
seen[letter] += 1
sorted_poses = sorted(poses);
for i in range(len(sorted_poses)):
trie.count_words += 1
pos = sorted_poses[i]
if pos not in trie.dict:
trie.dict[pos] = Trie(trie.path + KEYS[pos][0])
if trie.max_pos < pos:
trie.max_pos = pos
trie = trie.dict[pos]
trie.count_words += 1
trie.words.append(word)
base_trie = Trie('')
for word in words:
base_trie.add_word(word);
def best_solution (size):
def solve (subset, pos, best, partial):
found = sum(x[0] for x in partial)
upper_bound = sum(x[1] for x in partial)
if size <= len(subset) or upper_bound < best or len(KEYS) <= pos:
return (found, subset)
if best < found:
best = found
# Figure out our next calculations.
partial_include = []
partial_exclude = []
finalized_found = 0
for this_found, this_bound, this_trie in partial:
if this_trie is None:
# This is a generic record of already emptied tries
finalized_found += this_found
elif pos in this_trie.dict:
include_trie = this_trie.dict[pos]
partial_include.append((
this_found + len(include_trie.words),
include_trie.count_words + this_found,
include_trie
))
# We included the tally of found words in the previous partial.
# So do not double-count by including it again
partial_include.append((
0,
this_bound - include_trie.count_words - this_found,
this_trie
))
partial_exclude.append((
this_found,
this_bound - include_trie.count_words,
this_trie
))
elif this_found == this_bound:
finalized_found += this_found
else:
partial_include.append((
this_found,
this_bound,
this_trie
))
partial_exclude.append((
this_found,
this_bound,
this_trie
))
if 0 < finalized_found:
partial_include.append(
(finalized_found, finalized_found, None)
)
partial_exclude.append(
(finalized_found, finalized_found, None)
)
found_include, subset_include = solve(subset + [pos], pos+1, best, partial_include)
if best < found_include:
best = found_include
found_exclude, subset_exclude = solve(subset, pos+1, best, partial_exclude)
if found_include < found_exclude:
return (found_exclude, subset_exclude)
else:
return (found_include, subset_include)
count, subset = solve([], 0, 0, [(len(base_trie.words), base_trie.count_words, base_trie)])
return ''.join([KEYS[x][0] for x in subset])
for i in range(20):
start = datetime.datetime.now()
print(best_solution(i))
print(datetime.datetime.now() - start)
Run Code Online (Sandbox Code Playgroud)
归档时间: |
|
查看次数: |
509 次 |
最近记录: |