Par*_*gue 12 python algorithm performance cpu-word
我使用这个可怕且低效的实现来找到可以删除最多连续最后一个字母的单词并且仍然是一个单词.
例如,Rodeo是众所周知的:Rodeo,Rode,Rod,Ro.该计划找到了"作曲家":作曲家,作曲家,作曲家,作曲家,作曲家
我不知道我应该如何去创建一个发现可以有任何字母(不只是最后的)移除,它仍然被认为是一个字的最长的单词的程序:
例如:野兽,最好,下注,是 - 将是一个有效的可能性
这是我的程序,找到删除连续字母的那个(我也有兴趣听听如何改进和优化):
#Recursive function that finds how many letters can be removed from a word and
#it still be valid.
def wordCheck(word, wordList, counter):
if len(word)>=1:
if word in wordList:
return (wordCheck(word[0:counter-1], wordList, counter-1))
else:
return counter
return counter
def main():
a = open('C:\\Python32\\megalist2.txt', 'r+')
wordList = set([line.strip() for line in a])
#megaList contains a sorted list of tuple of
#(the word, how many letters can be removed consecutively)
megaList = sorted([(i, len(i)-1- wordCheck(i, wordList, len(i))) for i in wordList], key= lambda megaList: megaList[1])
for i in megaList:
if i[1] > 3:
print (i)
if __name__ == '__main__':
main()
Run Code Online (Sandbox Code Playgroud)
nin*_*cko 10
for each english word W:
for each letter you can remove:
remove the letter
if the result W' is also word:
draw a line W->W'
for each english word W:
connect ROOT-> each english word W
use a graph search algorithm to find the longest path starting at ROOT
(specifically, the words are now in a directed acyclic graph; traverse
the graph left-right-top-down, i.e. in a "topological sort", keeping
track of the longest candidate path to reach each node; this achieves
the result in linear time)
Run Code Online (Sandbox Code Playgroud)
这个算法只需要线性O(#wordsInEnglish*averageWordLength)时间!基本上只要读取输入就可以了
可以很容易地修改它以找到连续删除的字母:而不是每个节点保留一个候选字符(Node('rod').candidate = rodeo->rode->rod
),保留每个节点的候选族和您删除的字母的索引候选者(节点('rod').candidate = { rodeo->rod|e->rod|
,road->ro|d
}).这具有相同的运行时间.
这是我刚刚写的一个实现.我的~235k单词列表在大约五秒内运行.输出不显示整个链,但您可以轻松地从输出中重新组合它.
# Load the words into a dictionary
words = dict((x.strip(), set()) for x in open("/usr/share/dict/words"))
# For each word, remove each letter and see if the remaining word is still
# in the dictionary. If so, add it to the set of shorter words associated with
# that word in the dictionary.
# For example, bear -> {ear, bar, ber}
for w in words:
for i in range(len(w)):
shorter = w[:i] + w[i+1:]
if shorter in words:
words[w].add(shorter)
# Sort the words by length so we process the shortest ones first
sortedwords = sorted(words, key=len)
# For each word, the maximum chain length is:
# - the maximum of the chain lengths of each shorter word, if any
# - or 0 if there are no shorter words for this word
# Note that because sortedwords is sorted by length, we will always
# have maxlength[x] already available for each shorter word x
maxlength = {}
for w in sortedwords:
if words[w]:
maxlength[w] = 1 + max(maxlength[x] for x in words[w])
else:
maxlength[w] = 0
# Print the words in all chains for each of the top 10 words
toshow = sorted(words, key=lambda x: maxlength[x], reverse=True)[:10]
while toshow:
w = toshow[0]
print(w, [(x, maxlength[x]) for x in words[w]])
toshow = toshow[1:] + list(x for x in words[w] if x not in toshow)
Run Code Online (Sandbox Code Playgroud)
我字典中最长的单词链是:
归档时间: |
|
查看次数: |
2322 次 |
最近记录: |