Python-哪个单词可以删除最多的连续字母,仍然是字典有效的单词?

Par*_*gue 12 python algorithm performance cpu-word

我使用这个可怕且低效的实现来找到可以删除最多连续最后一个字母的单词并且仍然是一个单词.

例如,Rodeo是众所周知的:Rodeo,Rode,Rod,Ro.该计划找到了"作曲家":作曲家,作曲家,作曲家,作曲家,作曲家

我不知道我应该如何去创建一个发现可以有任何字母(不只是最后的)移除,它仍然被认为是一个字的最长的单词的程序:

例如:野兽,最好,下注,是 - 将是一个有效的可能性

这是我的程序,找到删除连续字母的那个(我也有兴趣听听如何改进和优化):

#Recursive function that finds how many letters can be removed from a word and
#it still be valid.  
def wordCheck(word, wordList, counter):

    if len(word)>=1:
        if word in wordList:
            return (wordCheck(word[0:counter-1], wordList, counter-1))
        else:
            return counter
    return counter


def main():
    a = open('C:\\Python32\\megalist2.txt', 'r+')
    wordList = set([line.strip() for line in a])
    #megaList contains a sorted list of tuple of 
    #(the word, how many letters can be removed  consecutively)
    megaList = sorted([(i, len(i)-1- wordCheck(i, wordList, len(i))) for i in wordList], key= lambda megaList: megaList[1])


    for i in megaList:
        if i[1] > 3:
            print (i)

if __name__ == '__main__':
    main()
Run Code Online (Sandbox Code Playgroud)

nin*_*cko 10

for each english word W:
    for each letter you can remove:
        remove the letter
        if the result W' is also word:
            draw a line W->W'
for each english word W:
    connect ROOT-> each english word W
use a graph search algorithm to find the longest path starting at ROOT
    (specifically, the words are now in a directed acyclic graph; traverse
    the graph left-right-top-down, i.e. in a "topological sort", keeping
    track of the longest candidate path to reach each node; this achieves 
    the result in linear time)
Run Code Online (Sandbox Code Playgroud)

这个算法只需要线性O(#wordsInEnglish*averageWordLength)时间!基本上只要读取输入就可以了

可以很容易地修改它以找到连续删除的字母:而不是每个节点保留一个候选字符(Node('rod').candidate = rodeo->rode->rod),保留每个节点的候选族和您删除的字母的索引候选者(节点('rod').candidate = { rodeo->rod|e->rod|,road->ro|d}).这具有相同的运行时间.

  • @Parseltongue:对不起,我可以澄清一下.我的意思是:首先取一个初始化的空图,将所有单词添加为节点,然后通过"画一条线"实际上意味着在图中添加一个边缘`Node(W)` - >`Node(W')`.这个算法可以在http://en.wikipedia.org/wiki/Longest_path_problem#Weighted_directed_acyclic_graphs上解释.熟悉DAG(有向无环图)是很好的.一个很好的起点可能是一些常见的图算法,称为图搜索算法,包括广度优先搜索,深度优先搜索和Dijkstra(最短路径). (2认同)

Gre*_*ill 8

这是我刚刚写的一个实现.我的~235k单词列表在大约五秒内运行.输出不显示整个链,但您可以轻松地从输出中重新组合它.

# Load the words into a dictionary
words = dict((x.strip(), set()) for x in open("/usr/share/dict/words"))

# For each word, remove each letter and see if the remaining word is still
# in the dictionary. If so, add it to the set of shorter words associated with
# that word in the dictionary.
# For example, bear -> {ear, bar, ber}
for w in words:
    for i in range(len(w)):
        shorter = w[:i] + w[i+1:]
        if shorter in words:
            words[w].add(shorter)

# Sort the words by length so we process the shortest ones first
sortedwords = sorted(words, key=len)

# For each word, the maximum chain length is:
#  - the maximum of the chain lengths of each shorter word, if any
#  - or 0 if there are no shorter words for this word
# Note that because sortedwords is sorted by length, we will always
# have maxlength[x] already available for each shorter word x
maxlength = {}
for w in sortedwords:
    if words[w]:
        maxlength[w] = 1 + max(maxlength[x] for x in words[w])
    else:
        maxlength[w] = 0

# Print the words in all chains for each of the top 10 words
toshow = sorted(words, key=lambda x: maxlength[x], reverse=True)[:10]
while toshow:
    w = toshow[0]
    print(w, [(x, maxlength[x]) for x in words[w]])
    toshow = toshow[1:] + list(x for x in words[w] if x not in toshow)
Run Code Online (Sandbox Code Playgroud)

我字典中最长的单词链是:

  • abranchiate
  • branchiate
  • branchi
  • 牧场
  • RACH
  • 阿赫
  • 一个