Eng*_*rad — python, memory-optimization, text, list
I'm running into speed problems with a very large list. I have a file full of misspellings and very strange words, and I'm trying to use difflib to find the closest match for each one in a 650,000-word dictionary file I have. The approach below works well but is very slow, and I'd like to know whether there's a better way to go about this. Here's the code:
from difflib import SequenceMatcher

headwordList = []  # This is a list of 650,000 words

sentenceList = []
openFile = open("sentences.txt", "r")
for line in openFile:
    sentenceList.append(line.strip())

count = 0
for y in sentenceList:
    if y not in headwordList:
        percentage = 0
        for x in headwordList:
            m = SequenceMatcher(None, y.lower(), x)
            if m.ratio() > percentage:
                percentage = m.ratio()
                word = x
        if percentage > 0.86:
            sentenceList[count] = word
    count = count + 1
Thanks for your help — software engineering isn't even my strong suit. Much appreciated.
Two things that might provide some small help:
1) Use the approach in this SO answer to read through your large file as efficiently as possible.
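In case the linked answer is unavailable: the usual memory-efficient pattern is to iterate over the file object directly, so Python reads one line at a time instead of loading the whole file. A minimal sketch (the small sample file written first is just demo setup standing in for the real sentences.txt):

```python
# Demo setup: write a tiny sample file standing in for the
# large sentences.txt from the question.
with open("sentences.txt", "w") as f:
    f.write("helo\nwrold\n")

# Iterate over the file object directly: the file is read line by
# line, so it never has to fit in memory all at once.
with open("sentences.txt", "r") as f:
    sentenceList = [line.strip() for line in f]
```

The `with` block also closes the file automatically, which the original code omits.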
2) Change your code from
for x in headwordList:
    m = SequenceMatcher(None, y.lower(), x)
to
yLower = y.lower()
for x in headwordList:
    m = SequenceMatcher(None, yLower, x)
You're converting each sentence to lowercase 650,000 times — once per dictionary word. No need for that.
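Not part of the original answer, but worth knowing: difflib ships a helper, `get_close_matches`, that performs this exact scan for you and prunes candidates with the cheaper `quick_ratio` bounds before computing the full ratio, so the hand-written inner loop can collapse to one call. A minimal sketch, with toy lists standing in for the 650,000-word `headwordList` and the real `sentenceList`:

```python
from difflib import get_close_matches

# Toy stand-ins for the question's 650,000-word dictionary
# and the sentence list read from sentences.txt.
headwordList = ["apple", "banana", "cherry", "grape"]
sentenceList = ["bananna", "apple", "chery"]

for i, y in enumerate(sentenceList):
    if y not in headwordList:
        # n=1 asks for the single best match; cutoff=0.86 mirrors
        # the threshold in the question. Returns [] if no candidate
        # reaches the cutoff.
        matches = get_close_matches(y.lower(), headwordList, n=1, cutoff=0.86)
        if matches:
            sentenceList[i] = matches[0]
```

Converting `headwordList` to a `set` for the `y not in headwordList` membership test would also help, since that check is linear over a plain list.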
Views: 4938