What would cause WordNetCorpusReader to have no attribute LazyCorpusLoader?

Cec*_*lia 11 python multithreading attributes exception nltk

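I have a short function that checks whether a word is a real word by comparing it against the WordNet corpus from the Natural Language Toolkit. I call this function from a thread that validates txt files. When I run my code, the first call to the function throws an AttributeError with the message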

"'WordNetCorpusReader' object has no attribute '_LazyCorpusLoader__args'"

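When I pause execution, the same line of code does not throw an error, so I assume the error occurs because the corpus has not yet been loaded at the time of my first call.

I have tried using nltk.wordnet.ensure_loaded() to force the corpus to load, but I still get the same error.

Here is my function: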

from nltk.corpus import wordnet as wn
from nltk.corpus import stopwords
from nltk.corpus.reader.wordnet import WordNetError
import sys

cachedStopWords = stopwords.words("english")

def is_good_word(word):
    word = word.strip()
    if len(word) <= 2:
        return 0
    if word in cachedStopWords:
        return 0
    try:
        wn.ensure_loaded()
        if len(wn.lemmas(str(word), lang='en')) == 0:
            return 0
    except WordNetError as e:
        print "WordNetError on concept {}".format(word)
    except AttributeError as e:
        print "Attribute error on concept {}: {}".format(word, e.message)
    except:
        print "Unexpected error on concept {}: {}".format(word, sys.exc_info()[0])
    else:
        return 1
    return 1

print (is_good_word('dog')) #Does NOT throw error

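If I put the print statement in the same file at global scope, no error is thrown. If I call the function from my thread, however, it is. Below is a minimal example that reproduces the error. I have tested it, and on my machine it gives the output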

Attribute error on concept dog: 'WordNetCorpusReader' object has no attribute '_LazyCorpusLoader__args'
Attribute error on concept dog: 'WordNetCorpusReader' object has no attribute '_LazyCorpusLoader__args'
Attribute error on concept dog: 'WordNetCorpusReader' object has no attribute '_LazyCorpusLoader__args'
Attribute error on concept dog: 'WordNetCorpusReader' object has no attribute '_LazyCorpusLoader__args'
Attribute error on concept dog: 'WordNetCorpusReader' object has no attribute '_LazyCorpusLoader__args'
Attribute error on concept dog: 'WordNetCorpusReader' object has no attribute '_LazyCorpusLoader__args'
Attribute error on concept dog: 'WordNetCorpusReader' object has no attribute '_LazyCorpusLoader__args'
Attribute error on concept dog: 'WordNetCorpusReader' object has no attribute '_LazyCorpusLoader__args'
Attribute error on concept dog: 'WordNetCorpusReader' object has no attribute '_LazyCorpusLoader__args'

Minimal example:

import time
import threading
from filter_tag import is_good_word

class ProcessMetaThread(threading.Thread):
    def __init__(self):
        threading.Thread.__init__(self)

    def run(self):
        is_good_word('dog') #Throws error


def process_meta(numberOfThreads):

    threadsList = []
    for i in range(numberOfThreads):
        t = ProcessMetaThread()
        t.setDaemon(True)
        t.start()
        threadsList.append(t)

    numComplete = 0
    while numComplete < numberOfThreads:
        # Iterate over the active processes
        for processNum in range(0, numberOfThreads):
            # If a process actually exists
            if threadsList != None:
                # If the process is finished
                if not threadsList[processNum] == None:
                    if not threadsList[processNum].is_alive():
                        numComplete += 1
                        threadsList[processNum] = None
        time.sleep(5)

    print 'Processes Finished'


if __name__ == '__main__':
    process_meta(10)

πόδ*_*κύς 16

I ran your code and got the same error. See below for a working solution. Here is the explanation:

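LazyCorpusLoader is a proxy object that stands in for a corpus object before the corpus is loaded. (This prevents NLTK from loading massive corpora into memory before you need them.) The first time this proxy object is accessed, however, it becomes the corpus you intend to load. That is, the LazyCorpusLoader proxy object transforms its __dict__ and __class__ into the __dict__ and __class__ of the corpus you are loading.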
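
To make the swap concrete, here is a minimal sketch of the same trick outside NLTK. `Reader` and `LazyProxy` are hypothetical stand-ins, not NLTK classes, and the real LazyCorpusLoader does more work (locating the corpus on disk, for example). The sketch only shows an object replacing its own __dict__ and __class__ on first attribute access:

```python
class Reader(object):
    """Hypothetical stand-in for a corpus reader."""

    def __init__(self, words):
        self.words = set(words)

    def contains(self, word):
        return word in self.words


class LazyProxy(object):
    """Hypothetical proxy that turns into a Reader on first use."""

    def __init__(self, reader_cls, *args):
        self.__reader_cls = reader_cls  # stored as _LazyProxy__reader_cls
        self.__args = args              # stored as _LazyProxy__args

    def __getattr__(self, attr):
        # Only reached when normal lookup fails, i.e. before loading.
        corpus = self.__reader_cls(*self.__args)
        # Become the loaded object: take over its state and its type.
        self.__dict__ = corpus.__dict__
        self.__class__ = corpus.__class__
        return getattr(self, attr)


proxy = LazyProxy(Reader, ["dog", "cat"])
before = type(proxy).__name__   # still the proxy class
found = proxy.contains("dog")   # first access triggers the swap
after = type(proxy).__name__    # now the reader class
print(before, found, after)
```

After the first access the same object reports a different class, and the proxy's private attributes are gone from its __dict__.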

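If you compare your code with the errors above, you can see that you received 9 errors when trying to create 10 instances of your class. The first transformation of the LazyCorpusLoader proxy object into a WordNetCorpusReader object succeeded. It was triggered when wordnet was accessed for the first time:

The First Thread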

from nltk.corpus import wordnet as wn
def is_good_word(word):
    ...
    wn.ensure_loaded()  # `LazyCorpusLoader` conversion into `WordNetCorpusReader` starts

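The Second Thread

When you start running your is_good_word function in a second thread, however, your first thread has not yet finished transforming the LazyCorpusLoader proxy object into a WordNetCorpusReader. wn is still a LazyCorpusLoader proxy object, so it begins the __load process again. By the time it reaches the point where it tries to convert its __class__ and __dict__ into those of a WordNetCorpusReader object, however, the first thread has already converted the LazyCorpusLoader proxy object into a WordNetCorpusReader. My guess is that you are hitting the error on the line marked with my comment below: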

class LazyCorpusLoader(object):
    ...
    def __load(self):
        ...
        corpus = self.__reader_cls(root, *self.__args, **self.__kwargs)  # load corpus
        ...
        # self.__args == self._LazyCorpusLoader__args
        args, kwargs  = self.__args, self.__kwargs                       # most likely the line throwing the error
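
The interleaving described above can be forced deterministically in a toy model. This is a sketch under assumptions: `Reader`, `LazyProxy`, `load` and the `pause` hook are invented for illustration and are not NLTK code. Two threading.Event objects stand in for the scheduler, so that one thread is held between building the reader and re-reading its private attributes while another thread completes the swap:

```python
import threading


class Reader(object):
    """Hypothetical stand-in for a corpus reader."""

    def __init__(self, name):
        self.name = name


class LazyProxy(object):
    """Hypothetical proxy with a two-step load, loosely modelled on the excerpt above."""

    def __init__(self, reader_cls, *args):
        self.__reader_cls = reader_cls
        self.__args = args

    def load(self, pause=None):
        corpus = self.__reader_cls(*self.__args)  # the slow step in real life
        if pause is not None:
            pause()                               # simulate losing the CPU here
        args = self.__args                        # fails if another thread already swapped
        self.__dict__ = corpus.__dict__
        self.__class__ = corpus.__class__


def simulate_race():
    proxy = LazyProxy(Reader, "wordnet")
    entered, release = threading.Event(), threading.Event()
    errors = []

    def pause():
        entered.set()    # tell the main thread we are inside load()
        release.wait()   # stay parked until the main thread has swapped

    def slow_thread():
        try:
            proxy.load(pause)
        except AttributeError as exc:
            errors.append(str(exc))

    t = threading.Thread(target=slow_thread)
    t.start()
    entered.wait()       # slow thread is now mid-load
    proxy.load()         # this thread finishes the transformation first
    release.set()        # let the slow thread continue
    t.join()
    return type(proxy).__name__, errors


class_name, errors = simulate_race()
print(class_name, errors)
```

The slow thread entered `load` while the object was still a proxy, so it keeps executing the proxy's method even though the object has since become a `Reader`.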

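Once the first thread has transformed the LazyCorpusLoader proxy object into a WordNetCorpusReader object, the mangled names no longer work. The WordNetCorpusReader object does not have LazyCorpusLoader anywhere in its mangled names. (self.__args is equivalent to self._LazyCorpusLoader__args while the object is a LazyCorpusLoader object.) Hence you get the following error: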

AttributeError: 'WordNetCorpusReader' object has no attribute '_LazyCorpusLoader__args'
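
Name mangling can be seen in isolation with a small sketch; `Loader` and `Other` are hypothetical classes. A bound method obtained before the swap keeps pointing at the same object, so once that object's class and __dict__ are replaced, the mangled lookup fails in the same way as in the traceback:

```python
class Loader(object):
    def __init__(self):
        self.__args = ("wordnet",)   # actually stored as _Loader__args

    def read_args(self):
        return self.__args           # compiled as self._Loader__args


class Other(object):
    def __init__(self):
        self.name = "other"


obj = Loader()
stored_names = sorted(vars(obj))     # the mangled name, not '__args'
read_args = obj.read_args            # bound method, still pointing at obj

# Replace the object's state and type, as the proxy does to itself.
replacement = Other()
obj.__dict__ = replacement.__dict__
obj.__class__ = Other

try:
    read_args()
    message = None
except AttributeError as exc:
    message = str(exc)

print(stored_names, message)
```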

Alternative

Given this problem, you need to access the wn object before entering your threads. Here is your code, modified accordingly:

from nltk.corpus import wordnet as wn
from nltk.corpus import stopwords
from nltk.corpus.reader.wordnet import WordNetError
import sys
import time
import threading

cachedStopWords = stopwords.words("english")


def is_good_word(word):
    word = word.strip()
    if len(word) <= 2:
        return 0
    if word in cachedStopWords:
        return 0
    try:
        if len(wn.lemmas(str(word), lang='en')) == 0:     # no longer the first access of wn
            return 0
    except WordNetError as e:
        print("WordNetError on concept {}".format(word))
    except AttributeError as e:
        print("Attribute error on concept {}: {}".format(word, e.message))
    except:
        print("Unexpected error on concept {}: {}".format(word, sys.exc_info()[0]))
    else:
        return 1
    return 1


class ProcessMetaThread(threading.Thread):
    def __init__(self):
        threading.Thread.__init__(self)

    def run(self):
        is_good_word('dog')


def process_meta(numberOfThreads):
    print(wn.__class__)           # <class 'nltk.corpus.util.LazyCorpusLoader'>
    wn.ensure_loaded()            # first access to wn transforms it
    print(wn.__class__)           # <class 'nltk.corpus.reader.wordnet.WordNetCorpusReader'>
    threadsList = []
    for i in range(numberOfThreads):
        start = time.clock()      # removed in Python 3.8; use time.perf_counter() there
        t = ProcessMetaThread()
        print(time.clock() - start)
        t.setDaemon(True)
        t.start()
        threadsList.append(t)

    numComplete = 0
    while numComplete < numberOfThreads:
        # Iterate over the active processes
        for processNum in range(0, numberOfThreads):
            # If a process actually exists
            if threadsList != None:
                # If the process is finished
                if not threadsList[processNum] == None:
                    if not threadsList[processNum].is_alive():
                        numComplete += 1
                        threadsList[processNum] = None
        time.sleep(5)

    print('Processes Finished')


if __name__ == '__main__':
    process_meta(10)

I tested the code above and received no errors.

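  • Great explanation! But can you explain how `wm` ends up in an inconsistent state? Its dicts are thread-global, so it contains the mangled attributes of `WordnetCorpusReader`. But what thread-local information makes it look like a `LazyCorpusLoader` to the second thread? (2 upvotes)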