Nat*_*Coy 3 classification perceptron nltk anaconda python-3.5
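I am trying to use the PerceptronTagger from the nltk package with Python 3.5, but I get the error TypeError: 'LazySubsequence' object does not support item assignment
I want to train it on data from the Brown corpus with the universal tagset.
This is the code I was running when I hit the problem.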
import nltk,math
tagged_sentences = nltk.corpus.brown.tagged_sents(categories='news',tagset='universal')
i = math.floor(len(tagged_sentences)*0.2)
testing_sentences = tagged_sentences[0:i]
training_sentences = tagged_sentences[i:]
perceptron_tagger = nltk.tag.perceptron.PerceptronTagger(load=False)
perceptron_tagger.train(training_sentences)
It fails to train and gives the following stack trace.
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-10-61332d63d2c3> in <module>()
1 perceptron_tagger = nltk.tag.perceptron.PerceptronTagger(load=False)
----> 2 perceptron_tagger.train(training_sentences)
/home/nathan/anaconda3/lib/python3.5/site-packages/nltk/tag/perceptron.py in train(self, sentences, save_loc, nr_iter)
192 c += guess == tags[i]
193 n += 1
--> 194 random.shuffle(sentences)
195 logging.info("Iter {0}: {1}/{2}={3}".format(iter_, c, n, _pc(c, n)))
196 self.model.average_weights()
/home/nathan/anaconda3/lib/python3.5/random.py in shuffle(self, x, random)
270 # pick an element in x[:i+1] with which to exchange x[i]
271 j = randbelow(i+1)
--> 272 x[i], x[j] = x[j], x[i]
273 else:
274 _int = int
TypeError: 'LazySubsequence' object does not support item assignment
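The error seems to come from the shuffle function in the random module, but that does not seem right.
Is there anything else that could be causing the problem? Has anyone else run into this?
I am running this on Ubuntu 16.04.1 with Anaconda Python 3.5. The nltk version is 3.2.1.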
Some grepping through the nltk source code turned up the answer.
The class is declared in the file site-packages/nltk/util.py.
class LazySubsequence(AbstractLazySequence):
"""
A subsequence produced by slicing a lazy sequence. This slice
keeps a reference to its source sequence, and generates its values
by looking them up in the source sequence.
"""
After another simple test in the interpreter, I saw the following details about the type() of tagged_sentences:
>>> import nltk
>>> tagged_sentences = nltk.corpus.brown.tagged_sents(categories='news',tagset='universal')
>>> type(tagged_sentences)
<class 'nltk.corpus.reader.util.ConcatenatedCorpusView'>
I found this in the file site-packages/nltk/corpus/reader/util.py:
class ConcatenatedCorpusView(AbstractLazySequence):
"""
A 'view' of a corpus file that joins together one or more
``StreamBackedCorpusViews<StreamBackedCorpusView>``. At most
one file handle is left open at any time.
"""
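A final test with the random package confirms that the problem lies in how I created tagged_sentences: slicing the lazy corpus view yields a LazySubsequence, which random.shuffle cannot modify in place.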
>>> import random
>>> random.shuffle(training_sentences)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-30-0b03f0366949> in <module>()
1 import random
----> 2 random.shuffle(training_sentences)
3
4
5
/home/nathan/anaconda3/lib/python3.5/random.py in shuffle(self, x, random)
270 # pick an element in x[:i+1] with which to exchange x[i]
271 j = randbelow(i+1)
--> 272 x[i], x[j] = x[j], x[i]
273 else:
274 _int = int
TypeError: 'LazySubsequence' object does not support item assignment
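The failure can be reproduced without nltk. The sketch below uses a hypothetical ReadOnlyView class (not part of nltk) as a stand-in for a lazy corpus view: it supports len() and indexing but has no __setitem__, so random.shuffle, which swaps elements in place, raises the same kind of TypeError. Copying the view into a list makes it shuffleable.

```python
import random


class ReadOnlyView:
    """Hypothetical stand-in for a lazy corpus view: indexable, but no __setitem__."""

    def __init__(self, items):
        self._items = tuple(items)

    def __len__(self):
        return len(self._items)

    def __getitem__(self, index):
        return self._items[index]


view = ReadOnlyView(["s1", "s2", "s3", "s4"])

try:
    # shuffle swaps elements in place: x[i], x[j] = x[j], x[i]
    random.shuffle(view)
    shuffle_error = None
except TypeError as exc:
    shuffle_error = str(exc)

print(shuffle_error)

# Copying into a list gives a mutable sequence that shuffle can reorder.
materialized = list(view)
random.shuffle(materialized)
print(sorted(materialized))
```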
To fix the error, simply build an explicit list of sentences from nltk.corpus.brown; random can then shuffle the data correctly.
import nltk,math
# explicitly make list, then LazySequence will traverse all items
tagged_sentences = [sentence for sentence in nltk.corpus.brown.tagged_sents(categories='news',tagset='universal')]
i = math.floor(len(tagged_sentences)*0.2)
testing_sentences = tagged_sentences[0:i]
training_sentences = tagged_sentences[i:]
perceptron_tagger = nltk.tag.perceptron.PerceptronTagger(load=False)
perceptron_tagger.train(training_sentences)
# no error, yea!
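The same idea can be wrapped in a small helper so the copy to a list is never forgotten. This is only a sketch: split_tagged_sentences, test_fraction and seed are names made up for illustration, and the toy data stands in for the Brown corpus so the snippet runs without nltk.

```python
import math
import random


def split_tagged_sentences(sentences, test_fraction=0.2, seed=None):
    """Copy `sentences` into a list and split it into (testing, training) parts."""
    # list() forces any lazy sequence to be read fully into a mutable list.
    materialized = list(sentences)
    if seed is not None:
        # Optional reproducible shuffle before splitting.
        random.Random(seed).shuffle(materialized)
    cut = math.floor(len(materialized) * test_fraction)
    return materialized[:cut], materialized[cut:]


# Toy stand-in for a corpus: an immutable tuple of one-word tagged sentences.
toy = tuple([("w%d" % n, "NOUN")] for n in range(10))

testing, training = split_tagged_sentences(toy)
print(len(testing), len(training))

# Both parts are plain lists, so in-place shuffling works.
random.shuffle(training)
```

With nltk available, the result of nltk.corpus.brown.tagged_sents(...) could be passed in place of toy.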
Tagging now works as expected.
>>> perceptron_tagger_preds = []
>>> for test_sentence in testing_sentences:
...     perceptron_tagger_preds.append(perceptron_tagger.tag([word for word,_ in test_sentence]))
>>> print(perceptron_tagger_preds[676])
[('Formula', 'NOUN'), ('is', 'VERB'), ('due', 'ADJ'), ('this', 'DET'), ('week', 'NOUN')]
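To go from eyeballing one sentence to a number, the predictions can be compared with the gold tags token by token. The following is a minimal sketch, not something from the original answer: tagging_accuracy is a hypothetical helper, and the gold and predicted sentences are invented toy data rather than Brown corpus output.

```python
def tagging_accuracy(gold_sentences, predicted_sentences):
    """Return the fraction of tokens whose predicted tag equals the gold tag."""
    correct = 0
    total = 0
    for gold, predicted in zip(gold_sentences, predicted_sentences):
        for (_, gold_tag), (_, predicted_tag) in zip(gold, predicted):
            correct += gold_tag == predicted_tag
            total += 1
    return correct / total if total else 0.0


# Invented toy data: four tokens in total, one of them tagged incorrectly.
gold = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("hello", "X")],
]
predicted = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "NOUN")],
    [("hello", "X")],
]

accuracy = tagging_accuracy(gold, predicted)
print(accuracy)
```

In the session above, testing_sentences and perceptron_tagger_preds would be the two arguments.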