After training my own classifier with nltk, how do I load it in textblob?

Question

Mar*_*ter 6 python nltk textblob naivebayes

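The built-in classifier in textblob is pretty dumb. It is trained on movie reviews, so I created a large set of examples in my own context (57,000 stories, each labelled positive or negative) and trained a classifier on them using nltk. I first tried training it through textblob, but that always failed: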

from textblob.classifiers import NaiveBayesClassifier

with open('train.json', 'r') as fp:
    cl = NaiveBayesClassifier(fp, format="json")

That would run for hours and then end in a memory error.

I looked at the source and found that textblob was just wrapping nltk, so I used nltk directly instead, and that worked.

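The nltk training set has to be a list of tuples. The first element of each tuple is a Counter mapping each word in the text to its number of occurrences; the second element is the label, 'pos' or 'neg'.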

>>> import nltk
>>> from collections import Counter
>>> # `data` is assumed to be the list of labelled stories, each a dict with "text" and "label"
>>> train_set = [(Counter(i["text"].split()), i["label"]) for i in data[200:]]
>>> test_set = [(Counter(i["text"].split()), i["label"]) for i in data[:200]]  # withholding 200 examples for testing later

>>> cl = nltk.NaiveBayesClassifier.train(train_set)  # <-- this is the same thing textblob was using

>>> print("Classifier accuracy percent:", (nltk.classify.accuracy(cl, test_set)) * 100)
('Classifier accuracy percent:', 66.5)
>>> cl.show_most_informative_features(75)

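The training-set structure described above can be sketched on toy data using only the standard library. The three records and the helper name `to_featureset` are invented for illustration; the real `data` is assumed to be a list of dicts with "text" and "label" keys, as the snippet above implies.

```python
from collections import Counter

# Hypothetical toy records standing in for the 57,000 labelled stories.
data = [
    {"text": "what a great great day", "label": "pos"},
    {"text": "a terrible awful outcome", "label": "neg"},
    {"text": "great news for everyone", "label": "pos"},
]


def to_featureset(text):
    """Map a text to a word -> count mapping (hypothetical helper)."""
    return Counter(text.split())


# Hold out the first record for testing and train on the rest,
# mirroring the data[:200] / data[200:] split used above.
test_set = [(to_featureset(d["text"]), d["label"]) for d in data[:1]]
train_set = [(to_featureset(d["text"]), d["label"]) for d in data[1:]]

print(test_set[0])
```

A list built this way has the same shape as the `train_set` passed to `nltk.NaiveBayesClassifier.train` in the snippet above.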

Then I pickled it.

import pickle

# serialize the trained nltk classifier to disk
with open('storybayes.pickle', 'wb') as f:
    pickle.dump(cl, f)

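Reloading is the mirror image of the dump. The round trip below is a minimal sketch that uses a plain dict as a stand-in for the trained classifier and a temporary directory instead of the real `storybayes.pickle`; both are assumptions made so the example runs without nltk or the original data.

```python
import os
import pickle
import tempfile

# Stand-in object; in the real workflow this would be the trained nltk classifier.
model = {"labels": ["pos", "neg"]}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "storybayes.pickle")

    # Write in binary mode, as in the snippet above.
    with open(path, "wb") as f:
        pickle.dump(model, f)

    # Read back in binary mode to recover an equivalent object.
    with open(path, "rb") as f:
        restored = pickle.load(f)

print(restored == model)
```

Unpickling an actual nltk classifier needs nltk to be importable in the loading process, since pickle stores a reference to the class rather than its code.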

Now... I took this pickled file, re-opened it to get the nltk classifier back (<class 'nltk.classify.naivebayes.NaiveBayesClassifier'>), and tried to feed it into textblob. Instead of

from textblob import TextBlob
from textblob.sentiments import NaiveBayesAnalyzer
blob = TextBlob("I love this library", analyzer=NaiveBayesAnalyzer())

I tried:

blob = TextBlob("I love this library", analyzer=myclassifier)
Traceback (most recent call last):
  File "<pyshell#116>", line 1, in <module>
    blob = TextBlob("I love this library", analyzer=cl4)
  File "C:\python\lib\site-packages\textblob\blob.py", line 369, in __init__
    parser, classifier)
  File "C:\python\lib\site-packages\textblob\blob.py", line 323, in _initialize_models
    BaseSentimentAnalyzer, BaseBlob.analyzer)
  File "C:\python\lib\site-packages\textblob\blob.py", line 305, in _validated_param
    .format(name=name, cls=base_class_name))
ValueError: analyzer must be an instance of BaseSentimentAnalyzer

Now what? I looked at the source: both are classes, but they are not quite the same.
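The traceback says the `analyzer` argument must be an instance of `BaseSentimentAnalyzer`, so one possible direction is a thin adapter that subclasses it and delegates to the nltk classifier. The following is only a sketch under assumptions: the class name `NLTKClassifierAnalyzer` and the stub classifier are hypothetical, the import path `textblob.base` is assumed for the installed version, and the features are built with `Counter(text.split())` to match the training code above.

```python
from collections import Counter

try:
    from textblob.base import BaseSentimentAnalyzer
except ImportError:
    # Fallback so the sketch still runs where textblob is not installed.
    BaseSentimentAnalyzer = object


class NLTKClassifierAnalyzer(BaseSentimentAnalyzer):
    """Hypothetical adapter exposing an nltk classifier through analyze()."""

    def __init__(self, classifier):
        super().__init__()
        self._classifier = classifier

    def analyze(self, text):
        # Build the features the same way as at training time.
        features = Counter(text.split())
        return self._classifier.classify(features)


class StubClassifier:
    """Stand-in for the unpickled nltk classifier, for demonstration only."""

    def classify(self, features):
        return "pos" if features["love"] > features["hate"] else "neg"


analyzer = NLTKClassifierAnalyzer(StubClassifier())
print(analyzer.analyze("I love this library"))
```

With textblob installed, passing `NLTKClassifierAnalyzer(cl)` as `analyzer=` to `TextBlob`, where `cl` is the unpickled classifier, should get past the type check shown in the traceback. Whether `blob.sentiment` then returns the label unchanged has not been verified here against a specific textblob version.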

Answer 1

Mar*_*ter 0

Another, more forward-looking solution is to use spaCy to build the model instead of textblob or nltk. It is new to me, but it seems easier to use and more powerful: https://spacy.io/usage/spacy-101#section-lightning-tour

"spaCy is the Ruby on Rails of Natural Language Processing."

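# Note: this uses the spaCy 2.x API. The 'en' shortcut and passing raw
# texts/annotation dicts to nlp.update() are not available in spaCy 3.
import spacy
import random

nlp = spacy.load('en') # loads the trained starter model here
train_data = [("Uber blew through $1 million", {'entities': [(0, 4, 'ORG')]})] # better model stuff

# disable every pipeline component except the named entity recognizer while training
with nlp.disable_pipes(*[pipe for pipe in nlp.pipe_names if pipe != 'ner']):
    optimizer = nlp.begin_training()
    for i in range(10):
        random.shuffle(train_data)
        for text, annotations in train_data:
            nlp.update([text], [annotations], sgd=optimizer)
nlp.to_disk('/model')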

Archived: 7 years, 7 months ago
Views: 572
Last recorded: 6 years, 3 months ago