Rub*_*ben 5 python scikit-learn joblib
I saved my classifier pipeline using joblib:
vec = TfidfVectorizer(sublinear_tf=True, max_df=0.5, ngram_range=(1, 3))
pac_clf = PassiveAggressiveClassifier(C=1)
vec_clf = Pipeline([('vectorizer', vec), ('pac', pac_clf)])
vec_clf.fit(X_train,y_train)
joblib.dump(vec_clf, 'class.pkl', compress=9)
Now I am trying to use it in a production environment:
def classify(title):
    # load classifier and predict
    classifier = joblib.load('class.pkl')
    # vectorize/transform the new title then predict
    vectorizer = TfidfVectorizer(sublinear_tf=True, max_df=0.5, ngram_range=(1, 3))
    X_test = vectorizer.transform(title)
    predict = classifier.predict(X_test)
    return predict
The error I get is: ValueError: Vocabulary wasn't fitted or is empty! I guess I should load the vocabulary from the joblib file, but I can't get it to work.
Just replace:
    # load classifier and predict
    classifier = joblib.load('class.pkl')
    # vectorize/transform the new title then predict
    vectorizer = TfidfVectorizer(sublinear_tf=True, max_df=0.5, ngram_range=(1, 3))
    X_test = vectorizer.transform(title)
    predict = classifier.predict(X_test)
    return predict
with:
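    # load the saved pipeline, which includes both the vectorizer
    # and the classifier, and predict directly on the raw text
    classifier = joblib.load('class.pkl')
    # title must be an iterable of strings (use [title] for a single string)
    predict = classifier.predict(title)
    return predict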
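class.pkl contains the complete pipeline, so there is no need to create a new vectorizer instance. As the error message says, you need to reuse the vectorizer that was fitted during training, because the feature mapping from tokens (string n-grams) to column indices is stored in the vectorizer itself. That mapping is called the "vocabulary".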
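To make the point concrete, here is a minimal end-to-end sketch of the round trip. The toy training data, the temporary file location, the `random_state` argument, and the `titles` / `model_path` parameter names are assumptions added for illustration; they are not part of the question. It also assumes a recent scikit-learn where `joblib` is imported as a standalone package.

```python
import os
import tempfile

import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import PassiveAggressiveClassifier
from sklearn.pipeline import Pipeline

# Toy training data (hypothetical, for illustration only).
X_train = [
    "cheap flights to paris",
    "hotel deals in rome",
    "python pipeline tutorial",
    "scikit-learn classifier example",
]
y_train = ["travel", "travel", "code", "code"]

# Same pipeline as in the question; random_state is added for repeatability.
vec = TfidfVectorizer(sublinear_tf=True, max_df=0.5, ngram_range=(1, 3))
pac_clf = PassiveAggressiveClassifier(C=1, random_state=0)
vec_clf = Pipeline([("vectorizer", vec), ("pac", pac_clf)])
vec_clf.fit(X_train, y_train)

# Save the whole fitted pipeline (vectorizer + classifier) to a temp file.
path = os.path.join(tempfile.mkdtemp(), "class.pkl")
joblib.dump(vec_clf, path, compress=9)


def classify(titles, model_path=path):
    """Load the saved pipeline and predict labels for raw title strings."""
    classifier = joblib.load(model_path)
    # The pipeline vectorizes with its own fitted vocabulary, then predicts.
    return classifier.predict(titles)


# The fitted vectorizer inside the pipeline carries the token -> column map.
loaded = joblib.load(path)
vocabulary = loaded.named_steps["vectorizer"].vocabulary_

# A brand-new vectorizer has no vocabulary, so transform() fails.
try:
    TfidfVectorizer().transform(["python classifier tutorial"])
    fresh_vectorizer_failed = False
except ValueError:
    fresh_vectorizer_failed = True

predictions = classify(["python classifier tutorial"])
```

Note that `classify` takes a list of strings, not a single string: the pipeline's vectorizer expects an iterable of raw documents. In a long-running service you would probably load the pipeline once at startup instead of on every call.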