I wrote a sample program to train an SVM with sklearn. Here is the code:
from sklearn import svm
from sklearn import datasets
from sklearn.externals import joblib

# Train an SVC classifier on the iris dataset
clf = svm.SVC()
iris = datasets.load_iris()
X, y = iris.data, iris.target
clf.fit(X, y)
print(clf.predict(X))

# Persist the fitted model to disk
joblib.dump(clf, 'clf.pkl')
When I dump the model, I get all of these files:
['clf.pkl', 'clf.pkl_01.npy', 'clf.pkl_02.npy', 'clf.pkl_03.npy', 'clf.pkl_04.npy', 'clf.pkl_05.npy', 'clf.pkl_06.npy', 'clf.pkl_07.npy', 'clf.pkl_08.npy', 'clf.pkl_09.npy', 'clf.pkl_10.npy', 'clf.pkl_11.npy']
I'm confused: did I do something wrong, or is this normal? What are the *.npy files, and why are there 11 of them?
To save everything into a single file, set compress to True or to any number (e.g. 1), as in the sketch below.
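A minimal sketch of the fix, continuing the question's example (the filename 'clf_compressed.pkl' is just illustrative):

import numpy as np
from sklearn import svm
from sklearn import datasets
from sklearn.externals import joblib

clf = svm.SVC()
iris = datasets.load_iris()
clf.fit(iris.data, iris.target)

# With compress set, joblib writes one file instead of
# a .pkl plus a set of sidecar .npy files
joblib.dump(clf, 'clf_compressed.pkl', compress=1)

# Loading works the same way in both cases
clf2 = joblib.load('clf_compressed.pkl')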
But you should know that this separated representation of np arrays is necessary for the main features of joblib's dump/load: thanks to it, joblib can save and load objects containing np arrays faster than Pickle can, and, unlike Pickle, joblib can correctly save and load objects containing memmapped numpy arrays. If you want a single-file serialization of the whole object (and don't need to save memmapped np arrays), I think it's better to use Pickle in that case; AFAIK, joblib's dump/load will then work at about the same speed as Pickle:
import pickle

import numpy as np
from sklearn.externals import joblib

vector = np.arange(0, 10**7)

# joblib, uncompressed
%timeit joblib.dump(vector, 'vector.pkl')
# 1 loops, best of 3: 818 ms per loop
# file size ~ 80 MB
%timeit vector_load = joblib.load('vector.pkl')
# 10 loops, best of 3: 47.6 ms per loop

# joblib, compressed
%timeit joblib.dump(vector, 'vector.pkl', compress=1)
# 1 loops, best of 3: 1.58 s per loop
# file size ~ 15.1 MB
%timeit vector_load = joblib.load('vector.pkl')
# 1 loops, best of 3: 442 ms per loop

# Pickle
%%timeit
with open('vector.pkl', 'wb') as f:
    pickle.dump(vector, f)
# 1 loops, best of 3: 927 ms per loop

%%timeit
with open('vector.pkl', 'rb') as f:
    vector_load = pickle.load(f)
# 10 loops, best of 3: 94.1 ms per loop
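The memmap point above can be illustrated with a small sketch (not part of the original benchmark; the filename 'big.pkl' is just an example). Loading with mmap_mode maps the array data from disk instead of reading it all into RAM:

import numpy as np
from sklearn.externals import joblib

big = np.arange(0, 10**7)
joblib.dump(big, 'big.pkl')

# mmap_mode='r' returns a read-only numpy.memmap backed by the
# file on disk rather than an in-memory copy; Pickle has no
# equivalent of this
mapped = joblib.load('big.pkl', mmap_mode='r')
print(type(mapped))  # <class 'numpy.memmap'>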