Is it possible to have missing values in scikit-learn? How should they be represented? I could not find any documentation about this.
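A minimal sketch of one common convention, assuming scikit-learn 0.20 or later (where `sklearn.impute.SimpleImputer` is available): missing entries are represented as `np.nan` in a float array and filled in before an estimator is fitted. The toy array below is invented for illustration.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Toy feature matrix (made up for illustration); np.nan marks a missing entry.
X = np.array([[1.0, 2.0],
              [np.nan, 3.0],
              [7.0, 6.0]])

# Replace each missing entry with the mean of the observed values in its column.
imputer = SimpleImputer(missing_values=np.nan, strategy="mean")
X_filled = imputer.fit_transform(X)

print(X_filled)
```

Mean imputation is only one possible strategy; whether it suits a given dataset is a separate modelling decision.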
I have managed to load the images in a folder using the sklearn command load_sample_images().
I would now like to convert the result to a numpy.ndarray with float32 dtype.
I was able to convert it to an np.ndarray using np.array(X), but np.array(X, dtype=np.float32) and np.asarray(X).astype('float32') give me this error:
ValueError: setting an array element with a sequence.
Is there a way to work around this?
from sklearn_theano.datasets import load_sample_images
import numpy as np
kinect_images = load_sample_images()
X = kinect_images.images
X_new = np.array(X) # works
X_new = np.array(X[1], dtype=np.float32) # works
X_new = np.array(X, dtype=np.float32) # does not work
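One common cause of this error is that the images do not all share the same shape, so they cannot be packed into a single numeric array. The sketch below assumes that is the case here; the image shapes are hypothetical stand-ins, not the real sample images. It shows two possible workarounds: converting each image separately, or cropping to a common size before stacking.

```python
import numpy as np

# Stand-ins for the loaded images (hypothetical shapes; the real ones may differ).
images = [
    np.zeros((4, 5, 3), dtype=np.uint8),
    np.ones((3, 6, 3), dtype=np.uint8),
]

# Option 1: keep a list and convert each image on its own.
images_f32 = [img.astype(np.float32) for img in images]

# Option 2: crop every image to the smallest common height and width,
# then stack into one (n_images, height, width, channels) array.
min_h = min(img.shape[0] for img in images)
min_w = min(img.shape[1] for img in images)
stacked = np.stack([img[:min_h, :min_w] for img in images]).astype(np.float32)

print(stacked.shape, stacked.dtype)
```

Cropping discards pixels; resizing or padding would be alternatives if the full images matter.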
I have a classifier that I trained using Python's scikit-learn. How can I use the classifier from a Java program? Can I use Jython? Is there a way to save the classifier in Python and load it in Java? Is there some other way to use it?
I have a small corpus and I want to calculate the accuracy of a naive Bayes classifier using 10-fold cross-validation. How can I do that?
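A minimal sketch of one way to do this, assuming a recent scikit-learn with the `sklearn.model_selection` module. The feature matrix and labels are synthetic stand-ins for a vectorized corpus, and `MultinomialNB` is an assumed choice of naive Bayes variant.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB

# Synthetic stand-in for a small bag-of-words corpus: 100 documents, 20 term counts.
rng = np.random.default_rng(0)
X = rng.integers(0, 5, size=(100, 20))
y = np.array([0, 1] * 50)  # made-up binary labels

# cv=10 requests 10 folds; one accuracy value is returned per fold.
scores = cross_val_score(MultinomialNB(), X, y, cv=10, scoring="accuracy")
print(scores.mean())
```

With real text, the counts would typically come from a vectorizer applied to the documents rather than from random numbers.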
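I am using a linear SVM from scikit-learn (LinearSVC) for a binary classification problem. I understand that LinearSVC can give me the predicted labels and the decision scores, but I want probability estimates (confidence in the label). I want to keep using LinearSVC because of its speed (compared to sklearn.svm.SVC with a linear kernel). Is it reasonable to use a logistic function to convert the decision scores to probabilities?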
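import sklearn.svm as suppmach
# Fit model:
svmmodel=suppmach.LinearSVC(penalty='l1',C=1,dual=False)  # the l1 penalty is only supported with dual=False
svmmodel.fit(x_train, y_train)  # assumed: the training data is not shown in the original snippet
predicted_test= svmmodel.predict(x_test)
predicted_test_scores= svmmodel.decision_function(x_test)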
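I want to check whether it makes sense to obtain the probability estimates simply as [1 / (1 + exp(-x))], where x is the decision score.
Alternatively, are there other classifier options that I could use to do this efficiently?
Thanks.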
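The formula in the question can be sketched as follows. This is only an illustration of the raw logistic mapping: without a fitted slope and intercept (as in Platt scaling) the outputs are not calibrated probabilities. scikit-learn's `CalibratedClassifierCV` with `method="sigmoid"` is one existing way to fit such a mapping. The function name and the score values below are hypothetical.

```python
import numpy as np

def scores_to_pseudo_probabilities(scores):
    """Map decision scores to (0, 1) with the plain logistic function."""
    scores = np.asarray(scores, dtype=float)
    return 1.0 / (1.0 + np.exp(-scores))

# Made-up decision scores, standing in for svmmodel.decision_function(x_test).
example_scores = np.array([-2.0, 0.0, 2.0])
pseudo_probs = scores_to_pseudo_probabilities(example_scores)
print(pseudo_probs)
```

A score of zero (a point on the decision boundary) maps to 0.5, and the mapping preserves the ranking of the scores.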
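It seems that KFold generates the same values every time the object is iterated over, while ShuffleSplit generates different indices every time. Is this correct? If so, what are the uses of one over the other?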
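from sklearn import cross_validation  # old module; removed in scikit-learn 0.20 in favor of sklearn.model_selection
cv = cross_validation.KFold(10, n_folds=2,shuffle=True,random_state=None)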
cv2 = cross_validation.ShuffleSplit(10,n_iter=2,test_size=0.5)
print(list(iter(cv)))
print(list(iter(cv)))
print(list(iter(cv2)))
print(list(iter(cv2)))
This produces the following output:
[(array([1, 3, 5, 8, 9]), array([0, 2, 4, 6, 7])), (array([0, 2, 4, 6, 7]), array([1, 3, 5, 8, 9]))]
[(array([1, 3, 5, 8, 9]), array([0, 2, 4, 6, 7])), (array([0, 2, 4, 6, 7]), array([1, 3, 5, 8, 9]))]
[(array([4, 6, 3, 2, 7]), array([8, 1, 9, 0, 5])), (array([3, 6, 7, 0, 5]), array([9, 1, 8, 4, 2]))]
[(array([3, 0, 2, 1, 7]), array([5, …
How can I get the eigenvalues and eigenvectors of the PCA application?
from sklearn.decomposition import PCA
clf=PCA(0.98,whiten=True) # conserve 98% of the variance
X_train=clf.fit_transform(X_train)
X_test=clf.transform(X_test)
I cannot find it in the docs.
I am "not" able to understand the different results here.
Edit:
def pca_code(data):
    #raw_implementation
    var_per=.98
    data-=np.mean(data, axis=0)
    data/=np.std(data, axis=0)
    cov_mat=np.cov(data, rowvar=False)
    evals, evecs = np.linalg.eigh(cov_mat)
    idx = np.argsort(evals)[::-1]
    evecs = evecs[:,idx]
    evals = evals[idx]
    variance_retained=np.cumsum(evals)/np.sum(evals)
    index=np.argmax(variance_retained>=var_per)
    evecs = evecs[:,:index+1]
    reduced_data=np.dot(evecs.T, data.T).T
    print(evals)
    print("_"*30)
    print(evecs)
    print("_"*30)
    #using scipy package
    clf=PCA(var_per)
    X_train=data.T
    X_train=clf.fit_transform(X_train)
    print(clf.explained_variance_)
    print("_"*30)
    print(clf.components_)
    print("__"*30)
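A small sketch of how the two views relate, using synthetic data invented for illustration: after fitting, `explained_variance_` holds the eigenvalues of the sample covariance matrix and the rows of `components_` hold the corresponding eigenvectors (up to sign). Note that the snippet above passes `data.T` to `PCA` while the manual computation treats the rows of `data` as samples; that may be one source of the differing results, though this is an assumption rather than something the question confirms.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic data: 200 samples (rows) with 4 correlated features (columns).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4)) @ rng.normal(size=(4, 4))

pca = PCA(n_components=4).fit(X)

# Manual eigendecomposition of the sample covariance matrix, sorted descending.
cov = np.cov(X, rowvar=False)
evals, evecs = np.linalg.eigh(cov)
order = np.argsort(evals)[::-1]
evals, evecs = evals[order], evecs[:, order]

# Eigenvalues should agree with explained_variance_.
eigenvalues_match = np.allclose(pca.explained_variance_, evals)

# Each row of components_ should equal an eigenvector up to a sign flip,
# so the absolute dot product of matching unit vectors should be 1.
alignment = np.abs(np.sum(pca.components_ * evecs.T, axis=1))
eigenvectors_match = np.allclose(alignment, 1.0, atol=1e-6)

print(eigenvalues_match, eigenvectors_match)
```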
I am working on a keyword extraction problem. Consider the very general case:
tfidf = TfidfVectorizer(tokenizer=tokenize, stop_words='english')
t = """Two Travellers, walking in the noonday sun, sought the shade of a widespreading tree to rest. As they lay looking up among the pleasant leaves, they saw that it was a Plane Tree.
"How useless is the Plane!" said one of them. "It bears no fruit whatever, and only serves to litter the ground with leaves."
"Ungrateful creatures!" said a voice from the Plane Tree. "You lie here in my cooling shade, and …
I got this from the sklearn webpage:
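a) Pipeline: Pipeline of transforms with a final estimator
b) make_pipeline: Construct a Pipeline from the given estimators. This is a shorthand for the Pipeline constructor.
But I still do not understand when I should use each one. Can anyone give me an example?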
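A minimal sketch of the difference, with an arbitrary scaler and classifier chosen only for illustration: `Pipeline` takes explicit `(name, estimator)` pairs, while `make_pipeline` takes only the estimators and derives each step name from the lowercased class name.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.preprocessing import StandardScaler

# Explicit form: the caller chooses the step names.
explicit = Pipeline([("scale", StandardScaler()), ("clf", LogisticRegression())])

# Shorthand form: step names are generated automatically.
shorthand = make_pipeline(StandardScaler(), LogisticRegression())

explicit_names = [name for name, _ in explicit.steps]
shorthand_names = [name for name, _ in shorthand.steps]
print(explicit_names)
print(shorthand_names)
```

The step names matter mainly when parameters are addressed by name, for example `clf__C` versus `logisticregression__C` in a parameter grid; otherwise the two pipelines behave the same.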
I want to encode 3 categorical features out of the 10 features in my dataset. I use preprocessing from sklearn.preprocessing to do so, as follows:
from sklearn import preprocessing
cat_features = ['color', 'director_name', 'actor_2_name']
enc = preprocessing.OneHotEncoder(categorical_features=cat_features)
enc.fit(dataset.values)
However, I could not proceed because I am getting this error:
array = np.array(array, dtype=dtype, order=order, copy=copy)
ValueError: could not convert string to float: PG
I am surprised that it is complaining about the string, since it is supposed to convert it! Am I missing something here?
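A hedged sketch, assuming scikit-learn 0.20 or later, where `OneHotEncoder` accepts string categories directly. In older releases the encoder expected integer-coded input, and `categorical_features` took column indices or a boolean mask rather than column names, which would be consistent with the error above. The rows below are invented stand-ins for the three categorical columns plus one numeric column.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Made-up values for the three categorical columns
# (color, director_name, actor_2_name).
categorical = np.array([
    ["Color", "Director A", "Actor X"],
    ["Color", "Director B", "Actor Y"],
    ["Black and White", "Director A", "Actor X"],
    ["Color", "Director C", "Actor X"],
], dtype=object)

# A made-up numeric column that is left unencoded.
numeric = np.array([[120.0], [95.0], [110.0], [130.0]])

# Encode only the categorical columns; the result is sparse, so densify it.
encoder = OneHotEncoder(handle_unknown="ignore")
encoded = encoder.fit_transform(categorical).toarray()

# Rejoin the one-hot columns with the untouched numeric column.
features = np.hstack([encoded, numeric])
print(features.shape)
```

scikit-learn also provides `ColumnTransformer` for applying an encoder to a subset of columns in one step, which may be more convenient when working with a DataFrame.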