har*_*on4 45 python numpy scikit-learn
我有以下代码来测试一些最流行的sklearn python库的ML算法:
import numpy as np
from sklearn import metrics, svm
from sklearn.linear_model import LinearRegression
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
trainingData = np.array([ [2.3, 4.3, 2.5], [1.3, 5.2, 5.2], [3.3, 2.9, 0.8], [3.1, 4.3, 4.0] ])
trainingScores = np.array( [3.4, 7.5, 4.5, 1.6] )
predictionData = np.array([ [2.5, 2.4, 2.7], [2.7, 3.2, 1.2] ])
clf = LinearRegression()
clf.fit(trainingData, trainingScores)
print("LinearRegression")
print(clf.predict(predictionData))
clf = svm.SVR()
clf.fit(trainingData, trainingScores)
print("SVR")
print(clf.predict(predictionData))
clf = LogisticRegression()
clf.fit(trainingData, trainingScores)
print("LogisticRegression")
print(clf.predict(predictionData))
clf = DecisionTreeClassifier()
clf.fit(trainingData, trainingScores)
print("DecisionTreeClassifier")
print(clf.predict(predictionData))
clf = KNeighborsClassifier()
clf.fit(trainingData, trainingScores)
print("KNeighborsClassifier")
print(clf.predict(predictionData))
clf = LinearDiscriminantAnalysis()
clf.fit(trainingData, trainingScores)
print("LinearDiscriminantAnalysis")
print(clf.predict(predictionData))
clf = GaussianNB()
clf.fit(trainingData, trainingScores)
print("GaussianNB")
print(clf.predict(predictionData))
clf = SVC()
clf.fit(trainingData, trainingScores)
print("SVC")
print(clf.predict(predictionData))
Run Code Online (Sandbox Code Playgroud)
前两个工作正常,但我在LogisticRegression调用中遇到以下错误:
root@ubupc1:/home/ouhma# python stack.py
LinearRegression
[ 15.72023529 6.46666667]
SVR
[ 3.95570063 4.23426243]
Traceback (most recent call last):
File "stack.py", line 28, in <module>
clf.fit(trainingData, trainingScores)
File "/usr/local/lib/python2.7/dist-packages/sklearn/linear_model/logistic.py", line 1174, in fit
check_classification_targets(y)
File "/usr/local/lib/python2.7/dist-packages/sklearn/utils/multiclass.py", line 172, in check_classification_targets
raise ValueError("Unknown label type: %r" % y_type)
ValueError: Unknown label type: 'continuous'
Run Code Online (Sandbox Code Playgroud)
输入数据与之前的调用相同,那么这里发生了什么?
顺便说一句,为什么第一次预测LinearRegression()和SVR()算法存在巨大差异(15.72 vs 3.95)?
Max*_*ers 56
您正在将浮点数传递给分类器,该分类器将分类值作为目标向量.如果你将它转换为int它将被接受作为输入(虽然这是正确的方法,这将是有问题的).
最好通过使用scikit labelEncoder功能转换您的训练分数.
您的DecisionTree和KNeighbors限定符也是如此.
from sklearn import preprocessing
from sklearn import utils
lab_enc = preprocessing.LabelEncoder()
encoded = lab_enc.fit_transform(trainingScores)
>>> array([1, 3, 2, 0], dtype=int64)
print(utils.multiclass.type_of_target(trainingScores))
>>> continuous
print(utils.multiclass.type_of_target(trainingScores.astype('int')))
>>> multiclass
print(utils.multiclass.type_of_target(encoded))
>>> multiclass
Run Code Online (Sandbox Code Playgroud)
Tho*_* G. 31
LogisticRegression不是回归而是分类!
该Y变量必须是分类的类,
(例如0或1)
而不是一个continuous变量,
那将是一个回归问题。
Sam*_*rry 19
在尝试将浮动数据输入分类器时,我遇到了同样的问题.我想保留浮点数而不是整数来保证准确性.尝试使用回归算法.例如:
import numpy as np
from sklearn import linear_model
from sklearn import svm
classifiers = [
svm.SVR(),
linear_model.SGDRegressor(),
linear_model.BayesianRidge(),
linear_model.LassoLars(),
linear_model.ARDRegression(),
linear_model.PassiveAggressiveRegressor(),
linear_model.TheilSenRegressor(),
linear_model.LinearRegression()]
trainingData = np.array([ [2.3, 4.3, 2.5], [1.3, 5.2, 5.2], [3.3, 2.9, 0.8], [3.1, 4.3, 4.0] ])
trainingScores = np.array( [3.4, 7.5, 4.5, 1.6] )
predictionData = np.array([ [2.5, 2.4, 2.7], [2.7, 3.2, 1.2] ])
for item in classifiers:
print(item)
clf = item
clf.fit(trainingData, trainingScores)
print(clf.predict(predictionData),'\n')
Run Code Online (Sandbox Code Playgroud)
| 归档时间: |
|
| 查看次数: |
89999 次 |
| 最近记录: |