Eus*_*una 43 python classification machine-learning scikit-learn supervised-learning
I am new to machine learning and scikit-learn.
My problem:
(Please correct any misconception I may have)
I have a BIG JSON dataset, which I retrieve and store in the trainList variable.
I preprocess it so that I can work with it.
Once that is done, I start the classification:
Code:
My current variables:
trainList #It is a list with all the data of my dataset in JSON form
labelList #It is a list with all the labels of my data
The main body of the method:
#I transform the data from JSON form to a numerical one
X=vec.fit_transform(trainList)

#I scale the matrix (don't know why but without it, it makes an error)
X=preprocessing.scale(X.toarray())

#I generate a KFold in order to make cross validation
kf = KFold(len(X), n_folds=10, indices=True, shuffle=True, random_state=1)

#I start the cross validation
for train_indices, test_indices in kf:
    X_train=[X[ii] for ii in train_indices]
    X_test=[X[ii] for ii in test_indices]
    y_train=[labelList[ii] for ii in train_indices]
    y_test=[labelList[ii] for ii in test_indices]

    #I train the classifier
    trained=qda.fit(X_train,y_train)

    #I make the predictions
    predicted=qda.predict(X_test)

    #I obtain the accuracy of this fold
    ac=accuracy_score(predicted,y_test)

    #I obtain the confusion matrix
    cm=confusion_matrix(y_test, predicted)

    #I should calculate the TP,TN, FP and FN
    #I don't know how to continue
小智 89
For the multi-class case, everything you need can be found from the confusion matrix: for each class you can read off its FP, FN, TP and TN. Using pandas/numpy, you can do this for all classes at once, like this:
import numpy as np

# confusion_matrix here is assumed to be a pandas DataFrame (hence .values below);
# for a plain numpy array use confusion_matrix.sum() instead of confusion_matrix.values.sum()
FP = confusion_matrix.sum(axis=0) - np.diag(confusion_matrix)
FN = confusion_matrix.sum(axis=1) - np.diag(confusion_matrix)
TP = np.diag(confusion_matrix)
TN = confusion_matrix.values.sum() - (FP + FN + TP)
# Sensitivity, hit rate, recall, or true positive rate
TPR = TP/(TP+FN)
# Specificity or true negative rate
TNR = TN/(TN+FP)
# Precision or positive predictive value
PPV = TP/(TP+FP)
# Negative predictive value
NPV = TN/(TN+FN)
# Fall out or false positive rate
FPR = FP/(FP+TN)
# False negative rate
FNR = FN/(TP+FN)
# False discovery rate
FDR = FP/(TP+FP)
# Overall accuracy
ACC = (TP+TN)/(TP+FP+FN+TN)
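As a quick check, here is a minimal sketch (labels invented for illustration) that builds a 3-class confusion matrix with scikit-learn and applies the per-class formulas above:

import numpy as np
from sklearn.metrics import confusion_matrix

y_true = [0, 1, 2, 2, 1, 0, 2, 1]
y_pred = [0, 2, 2, 2, 1, 0, 1, 1]

cm = confusion_matrix(y_true, y_pred)   # plain numpy array: rows = true class, columns = predicted class

FP = cm.sum(axis=0) - np.diag(cm)   # per-class false positives
FN = cm.sum(axis=1) - np.diag(cm)   # per-class false negatives
TP = np.diag(cm)                    # per-class true positives
TN = cm.sum() - (FP + FN + TP)      # per-class true negatives

print(TP, FP, FN, TN)               # one entry per class, e.g. TP = [2 2 2]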
inv*_*ell 24
If you have two lists with the predicted and the actual values, as you appear to have, you can pass them to a function that computes TP, FP, TN and FN, like this:
def perf_measure(y_actual, y_hat):
    TP = 0
    FP = 0
    TN = 0
    FN = 0

    for i in range(len(y_hat)):
        if y_actual[i]==y_hat[i]==1:
            TP += 1
        if y_hat[i]==1 and y_actual[i]!=y_hat[i]:
            FP += 1
        if y_actual[i]==y_hat[i]==0:
            TN += 1
        if y_hat[i]==0 and y_actual[i]!=y_hat[i]:
            FN += 1

    return(TP, FP, TN, FN)
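For example, a minimal usage sketch (the label lists are made up for illustration):

y_actual = [1, 1, 0, 0, 1, 0]
y_hat    = [1, 0, 0, 1, 1, 0]

TP, FP, TN, FN = perf_measure(y_actual, y_hat)
print(TP, FP, TN, FN)   # 2 1 2 1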
From here on, I think you will be able to compute the rates you are interested in, as well as other performance measures such as specificity and sensitivity.
gru*_*gly 22
According to the scikit-learn documentation,
By definition a confusion matrix C is such that C[i, j] is equal to the number of observations known to be in group i but predicted to be in group j.
Thus in binary classification, the count of true negatives is C[0,0], false negatives is C[1,0], true positives is C[1,1] and false positives is C[0,1].
CM = confusion_matrix(y_true, y_pred)
TN = CM[0][0]
FN = CM[1][0]
TP = CM[1][1]
FP = CM[0][1]
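A small concrete check, with made-up labels, that these indices line up with scikit-learn's output:

from sklearn.metrics import confusion_matrix

y_true = [0, 0, 1, 1, 1]
y_pred = [0, 1, 1, 1, 0]

CM = confusion_matrix(y_true, y_pred)
print(CM)
# [[1 1]
#  [1 2]]

TN, FP, FN, TP = CM[0][0], CM[0][1], CM[1][0], CM[1][1]
print(TN, FP, FN, TP)   # 1 1 1 2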
Aks*_*rit 18
You can get all the parameters from the confusion matrix. Keep in mind that scikit-learn sorts the class labels, so for 0/1 labels the structure of the (2x2) confusion matrix it returns is

TN|FP
FN|TP

So

TN = cm[0][0]
FP = cm[0][1]
FN = cm[1][0]
TP = cm[1][1]

For more detail, see https://en.wikipedia.org/wiki/Confusion_matrix (note that the layout conventions used there can differ from scikit-learn's).
To get the true positives etc. out of the confusion matrix in a one-liner, use ravel:
from sklearn.metrics import confusion_matrix
y_true = [1, 1, 0, 0]
y_pred = [1, 0, 1, 0]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(tn, fp, fn, tp) # 1 1 1 1
小智 6
In the scikit-learn 'metrics' library there is a confusion_matrix method that gives you the required output.
You can use any classifier you want. Here I use KNeighbors as an example.
from sklearn import metrics, neighbors

clf = neighbors.KNeighborsClassifier()

# the classifier has to be fitted on training data before it can predict
X_train = ...
y_train = ...
clf.fit(X_train, y_train)

X_test = ...
y_test = ...
expected = y_test
predicted = clf.predict(X_test)

conf_matrix = metrics.confusion_matrix(expected, predicted)

>>> print(conf_matrix)
[[1403   87]
 [  56 3159]]
docs: http://scikit-learn.org/stable/modules/generated/sklearn.metrics.confusion_matrix.html#sklearn.metrics.confusion_matrix
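A self-contained sketch of the same idea that runs end to end, using the bundled iris dataset (the dataset and split parameters are only for illustration):

from sklearn import datasets, metrics, neighbors
from sklearn.model_selection import train_test_split

X, y = datasets.load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = neighbors.KNeighborsClassifier()
clf.fit(X_train, y_train)
predicted = clf.predict(X_test)

conf_matrix = metrics.confusion_matrix(y_test, predicted)
print(conf_matrix)   # one row per true class, one column per predicted class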
小智 6
This works fine.
Source - https://scikit-learn.org/stable/modules/generated/sklearn.metrics.confusion_matrix.html
tn, fp, fn, tp = confusion_matrix(y_test, predicted).ravel()
小智 5
I wrote a version that only uses numpy. I hope it helps you.
import numpy as np

def perf_metrics_2X2(yobs, yhat):
    """
    Returns the specificity, sensitivity, positive predictive value, and
    negative predictive value of a 2X2 table.

    where:
    0 = negative case
    1 = positive case

    Parameters
    ----------
    yobs : array of positive and negative ``observed`` cases
    yhat : array of positive and negative ``predicted`` cases

    Returns
    -------
    sensitivity  = TP / (TP+FN)
    specificity  = TN / (TN+FP)
    pos_pred_val = TP / (TP+FP)
    neg_pred_val = TN / (TN+FN)

    Author: Julio Cardenas-Rodriguez
    """
    yobs = np.asarray(yobs)
    yhat = np.asarray(yhat)

    TP = np.sum( (yobs == 1) & (yhat == 1) )   # observed positive, predicted positive
    TN = np.sum( (yobs == 0) & (yhat == 0) )   # observed negative, predicted negative
    FP = np.sum( (yobs == 0) & (yhat == 1) )   # observed negative, predicted positive
    FN = np.sum( (yobs == 1) & (yhat == 0) )   # observed positive, predicted negative

    sensitivity  = TP / (TP+FN)
    specificity  = TN / (TN+FP)
    pos_pred_val = TP / (TP+FP)
    neg_pred_val = TN / (TN+FN)

    return sensitivity, specificity, pos_pred_val, neg_pred_val
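A quick sanity check of this helper against scikit-learn (the example labels are made up):

from sklearn.metrics import confusion_matrix

yobs = [1, 1, 1, 0, 0, 0, 0]
yhat = [1, 1, 0, 0, 0, 0, 1]

print(perf_metrics_2X2(yobs, yhat))     # approximately (0.67, 0.75, 0.67, 0.75)

tn, fp, fn, tp = confusion_matrix(yobs, yhat).ravel()
print(tp / (tp + fn), tn / (tn + fp))   # should match the sensitivity and specificity above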
小智 5
Just in case anyone is looking for the same thing for the MULTI-CLASS case:
def perf_measure(y_actual, y_pred):
    # collect every class label that appears in either list (sorted for a stable order)
    class_id = sorted(set(y_actual).union(set(y_pred)))
    TP = []
    FP = []
    TN = []
    FN = []

    for index, _id in enumerate(class_id):
        TP.append(0)
        FP.append(0)
        TN.append(0)
        FN.append(0)
        for i in range(len(y_pred)):
            if y_actual[i] == y_pred[i] == _id:            # true positive for this class
                TP[index] += 1
            if y_pred[i] == _id and y_actual[i] != _id:    # predicted as this class, but it is not
                FP[index] += 1
            if y_actual[i] != _id and y_pred[i] != _id:    # neither actual nor predicted is this class
                TN[index] += 1
            if y_actual[i] == _id and y_pred[i] != _id:    # this class missed by the prediction
                FN[index] += 1

    return class_id, TP, FP, TN, FN
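A minimal sketch of calling it (the label strings are made up):

y_actual = ["cat", "dog", "dog", "bird", "cat", "dog"]
y_pred   = ["cat", "dog", "bird", "bird", "dog", "dog"]

class_id, TP, FP, TN, FN = perf_measure(y_actual, y_pred)
for _id, tp, fp, tn, fn in zip(class_id, TP, FP, TN, FN):
    print(_id, tp, fp, tn, fn)   # one line per class, e.g. "bird 1 1 4 0"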
小智 5
In scikit-learn version 0.22, you can do it like this
from sklearn.metrics import multilabel_confusion_matrix
y_true = ["cat", "ant", "cat", "cat", "ant", "bird"]
y_pred = ["ant", "ant", "cat", "cat", "ant", "cat"]
mcm = multilabel_confusion_matrix(y_true, y_pred,labels=["ant", "bird", "cat"])
tn = mcm[:, 0, 0]
tp = mcm[:, 1, 1]
fn = mcm[:, 1, 0]
fp = mcm[:, 0, 1]
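From those arrays you can then compute per-class metrics directly, for example (continuing the snippet above):

import numpy as np

recall      = tp / (tp + fn)   # per-class sensitivity / recall
specificity = tn / (tn + fp)   # per-class specificity
print(dict(zip(["ant", "bird", "cat"], np.round(recall, 2))))   # e.g. {'ant': 1.0, 'bird': 0.0, 'cat': 0.67}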