Anu*_*pta 15 python plot machine-learning scikit-learn data-science
我正在构建二进制分类问题的模型,其中我的每个数据点都是300维(我使用300个特征).我正在使用sklearn的PassiveAggressiveClassifier.该模型表现非常好.
我希望绘制模型的决策边界.我怎么能这样做?
为了了解数据,我使用TSNE在2D中绘制它.我通过两个步骤减少了数据的维度 - 从300到50,然后从50减少到2(这是一个常见的推荐).以下是相同的代码段:
from sklearn.manifold import TSNE
from sklearn.decomposition import TruncatedSVD
X_Train_reduced = TruncatedSVD(n_components=50, random_state=0).fit_transform(X_train)
X_Train_embedded = TSNE(n_components=2, perplexity=40, verbose=2).fit_transform(X_Train_reduced)
#some convert lists of lists to 2 dataframes (df_train_neg, df_train_pos) depending on the label -
#plot the negative points and positive points
scatter(df_train_neg.val1, df_train_neg.val2, marker='o', c='red')
scatter(df_train_pos.val1, df_train_pos.val2, marker='x', c='green')
Run Code Online (Sandbox Code Playgroud)
我得到了一个体面的图表.
有没有办法可以为这个图添加一个决策边界,它代表我模型在300昏暗空间中的实际决策边界?
Aec*_*Tm. 11
一种方法是在2D绘图上施加Voronoi tesselation,即基于与2D数据点的接近度(每个预测类标签的不同颜色)对其进行着色.参见Migut等人最近的论文,2015年.
这比使用meshgrid和scikit的KNeighborsClassifier(这是一个带有Iris数据集的端到端示例;用模型/代码替换前几行)要容易得多:
import numpy as np, matplotlib.pyplot as plt
from sklearn.neighbors.classification import KNeighborsClassifier
from sklearn.datasets.base import load_iris
from sklearn.manifold.t_sne import TSNE
from sklearn.linear_model.logistic import LogisticRegression
# replace the below by your data and model
iris = load_iris()
X,y = iris.data, iris.target
X_Train_embedded = TSNE(n_components=2).fit_transform(X)
print X_Train_embedded.shape
model = LogisticRegression().fit(X,y)
y_predicted = model.predict(X)
# replace the above by your data and model
# create meshgrid
resolution = 100 # 100x100 background pixels
X2d_xmin, X2d_xmax = np.min(X_Train_embedded[:,0]), np.max(X_Train_embedded[:,0])
X2d_ymin, X2d_ymax = np.min(X_Train_embedded[:,1]), np.max(X_Train_embedded[:,1])
xx, yy = np.meshgrid(np.linspace(X2d_xmin, X2d_xmax, resolution), np.linspace(X2d_ymin, X2d_ymax, resolution))
# approximate Voronoi tesselation on resolution x resolution grid using 1-NN
background_model = KNeighborsClassifier(n_neighbors=1).fit(X_Train_embedded, y_predicted)
voronoiBackground = background_model.predict(np.c_[xx.ravel(), yy.ravel()])
voronoiBackground = voronoiBackground.reshape((resolution, resolution))
#plot
plt.contourf(xx, yy, voronoiBackground)
plt.scatter(X_Train_embedded[:,0], X_Train_embedded[:,1], c=y)
plt.show()
Run Code Online (Sandbox Code Playgroud)
请注意,这不会精确地绘制您的决策边界,而只是粗略估计边界应该位于何处(特别是在数据点较少的区域,真正的边界可能与此不同).它将在属于不同类的两个数据点之间绘制一条线,但是将它放在中间(在这种情况下确实确保这些点之间存在决策边界,但它不一定必须在中间) .
还有一些实验方法可以更好地逼近真正的决策边界,例如github上的这个
| 归档时间: |
|
| 查看次数: |
5304 次 |
| 最近记录: |