Random forest classifier decision path method (scikit)

Question

jc0*_*023 5 python random-forest scikit-learn

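I have implemented a standard random forest classifier on the Titanic dataset and would like to explore sklearn's decision_path method, introduced in v0.18. (http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html)

However, it outputs a sparse matrix that I am not sure how to interpret. Can anyone advise on how best to visualize this?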

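from sklearn.ensemble import RandomForestClassifier

# Train a simplified random forest (X_train, y_train come from the Titanic data, not shown)
estimator = RandomForestClassifier(random_state=0, n_estimators=3, max_depth=3)
estimator.fit(X_train, y_train)

# Extract the decision path for instance i = 12
i_data = X_test.iloc[12].values.reshape(1, -1)
# The original post called rf_best here, which this snippet never defines
d_path = estimator.decision_path(i_data)

print(d_path)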

Output:

(<1x3982 sparse matrix of type '' with 598 stored elements in Compressed Sparse Row format>, array([ 0, 45,
98, 149, 190, 233, 258, 309, 360, 401, 430, 42,5, 411, 580, 623, 668, 711, 760, 803, 852, 889, 932, 981, 1006, 1035, 1074, 1107, 1136, 1165, 1146, 135, 120, 135, 125, 135, 120, 1553, 1590, 1625, 1672, 1707, 1744, 1787, 1812, 1863, 1904, 1945, 1982, 2017, 2054, 2097, 2142, 2191, 2228, 2267, 2304, 2343, 2390, 2419, 2456, 2489, 2534, 2583, 2632, 2677, 2714, 2739, 2786, 2833, 2886, 2919, 2960, 2995, 3032, 3073, 3126, 3157, 3194, 3239, 3274, 3313, 3354, 3409, 3458, 3483, 3516, 3539, 3590, 3629, 3660, 3707, 3750, 3777, 3822, 3861, 3898, 3939, 3982], dtype=int32))
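
For reading a result of this shape: decision_path on a forest returns a pair (indicator, n_nodes_ptr). The indicator is a sparse samples-by-nodes matrix with a nonzero entry for every node the sample visits, with the nodes of all trees laid side by side, and the columns n_nodes_ptr[t] to n_nodes_ptr[t+1] belong to tree t. The following is a minimal sketch of splitting the matrix back into per-tree paths. It uses synthetic data from make_classification as a stand-in for the Titanic features, so the names X, y, forest, sample and paths are illustrative assumptions rather than the original setup.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the Titanic features (assumption, not the original data)
X, y = make_classification(n_samples=200, n_features=6, random_state=0)
forest = RandomForestClassifier(random_state=0, n_estimators=3, max_depth=3)
forest.fit(X, y)

# One sample, reshaped to a 2D array as decision_path expects
sample = X[12].reshape(1, -1)
indicator, n_nodes_ptr = forest.decision_path(sample)

paths = []
for t in range(len(forest.estimators_)):
    # Column range of the indicator matrix that belongs to tree t
    start, stop = n_nodes_ptr[t], n_nodes_ptr[t + 1]
    # Densify this tree's slice and keep the positions of the visited nodes;
    # the positions are node ids local to tree t (0 is the root)
    node_ids = np.flatnonzero(indicator[:, start:stop].toarray()[0])
    paths.append(node_ids)
    print(f"tree {t}: nodes visited = {node_ids.tolist()}")
```

Each printed list is the root-to-leaf path of the sample in one tree, and the ids can be matched against the node numbers of that tree.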

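Apologies if I have not provided enough detail; please let me know if anything is missing.

Thanks!

Note: edited to simplify the random forest (limiting the depth and the number of trees)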

Answer 1

小智 1

If you want to visualize the trees in the forest, you can try the answer given here: https://stats.stackexchange.com/q/118016

Adapted to your question:

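from sklearn import tree

...

# Write one Graphviz .dot file per tree in the fitted forest
for i_tree, tree_in_forest in enumerate(estimator.estimators_):
    with open('tree_' + str(i_tree) + '.dot', 'w') as my_file:
        # export_graphviz writes into the open file handle; its return value is not needed
        tree.export_graphviz(tree_in_forest, out_file=my_file)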

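This creates one file per tree in the forest, named tree_i.dot. With the default number of trees at the time of writing (10), that is 10 files, i = 0 to 9; the default n_estimators has been 100 since scikit-learn 0.22, and the forest in the question has 3 trees. You can then create a PDF (for example) from each file in a terminal: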

$ dot -Tpdf tree_0.dot -o tree.pdf

There is probably a smarter way to do this; if anyone can help, I would be happy to learn it :)
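
One possible variation, sketched under assumptions: export_graphviz returns the DOT source as a string when called with out_file=None, so the text can be collected in memory, labelled with feature names, and written out in one pass. The data below is synthetic, and feature_names, out_dir and dot_sources are hypothetical names introduced only for this illustration.

```python
import tempfile
from pathlib import Path

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import export_graphviz

# Synthetic stand-in for the Titanic features (assumption, not the original data)
X, y = make_classification(n_samples=200, n_features=6, random_state=0)
feature_names = [f"feature_{j}" for j in range(X.shape[1])]  # hypothetical labels

forest = RandomForestClassifier(random_state=0, n_estimators=3, max_depth=3)
forest.fit(X, y)

# Temporary output directory; replace with any writable folder
out_dir = Path(tempfile.mkdtemp())

dot_sources = []
for i_tree, tree_in_forest in enumerate(forest.estimators_):
    # With out_file=None the DOT source is returned as a string
    dot_source = export_graphviz(
        tree_in_forest,
        out_file=None,
        feature_names=feature_names,
        filled=True,
    )
    dot_sources.append(dot_source)
    (out_dir / f"tree_{i_tree}.dot").write_text(dot_source)

print(f"wrote {len(dot_sources)} .dot files to {out_dir}")
```

The resulting files can be rendered with the same dot command as above, one call per file.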

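Thanks Luciana. While this is certainly useful, we still face the problem of visualizing the decision path for the whole forest (rather than for a single tree). (3 upvotes)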
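
On the forest-wide decision path raised in this comment, one hedged option is to skip the drawing and print, for each tree, the split tests that a single sample passes through. The sketch below is only an illustration on synthetic data: describe_path is a hypothetical helper, not part of scikit-learn, and it relies on the fitted tree's tree_.feature, tree_.threshold and tree_.children_left arrays.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the Titanic features (assumption, not the original data)
X, y = make_classification(n_samples=200, n_features=6, random_state=0)
forest = RandomForestClassifier(random_state=0, n_estimators=3, max_depth=3)
forest.fit(X, y)
sample = X[12].reshape(1, -1)


def describe_path(tree_estimator, sample):
    """Hypothetical helper: list the split tests one sample passes in one tree."""
    tree_ = tree_estimator.tree_
    # Node ids visited by the sample, from the root down to the leaf
    node_ids = tree_estimator.decision_path(sample).indices
    steps = []
    for node_id, next_id in zip(node_ids[:-1], node_ids[1:]):
        feat = tree_.feature[node_id]
        thr = tree_.threshold[node_id]
        # Going to the left child means the test "feature <= threshold" held
        op = "<=" if next_id == tree_.children_left[node_id] else ">"
        steps.append(f"node {node_id}: X[{feat}] = {sample[0, feat]:.3f} {op} {thr:.3f}")
    steps.append(f"leaf {node_ids[-1]}")
    return steps


for t, tree_estimator in enumerate(forest.estimators_):
    print(f"tree {t}")
    for step in describe_path(tree_estimator, sample):
        print("  " + step)
```

With many trees the printout becomes long, so limiting it to a handful of trees is probably more readable.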