How do I get all Gini indices in my decision tree?

viv*_*ian 6 python machine-learning decision-tree scikit-learn

I have made a decision tree using sklearn, here, under the SciKit learn DL package, viz. sklearn.tree.DecisionTreeClassifier().fit(x,y).

How do I get the gini indices for all possible nodes at each step? graphviz only gives me the gini index of the node with the lowest gini index, ie the node used for split.

For example, the image below (from graphviz) tells me the gini score of the Pclass_lowVMid right index which is 0.408, but not the gini index of the Pclass_lower or Sex_male at that step. I just know the Gini index of Pclass_lower and Sex_male must be greater than (0.408*0.7 + 0) but that's it.

决策树

Kev*_*vin 5

使用export_graphviz显示所有节点的杂质,至少在版本中0.20.1

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_graphviz
from graphviz import Source

data = load_iris()
X, y = data.data, data.target

clf = DecisionTreeClassifier(max_depth=2, random_state=42)
clf.fit(X, y)

graph = Source(export_graphviz(clf, out_file=None, feature_names=data.feature_names))
graph.format = 'png'
graph.render('dt', view=True);
Run Code Online (Sandbox Code Playgroud)

在此输入图像描述

所有节点的杂质值也可以在impurity的属性中访问tree

clf.tree_.impurity
array([0.66666667, 0.        , 0.5       , 0.16803841, 0.04253308])
Run Code Online (Sandbox Code Playgroud)