viv*_*ian 6 python machine-learning decision-tree scikit-learn
I have built a decision tree with scikit-learn, using sklearn.tree.DecisionTreeClassifier().fit(x, y).
How do I get the Gini indices for all candidate nodes at each step? graphviz only gives me the Gini index of the node with the lowest Gini index, i.e. the node used for the split.
For example, the image below (from graphviz) tells me the Gini score of the Pclass_lowVMid right index, which is 0.408, but not the Gini index of Pclass_lower or Sex_male at that step. All I know is that the Gini index of Pclass_lower and Sex_male must be greater than (0.408*0.7 + 0).
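The bound 0.408*0.7 + 0 is the sample-weighted Gini impurity of the two children of the chosen split, which is the quantity a candidate split is scored by. A minimal sketch of that calculation for an arbitrary candidate follows. The helper names `gini` and `weighted_split_gini` and the toy arrays are made up for illustration; this is not scikit-learn's internal splitter, which also handles sample weights and searches over thresholds.

```python
import numpy as np

def gini(labels):
    """Gini impurity of a 1-D array of class labels: 1 - sum(p_k ** 2)."""
    labels = np.asarray(labels)
    if labels.size == 0:
        return 0.0
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(1.0 - np.sum(p ** 2))

def weighted_split_gini(feature, labels, threshold):
    """Sample-weighted Gini of the children produced by `feature <= threshold`."""
    feature = np.asarray(feature)
    labels = np.asarray(labels)
    left = feature <= threshold
    n = labels.size
    n_left = int(left.sum())
    return (n_left / n) * gini(labels[left]) + ((n - n_left) / n) * gini(labels[~left])

# Toy data, invented for this example: two binary features and a binary target.
y = np.array([0, 0, 0, 1, 1, 1, 1, 0])
sex_male = np.array([1, 1, 1, 0, 0, 0, 0, 1])
pclass_lower = np.array([1, 0, 1, 0, 1, 0, 1, 0])

# Score each candidate feature at the threshold 0.5 (the only useful one for 0/1 columns).
scores = {
    "sex_male": weighted_split_gini(sex_male, y, 0.5),
    "pclass_lower": weighted_split_gini(pclass_lower, y, 0.5),
}
for name, score in scores.items():
    print(f"{name}: weighted gini = {score:.3f}")
```

Running the same function over every feature on the rows that reach a given node gives the scores of the splits the tree did not choose, which graphviz does not display.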
export_graphviz shows the impurity of every node, at least in version 0.20.1.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_graphviz
from graphviz import Source
# Fit a small tree on the iris data
data = load_iris()
X, y = data.data, data.target
clf = DecisionTreeClassifier(max_depth=2, random_state=42)
clf.fit(X, y)
# Export the tree as DOT source and render it to dt.png
graph = Source(export_graphviz(clf, out_file=None, feature_names=data.feature_names))
graph.format = 'png'
graph.render('dt', view=True)
The impurity values of all nodes are also available in the impurity attribute of the fitted tree, clf.tree_.
clf.tree_.impurity
array([0.66666667, 0. , 0.5 , 0.16803841, 0.04253308])
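The bare array is easier to read when each value is paired with its node. A rough sketch, assuming the same iris tree as above: it walks the parallel arrays on `clf.tree_` (`children_left`, `children_right`, `feature`, `threshold`, `n_node_samples`, `impurity`) and prints one line per node. The `rows` list and the output layout are illustrative choices, not part of scikit-learn.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

data = load_iris()
clf = DecisionTreeClassifier(max_depth=2, random_state=42).fit(data.data, data.target)

tree = clf.tree_
rows = []
for node_id in range(tree.node_count):
    # Leaves have the same sentinel value (-1) for both children.
    is_leaf = tree.children_left[node_id] == tree.children_right[node_id]
    if is_leaf:
        rule = "leaf"
    else:
        name = data.feature_names[tree.feature[node_id]]
        rule = f"{name} <= {tree.threshold[node_id]:.3f}"
    rows.append((node_id, rule, int(tree.n_node_samples[node_id]), float(tree.impurity[node_id])))

for node_id, rule, n_samples, impurity in rows:
    print(f"node {node_id}: {rule:32s} samples={n_samples:3d} gini={impurity:.4f}")
```

Note that these are the impurities of the nodes that ended up in the tree. The scores of candidate splits that were rejected are not stored on the fitted estimator, so they would have to be recomputed from the training data.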