Getting toDebugString from DecisionTreeClassifier in PySpark ML

Iva*_*van 3 python apache-spark pyspark

I trained a DecisionTreeClassifier model using a pipeline like this:

from pyspark.ml import Pipeline
from pyspark.ml.feature import VectorAssembler, StringIndexer
from pyspark.ml.classification import DecisionTreeClassifier

cl = DecisionTreeClassifier(labelCol='target_idx', featuresCol='features')
pipe = Pipeline(stages=[target_index, assembler, cl])
model = pipe.fit(df_train)

# Prediction and model evaluation
predictions = model.transform(df_test)

The stages target_index and assembler are instances of StringIndexer and VectorAssembler. I can now evaluate the model's accuracy, for example:

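from pyspark.ml.evaluation import MulticlassClassificationEvaluator

# "precision" is accepted on Spark 1.x; on Spark 2.0+ use metricName="accuracy"
mc_evaluator = MulticlassClassificationEvaluator(
    labelCol="target_idx", predictionCol="prediction", metricName="precision")

accuracy = mc_evaluator.evaluate(predictions)
print("Test Error = {}".format(1.0 - accuracy))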

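Great. Now I need to inspect the structure of the tree model. The documentation points me to an attribute called toDebugString, but the ML DecisionTreeClassifier does not have it; it appears to be an attribute of the MLlib DecisionTree classifier only. How can I get the tree structure from the ML version of the model inside the pipeline and plot it?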

Mir*_*abo 5

This worked for me in PySpark (stages[2] is the fitted classifier, the third stage of the pipeline):

model.stages[2]._call_java('toDebugString')
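If you would rather not hard-code the stage index, the same call can be wrapped in a small helper. This is only a sketch: the name tree_debug_string is made up for illustration, and picking the stage by looking for "DecisionTree" in its class name is an assumption about how the pipeline is built. On Spark 2.0 and later the fitted model exposes toDebugString as a regular Python property, so the helper tries that first and falls back to the _call_java route otherwise.

```python
def tree_debug_string(pipeline_model):
    """Return the debug string of the first decision tree stage in a fitted pipeline.

    tree_debug_string is a hypothetical helper, not part of PySpark.
    """
    for stage in pipeline_model.stages:
        # Assumption: the tree stage can be recognised by its class name.
        if "DecisionTree" not in type(stage).__name__:
            continue
        # Spark 2.0+: toDebugString is available as a Python property.
        text = getattr(stage, "toDebugString", None)
        if text is not None:
            return text
        # Older versions: ask the wrapped JVM object directly.
        return stage._call_java("toDebugString")
    raise ValueError("no decision tree stage found in the pipeline")
```

With the fitted pipeline from the question, print(tree_debug_string(model)) should print the nested If/Else rules of the tree as text.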