Iva*_*van 3 python apache-spark pyspark
I trained a DecisionTreeClassifier model using a pipeline like this:
from pyspark.ml import Pipeline
from pyspark.ml.feature import VectorAssembler, StringIndexer
from pyspark.ml.classification import DecisionTreeClassifier
cl = DecisionTreeClassifier(labelCol='target_idx', featuresCol='features')
pipe = Pipeline(stages=[target_index, assembler, cl])
model = pipe.fit(df_train)
# Prediction and model evaluation
predictions = model.transform(df_test)
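The first two stages, target_index and assembler, are instances of StringIndexer and VectorAssembler. I can now evaluate the model's accuracy, for example: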
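from pyspark.ml.evaluation import MulticlassClassificationEvaluator
# "precision" is the overall-accuracy metric in Spark 1.x; Spark 2.0+ names it "accuracy"
mc_evaluator = MulticlassClassificationEvaluator(
    labelCol="target_idx", predictionCol="prediction", metricName="precision")
accuracy = mc_evaluator.evaluate(predictions)
print("Test Error = {}".format(1.0 - accuracy))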
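Great. Now I need to inspect the structure of the tree model. The documentation points me to an attribute named toDebugString, but the ML DecisionTreeClassifier does not have this attribute; it appears to exist only on the MLlib DecisionTree classifier. How can I get the tree structure from the ML model inside the pipeline and plot it?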
This worked for me in pyspark:
model.stages[2]._call_java('toDebugString')
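The call above returns the tree as indented text. If you want something you can walk or hand to a plotting library, one option is to parse that text into nested dicts. The following is only a sketch: parse_debug_tree is a hypothetical helper, not part of pyspark, and the sample string is hand-written on the assumption that the output uses the usual "If (...)" / "Else (...)" / "Predict: ..." layout with one header line.

```python
def parse_debug_tree(debug_string):
    """Turn toDebugString-style text into nested dicts (hypothetical helper)."""
    # Drop the header line and blank lines. Indentation is not needed here,
    # because every "If" line is followed by its whole subtree and then by
    # the matching "Else" line.
    lines = [ln.strip() for ln in debug_string.splitlines()[1:] if ln.strip()]
    pos = 0

    def parse_node():
        nonlocal pos
        line = lines[pos]
        pos += 1
        if line.startswith("Predict:"):
            # Leaf node: keep the predicted label index as a float
            return {"predict": float(line.split(":", 1)[1])}
        # Internal node, e.g. "If (feature 0 <= 0.5)"
        condition = line[line.index("(") + 1:line.rindex(")")]
        yes_branch = parse_node()
        pos += 1  # skip the matching "Else (...)" line
        no_branch = parse_node()
        return {"condition": condition, "yes": yes_branch, "no": no_branch}

    return parse_node()


# Hand-written sample (an assumption about the format). With a fitted
# pipeline you would instead use:
#   debug_string = model.stages[2]._call_java('toDebugString')
sample = """DecisionTreeClassificationModel of depth 2 with 5 nodes
  If (feature 0 <= 0.5)
   If (feature 1 <= 1.5)
    Predict: 0.0
   Else (feature 1 > 1.5)
    Predict: 1.0
  Else (feature 0 > 0.5)
   Predict: 1.0
"""

tree = parse_debug_tree(sample)
print(tree)
```

The nested dicts can then be traversed to build whatever plot you like. Note that "feature 0" refers to a position in the VectorAssembler output, so you would still need to map indices back to column names yourself. In later PySpark releases (2.0 and up, as far as I know) the fitted tree model exposes toDebugString directly, so the _call_java workaround should not be needed there.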