I have generated a dot file to visualize a decision tree, using the following code:
import numpy as np
from sklearn.model_selection import train_test_split
import sklearn.tree
from sklearn.datasets import load_breast_cancer
cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(cancer.data, cancer.target, stratify=cancer.target, random_state=42)
tree = sklearn.tree.DecisionTreeClassifier(random_state=0, max_depth=4)
tree.fit(X_train, y_train)
sklearn.tree.export_graphviz(tree, out_file="tree.dot", class_names=cancer.target_names, feature_names=cancer.feature_names, impurity=False, filled=True)
This successfully creates the tree.dot file. I can now generate a PNG file using Graphviz's dot.exe utility (https://graphviz.gitlab.io/_pages/Download/Download_windows.html):
from subprocess import check_call
check_call(['...PATH_TO_GRAPHVIZ/graphviz-2.38/release/bin/dot.exe','-Tpng','tree.dot','-o','tree.png'])
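If Graphviz is on the PATH, the hard-coded location of dot.exe may not be needed. The following is only a minimal sketch under that assumption; `build_dot_command` and `render_dot` are hypothetical helper names introduced here for illustration, not part of Graphviz or scikit-learn.

```python
import shutil
import subprocess


def build_dot_command(dot_executable, dot_file, png_file):
    """Return the argument list for rendering a .dot file to PNG."""
    return [dot_executable, "-Tpng", dot_file, "-o", png_file]


def render_dot(dot_file="tree.dot", png_file="tree.png"):
    """Render dot_file to png_file using the `dot` executable found on PATH."""
    # shutil.which returns None when no `dot` executable is on PATH.
    dot_executable = shutil.which("dot")
    if dot_executable is None:
        raise FileNotFoundError("Graphviz 'dot' was not found on PATH")
    # check=True raises CalledProcessError if dot exits with a non-zero code.
    subprocess.run(
        build_dot_command(dot_executable, dot_file, png_file), check=True
    )
    return png_file
```

Calling `render_dot()` after tree.dot has been written should produce tree.png in the working directory, provided `dot` can be found.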
I would also like to visualize the decision tree inside PyCharm. Is there a way to do this?
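One possible approach, sketched here under the assumption that matplotlib is installed and scikit-learn is version 0.21 or later, is to draw the tree with `sklearn.tree.plot_tree` instead of going through a .dot file. The variable names `clf`, `fig`, `ax` and `annotations` and the figure size are illustrative choices, not taken from the question.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, plot_tree

cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, stratify=cancer.target, random_state=42
)
clf = DecisionTreeClassifier(random_state=0, max_depth=4)
clf.fit(X_train, y_train)

# Draw the fitted tree onto a matplotlib Axes; no Graphviz call is involved.
fig, ax = plt.subplots(figsize=(20, 10))
annotations = plot_tree(
    clf,
    feature_names=list(cancer.feature_names),
    class_names=list(cancer.target_names),
    impurity=False,
    filled=True,
    ax=ax,
)
```

Calling `plt.show()` afterwards displays the figure. In PyCharm with scientific mode enabled this would be expected to appear in the SciView tool window, otherwise in a separate matplotlib window; which of the two applies depends on the PyCharm edition and settings.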
I am learning PySpark, and it is convenient to be able to quickly create example DataFrames to try out the PySpark API.
The following code (where spark is a Spark session):
import pyspark.sql.types as T
df = [{'id': 1, 'data': {'x': 'mplah', 'y': [10,20,30]}},
{'id': 2, 'data': {'x': 'mplah2', 'y': [100,200,300]}},
]
df = spark.createDataFrame(df)
df.printSchema()
produces a map (and does not interpret the array correctly):
root
 |-- data: map (nullable = true)
 |    |-- key: string
 |    |-- value: string (valueContainsNull = true)
 |-- id: long (nullable = true)
I need a struct. If I supply a schema, I can force a struct:
import pyspark.sql.types as T
df = [{'id': 1, 'data': {'x': 'mplah', 'y': [10,20,30]}},
{'id': 2, 'data': {'x': 'mplah2', 'y': …