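Does anyone have advice on how to convert the tree information from sparklyr's ml_decision_tree_classifier, ml_gbt_classifier, or ml_random_forest_classifier models into a.) a format that other R tree-related libraries can understand and (ultimately) b.) a visualization of the trees for non-technical consumption? This would include the ability to convert the substituted string index values produced during the vector assembler step back to the actual feature names.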
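To make the last requirement concrete, here is a minimal base R sketch of the name mapping itself. It does not touch Spark: `debug_lines` is a hand-written sample in the style of Spark's tree debug output (an assumption, not output captured from a fitted model), `relabel_tree` is a hypothetical helper, and the order of `class_labels` is assumed rather than read from the fitted string indexer.

```r
# Hand-written sample in the style of Spark's tree debug string.
# This is an assumption for illustration, not real model output.
debug_lines <- c(
  "If (feature 2 <= 2.45)",
  " Predict: 0.0",
  "Else (feature 2 > 2.45)",
  " If (feature 3 <= 1.75)",
  "  Predict: 1.0",
  " Else (feature 3 > 1.75)",
  "  Predict: 2.0"
)

# Feature names in the order they would be passed to the vector assembler.
feature_names <- c("Sepal_Length", "Sepal_Width", "Petal_Length", "Petal_Width")

# Assumed label order; in practice this should come from the fitted indexer.
class_labels <- c("setosa", "versicolor", "virginica")

# Hypothetical helper: swap 0-based feature indices and numeric predictions
# for readable names, one line at a time.
relabel_tree <- function(lines, feature_names, class_labels) {
  vapply(lines, function(line) {
    f <- regmatches(line, regexec("feature ([0-9]+)", line))[[1]]
    if (length(f) == 2) {
      idx <- as.integer(f[2]) + 1L  # Spark indices are 0-based, R is 1-based
      line <- sub("feature [0-9]+", feature_names[idx], line)
    }
    p <- regmatches(line, regexec("Predict: ([0-9.]+)", line))[[1]]
    if (length(p) == 2) {
      idx <- as.integer(as.numeric(p[2])) + 1L
      line <- sub("Predict: [0-9.]+", paste("Predict:", class_labels[idx]), line)
    }
    line
  }, character(1), USE.NAMES = FALSE)
}

readable <- relabel_tree(debug_lines, feature_names, class_labels)
cat(readable, sep = "\n")
```

This only rewrites text. Getting the splits into a structure that other R tree packages accept is the part I am still missing.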
To provide an example, the code below is copied liberally from a sparklyr blog post:
library(sparklyr)
library(dplyr)
# If needed, install Spark locally via `spark_install()`
sc <- spark_connect(master = "local")
iris_tbl <- copy_to(sc, iris)
# split the data into train and validation sets
iris_data <- iris_tbl %>%
  sdf_partition(train = 2/3, validation = 1/3, seed = 123)
iris_pipeline <- ml_pipeline(sc) %>%
  ft_dplyr_transformer(
    iris_data$train %>%
      mutate(Sepal_Length = log(Sepal_Length),
             Sepal_Width = Sepal_Width ^ 2)
  ) %>%
  ft_string_indexer("Species", "label")
iris_pipeline_model <- iris_pipeline %>%
  ml_fit(iris_data$train)
iris_vector_assembler <- ft_vector_assembler(
  sc,
  input_cols = setdiff(colnames(iris_data$train), "Species"),
  output_col = "features" …