Posts by Jor*_*o82

File size of tidymodels workflows

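I'm trying to adopt tidymodels into my process, but I've run into a challenge with saving workflows. The file size of a workflow object is many times larger than the data used to build the model, so I end up running out of memory when I try to apply the workflow to new data. I can't tell whether that is the expected outcome or whether I'm missing something.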

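To make predictions on new data, shouldn't we only need the recipe steps, the model coefficients, and perhaps a few summary statistics from the training set (for example, the standard deviation and mean of the training data for scaling purposes)? So why is the workflow object so large?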

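Here is a simple example using the iris dataset. I tried following Julia's example, but the workflow still ends up 24 times larger than the data itself. I know tidymodels is evolving quickly, so perhaps there is a better way now? Any suggestions are appreciated!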

library(tidyverse)
library(tidymodels)
library(lobstr)
library(butcher)

set.seed(8675309)

#Create an indicator for whether the species is Setosa
df <- iris %>% 
    mutate(is_setosa = factor(Species == "setosa"))

#Split into train/test
df_split <- initial_split(df, prop = 0.80)
df_train <- training(df_split)
df_test <- testing(df_split)

#Create the workflow object
my_workflow <- workflow() %>% 
    #use a logistic regression model using glm
    add_model({
        logistic_reg() %>% 
            set_engine("glm")
    }) %>% 
    #Add the recipe
    add_recipe({
        recipe(is_setosa ~ Sepal.Length …


r tidymodels

Jor*_*o82

7 votes, 1 answer, 262 views

Legend line thickness in ggraph

When using ggraph, is there a way to thicken the legend lines for edge color? I've tried to override them, but to no avail. Here is an example:

library(tidyverse)
library(igraph)
library(ggraph)

set.seed(20190607)

#create dummy data
Nodes <- tibble(source = sample(letters, 8))
Edges <- Nodes %>% 
  mutate(target = source) %>% 
  expand.grid() %>% 
  #assign a random weight & color
  mutate(weight = runif(nrow(.)),
         color = sample(LETTERS[1:5], nrow(.), replace = TRUE)) %>% 
  #limit to a subset of all combinations
  filter(target != source,
         weight > 0.7)


#make the plot
Edges %>% 
  graph_from_data_frame(vertices = Nodes) %>% 
  ggraph(layout = "kk") + 
  #link width and color are dynamic
  geom_edge_link(alpha = 0.5, aes(width = …


r ggraph

Jor*_*o82

5 votes, 1 answer, 1485 views