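I have been replicating Julia Silge's code from her YouTube video on sentiment analysis of Animal Crossing user reviews with tidymodels ( https://www.youtube.com/watch?v=whE85O1XCkg&t=1300s ). At minute 25 she uses tune_grid(), and when I try to use it in my script I get the following warning/error: Warning message: All models failed in tune_grid(). See the .notes column.
In .notes, the following appears 25 times: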
[[1]]
# A tibble: 1 x 1
.notes
<chr>
1 "recipe: Error in UseMethod(\"prep\"): no applicable method for 'prep' applied~
How can I fix this? I am using the same code that Julia uses. My whole code is:
library(tidyverse)
user_reviews <- read_tsv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-05-05/user_reviews.tsv")

user_reviews %>%
  count(grade) %>%
  ggplot(aes(grade, n)) +
  geom_col()

user_reviews %>%
  filter(grade > 0) %>%
  sample_n(5) %>%
  pull(text)

reviews_parsed <- user_reviews %>%
  mutate(text = str_remove(text, "Expand"),
         rating = case_when(grade …
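I estimated a glmnet logistic regression using tidymodels, but I cannot figure out two closely related things in tidymodels:
Below is the code for a dummy model. I tried tidy(), coef() and predict(), but all of them failed. Any help is greatly appreciated. Thanks.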
library(tidymodels)
#> -- Attaching packages --------------------------------------------------------------------------------------------------------------------------- tidymodels 0.1.0 --
#> v broom 0.7.0 v recipes 0.1.13
#> v dials 0.0.8 v rsample 0.0.7
#> v dplyr 1.0.0 v tibble 3.0.3
#> v ggplot2 3.3.2 v tune 0.1.1
#> v infer 0.5.2 v workflows 0.1.2
#> v parsnip 0.1.2 v yardstick 0.0.7
#> v purrr 0.3.4
#> -- Conflicts ------------------------------------------------------------------------------------------------------------------------------ tidymodels_conflicts() --
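#> x …
I'm trying to make the jump from Scikit-Learn to Tidymodels, and thanks to the tutorials by Julia Silge and Andrew Couch it has been relatively painless most of the time. However, now I'm stuck. Normally I would use initial_split(df, strata = x) to get a split object to work with. But this time I have been given the test and training sets by a different department, and I'm worried this may become the norm. Without a split object, functions like last_fit() and collect_predictions() won't work.
How can I reverse engineer the provided datasets so that they become an rsplit object? Alternatively, is it possible to bind the datasets together first and then tell initial_split() exactly which rows should go to training and which to testing?
I see that someone asked the same question at https://community.rstudio.com/t/tidymodels-creating-a-split-object-from-testing-and-training-data-perform-last-fit/69885. Max Kuhn said you can reverse engineer an rsplit object, but I don't understand how.
Thanks!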
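One way this might be done, sketched under assumptions and not necessarily what was meant in the linked thread: rsample exports make_splits(), which can build an rsplit from a list of row indices plus the combined data. The train_df and test_df tibbles below are made-up stand-ins, not the data from the question.

```r
library(rsample)
library(dplyr)
library(tibble)

# Hypothetical stand-in data (NOT the question's data)
train_df <- tibble(outcome = c(0, 1, 1, 0, 1, 0), x1 = c(12, 18, 15, 5, 20, 2))
test_df  <- tibble(outcome = c(0, 1, 0),          x1 = c(5, 13, 8))

# Stack the two sets, remembering which rows came from where
combined <- bind_rows(train_df, test_df)
ind <- list(
  analysis   = seq_len(nrow(train_df)),                    # training rows
  assessment = nrow(train_df) + seq_len(nrow(test_df))     # testing rows
)

# Build the rsplit object from the indices and the combined data
split_obj  <- make_splits(ind, combined)
train_back <- training(split_obj)
test_back  <- testing(split_obj)
```

The resulting split_obj should then be usable where an rsplit is expected, for example as the split argument of last_fit(), though that has not been checked here against every tune version.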
# Example data
train <- tibble(predictor = c(0, 1, 1, 1, 0, 1, 0, 0),
                feature_1 = c(12, 18, 15, 5, 20, 2, 6, 10),
                feature_2 = c(120, 98, 111, 67, 335, 123, 22, 69))

test <- tibble(predictor = c(0, 1, 0, 1),
               feature_1 = c(5, 13, 8, 9),
               feature_2 …
I want to evaluate the performance of multiple (mostly) linear regression models on the same dataset. I thought that using the tidymodels packages and workflowsets::workflow_set() might work. I followed the example here, but I cannot figure out how to actually get the fitted results out of the code.
# Load packages
library("tidyverse")
library('workflowsets')
library("parsnip")
library("recipes")
# Data
dat <-
structure(list(q = c(66.65, 75.58, 83.06, 91.28, 119.26, 133.14,
146.32, 153.39, 168.57, 182.36, 210.09, 188.19, 213.42, 296.95,
326.33, 358.63, 475.99, 475.99, 683.44, 683.44, 838.49, 1282.1,
1648.97, 1572.97, 2055.14, 2521.39, 2685.11, 2859.46, 3242.87,
6899.19, 6377.42, 7581.96, 9599.32), c = c(317.06, 283.99, 279.56,
283.99, 227.84, 227.84, 262.5, 242.64, 270.9, 266.67, 210.6,
235.12, 235.12, 210.6, 207.31, 227.84, 220.78, 194.67, 177.13,
207.31, 179.94, 177.13, 182.79, 139.89, 148.98, …
I have the following script that uses tidymodels' agua package:
library(tidymodels)
library(agua)
library(ggplot2)
theme_set(theme_bw())
h2o_start()

data(concrete)
set.seed(4595)
concrete_split <- initial_split(concrete, strata = compressive_strength)
concrete_train <- training(concrete_split)
concrete_test <- testing(concrete_split)

# run for a maximum of 120 seconds
auto_spec <-
  auto_ml() %>%
  set_engine("h2o", max_runtime_secs = 120, seed = 1) %>%
  set_mode("regression")

normalized_rec <-
  recipe(compressive_strength ~ ., data = concrete_train) %>%
  step_normalize(all_predictors())

auto_wflow <-
  workflow() %>%
  add_model(auto_spec) %>%
  add_recipe(normalized_rec)

auto_fit <- fit(auto_wflow, data = concrete_train)
saveRDS(auto_fit, file = "test.h2o.auto_fit.rds") #save the object
h2o_end()
There, I try to save the auto_fit object to a file.
But when I try to retrieve it and use it to predict on the test data:
h2o_start()
auto_fit <- readRDS("test.h2o.auto_fit.rds")
predict(auto_fit, new_data = concrete_test)
I get an error:
Error in `h2o_get_model()`:
! Model id …
I am trying to use the tidymodels stacks package to perform ensemble modeling. Following the instructions in their article, I was able to reproduce the example successfully.
However, when I add parallelization during hyperparameter tuning in the "knn_res" part of the code:
library(doParallel)
library(parallel)
set.seed(2020)
cls <- makePSOCKcluster(parallelly::availableCores())
registerDoParallel(cls)
knn_res <-
  tune_grid(
    knn_wflow,
    resamples = folds,
    metrics = metric,
    grid = 4,
    control = ctrl_grid
  )
stopCluster(cls)

I get an error when running the "tree_frogs_model_st" part of the code:
tree_frogs_model_st <-
  tree_frogs_data_st %>%
  blend_predictions()
The error message states:
Error in summary.connection(connection) : invalid connection
I believe the issue may be related to the stacks::control_stack_grid() function, but I am not sure how to resolve it. Please advise.
Update (full reprex)
For brevity, I have excluded the linear model.
library(doParallel)
library(parallel)
set.seed(2020)
cls <- makePSOCKcluster(parallelly::availableCores())
registerDoParallel(cls)
knn_res <-
  tune_grid(
    knn_wflow,
    resamples = folds,
    metrics = metric,
    grid = 4,
    control = ctrl_grid
  )
stopCluster(cls)

Created by the reprex package on 2023 …
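A possible explanation, offered as a guess and not a confirmed diagnosis: after stopCluster(cls) the stopped cluster is still registered as the foreach backend, so any later code that goes through foreach's %dopar% tries to use connections that no longer exist. If blend_predictions() in the installed stacks version relies on foreach (an assumption), re-registering the sequential backend with foreach::registerDoSEQ() right after stopping the cluster may avoid the error. The sketch below uses 2 workers and a toy loop in place of the tuning code.

```r
library(doParallel)  # also attaches foreach and parallel

# Register a small PSOCK cluster (2 workers chosen arbitrarily for the sketch)
cl <- parallel::makePSOCKcluster(2)
registerDoParallel(cl)

# Stand-in for the parallel tuning step
res_par <- foreach(i = 1:3, .combine = c) %dopar% i^2

# Shut the workers down, then fall back to the sequential backend so that
# later %dopar% calls do not try to reach the closed connections
parallel::stopCluster(cl)
foreach::registerDoSEQ()

# Stand-in for later code that uses foreach internally
res_seq <- foreach(i = 1:3, .combine = c) %dopar% i^2
```

In the question's script this would mean adding a registerDoSEQ() call on the line after stopCluster(cls), before blend_predictions() is run.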
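Is there a way to get the standard errors and p-values for a logistic regression in tidymodels?
I can get the coefficients with the code below, but I want to compute the odds ratio for each feature, and I also need the standard errors.
glm.fit <-
  logistic_reg(mode = "classification") %>%
  set_engine(engine = "glm") %>%
  fit(Species ~ ., data = iris)
glm.fit$fit$coefficients
Normally you could do this by calling summary() on the glm object, but here I am trying to use tidymodels.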
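A minimal sketch of one approach, assuming a two-class outcome: tidy() on the fitted parsnip model returns a tibble that includes std.error and p.value, and exponentiate = TRUE reports the estimates on the odds-ratio scale. Because iris$Species has three classes, the sketch swaps in mtcars with am as a factor; that data choice and the names cars, glm_fit, coefs and odds_ratios are mine, not from the question.

```r
library(tidymodels)

# Two-class stand-in data (assumption): transmission type as the outcome
cars <- mtcars %>%
  mutate(am = factor(am, labels = c("automatic", "manual")))

glm_fit <-
  logistic_reg(mode = "classification") %>%
  set_engine("glm") %>%
  fit(am ~ wt + hp, data = cars)

# Coefficients on the log-odds scale, with std.error, statistic and p.value
coefs <- tidy(glm_fit)

# Same table with estimates exponentiated, i.e. odds ratios
odds_ratios <- tidy(glm_fit, exponentiate = TRUE)
```

The underlying glm object is also still available as glm_fit$fit, so summary(glm_fit$fit) should give the familiar base R output if that is preferred.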
I am trying to run my first LASSO model and have run into some problems. I have a medical dataset and am trying to predict a binary outcome (disease) from about 60 predictors. I got as far as tuning the grid before hitting the error "All columns selected for the step should be numeric", even though all of them were converted to dummy variables at the recipe stage. I reduced the number of predictors to see whether that would change anything, but it does not seem to fix it. The outcome is uncommon, occurring in about 3% of cases, so I don't know whether that affects anything. The code is below.
Split into test and training data, stratified by disease
set.seed(123)
df_split <- initial_split(df, strata = disease)
df_train <- training(df_split)
df_test <- testing(df_split)
Create the validation set
set.seed(234)
validation_set <- validation_split(df_train,
                                   strata = dfPyVAN,
                                   prop = 0.8)
Build the model
df_model <-
  logistic_reg(penalty = tune(), mixture = 1) %>%
  set_engine("glmnet")
Create the recipe
df_recipe <-
  recipe(dfPyVAN ~ ., data = df_train) %>%
  step_medianimpute(all_predictors()) %>%
  step_dummy(all_nominal(), -all_outcomes()) %>%
  step_zv(all_predictors()) %>%
  step_normalize(all_predictors())
Create the workflow
df_workflow <-
  workflow() %>%
  add_model(df_model) %>%
  add_recipe(df_recipe)
Grid of penalty values to tune
df_reg_grid <- tibble(penalty = 10^seq(-4, -1, length.out = 30))
Train and tune the model …
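One possible cause, judging only from the recipe shown: step_medianimpute(all_predictors()) runs before step_dummy(), so it is applied to the nominal predictors while they are still factors or strings, and median imputation needs numeric columns. The sketch below restricts the imputation to numeric predictors. It assumes a recent recipes version, where step_impute_median() is the current name of step_medianimpute() and the all_numeric_predictors() / all_nominal_predictors() selectors exist; the toy data frame is made up.

```r
library(recipes)  # attaches dplyr, which provides %>%

# Hypothetical toy data: factor outcome, numeric predictor with a missing
# value, and a nominal predictor
set.seed(1)
toy <- data.frame(
  disease = factor(rep(c("no", "yes"), each = 10)),
  age     = c(NA, rnorm(19, mean = 50, sd = 10)),
  sex     = factor(rep(c("f", "m"), times = 10))
)

rec <-
  recipe(disease ~ ., data = toy) %>%
  # impute only the numeric predictors, so factors are never touched
  step_impute_median(all_numeric_predictors()) %>%
  # then turn the nominal predictors into dummy variables
  step_dummy(all_nominal_predictors()) %>%
  step_zv(all_predictors()) %>%
  step_normalize(all_numeric_predictors())

# Estimate the steps and return the processed training data
baked <- bake(prep(rec), new_data = NULL)
```

If the nominal predictors also contain missing values, they would need their own imputation step (for example a mode-based one) before step_dummy(); whether that applies depends on data not shown in the question.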
I have a tibble and I am trying to compute multiple metrics on it.
library(tidymodels)
price = 1:50
prediction = price * 0.9
My_tibble = tibble(price=price, prediction=prediction)
# The following code can calculate the rmse
My_tibble %>%
  rmse(truth = price, estimate = prediction)
# Is it possible to calculate `rmse` and `rsq` at the same time?
# The following code reports an error: object '.pred' not found
My_tibble %>%
  rmse(truth = price, estimate = prediction ) %>%
  rsq(truth = price, estimate = prediction )
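For context, a short sketch of how this is usually handled in yardstick: chaining fails because rmse() returns a small results tibble (.metric, .estimator, .estimate) that no longer has the price and prediction columns, so the next metric has nothing to work on. metric_set() combines several metric functions into one that is applied to the original data. The names multi_metric and res are mine.

```r
library(yardstick)
library(tibble)

price <- 1:50
prediction <- price * 0.9
My_tibble <- tibble(price = price, prediction = prediction)

# Bundle the metrics into a single function
multi_metric <- metric_set(rmse, rsq)

# One row per metric, computed on the same truth/estimate columns
res <- multi_metric(My_tibble, truth = price, estimate = prediction)
```

Note that yardstick's rsq is the squared correlation between truth and estimate, which may already cover part of what a separate correlation metric would report.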
To extend the question a bit: is it possible to compute rmse and cor at the same time?
My_tibble %>%
  rmse(truth = price, …
I have several ROC curves of tidymodels/parsnip model performance that I would like to show together in one plot for visual comparison:
roc1 <- structure(list(.threshold = c(-Inf, 0.188422381048697, 0.23446542423272,
0.241282102642437, 0.259726705912688, 0.29097010004365, 0.309897370938121,
0.33607659920306, 0.348797482584728, 0.371543061749991, 0.37849110465008,
0.403024193339376, 0.408074451522232, 0.425203432699806, 0.43288528993523,
0.437168077386449, 0.441435377101706, 0.454812465942723, 0.46890082819098,
0.469324015885685, 0.471191285258535, 0.473285736958109, 0.484067175067965,
0.501634453233048, 0.502895404815678, 0.505260074955513, 0.509400496728661,
0.512826032440735, 0.514474796037162, 0.520894854910534, 0.52482313756493,
0.544137627333669, 0.546168394598085, 0.555557692971751, 0.562118235565918,
0.564565992908277, 0.572138872116962, 0.5792082477202, 0.611888118194463,
0.621908020887883, 0.623655143605973, 0.629887735979754, 0.632025630132792,
0.636193886667259, 0.638203230744601, 0.646775289308722, 0.655148011873394,
0.658581199234482, 0.658707835285112, 0.66292920495746, 0.6753497980617,
0.691520083977918, 0.702288194696498, 0.704440842146043, 0.724494989785773,
0.735933141947951, 0.756427437462373, 0.785412673453098, 0.831367501773009,
0.831554130258554, 0.840204698487284, 0.845340108802608, 0.876022993703215,
Inf), specificity = c(0, 0, 0.032258064516129, 0.0645161290322581, …
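A sketch of one common pattern, under the assumption that each curve is a data frame with specificity and sensitivity columns like the roc1 structure above: label each curve with a model column, stack them with bind_rows(), and map that column to colour. Since the question's data is truncated, the two curves here are computed from yardstick's two_class_example data, with an artificially noisy second "model" that is purely illustrative.

```r
library(yardstick)
library(dplyr)
library(ggplot2)

data("two_class_example", package = "yardstick")

# First curve: the class probabilities shipped with the example data
roc_a <- roc_curve(two_class_example, truth, Class1) %>%
  mutate(model = "model A")

# Second curve: the same probabilities with noise added (illustrative only)
set.seed(42)
roc_b <- two_class_example %>%
  mutate(noisy = pmin(pmax(Class1 + rnorm(n(), sd = 0.3), 0), 1)) %>%
  roc_curve(truth, noisy) %>%
  mutate(model = "model B")

# Stack the labelled curves and draw them in one plot
roc_both <- bind_rows(roc_a, roc_b)

p <- ggplot(roc_both, aes(x = 1 - specificity, y = sensitivity, colour = model)) +
  geom_path() +
  geom_abline(linetype = "dashed") +
  coord_equal()
```

With the question's objects, the same idea would be to add a model label to roc1 and each of the other curves before binding them, then print p to display the plot.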