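I have a dataset with multiple columns that follow a naming pattern, and I need to calculate new columns that are each computed from two of the other columns. I am looking for a tidyverse option, but I would like to avoid a pivot_longer, because the dataset has more than a million rows.
Example dataset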
library(dplyr)
df <- tibble(
jan_mean = runif(10),
feb_mean = runif(10),
mar_mean = runif(10),
jan_sd = runif(10),
feb_sd = runif(10),
mar_sd = runif(10),
)
I can do it manually like this:
df2 <- df %>%
mutate(jan_cv= jan_mean/jan_sd,
feb_cv= feb_mean/feb_sd,
mar_cv= mar_mean/mar_sd
)
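This is a simple example, but I have similar operations on the monthly values.
EDIT 1
I need to do this on a large dataset and was worried that pivot_longer would be very time consuming, so I ran a quick comparison of the three methods.
Method 1 is the manual way, Method 2 is the short version suggested by @Tarjae, and Method 3 uses pivot_longer: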
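# Packages needed in addition to dplyr: tictoc for tic()/toc(),
# stringr for str_replace()/str_remove(), tidyr for the pivots.
library(tictoc)
library(stringr)
library(tidyr)
tic("Method 1: manual option")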
df2 <- df %>%
mutate(jan_cv= jan_mean/jan_sd,
feb_cv= feb_mean/feb_sd,
mar_cv= mar_mean/mar_sd
)
toc()
tic("Method 2: Short option")
df2 <- df %>%
mutate(across(ends_with('_mean'), ~ . /
get(str_replace(cur_column(), "mean$", "sd")), .names = "{.col}_cv")) %>%
rename_at(vars(ends_with('cv')), ~ str_remove(., "\\_mean"))
toc()
tic("Method 3: pivot wider option")
df2 <- df %>%
mutate(id = row_number()) %>%
pivot_longer(-id, names_to = c("month", ".value"), names_sep = "_") %>%
mutate(cv = mean / sd) %>%
pivot_wider(names_from = "month", values_from = c(mean, sd, cv), names_glue = "{month}_{.value}") %>%
select(-id)
toc()
The results are:
Method 1: manual option: 0.05 sec elapsed
Method 2: Short option: 0.01 sec elapsed
Method 3: pivot wider option: 0.19 sec elapsed
So in this run, Method 2 was even faster than writing out each column manually.
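Timings taken on a 10-row tibble mostly measure fixed overhead, so they may not carry over to millions of rows. Below is a rough sketch for repeating the comparison on a larger table. It assumes base R's system.time() in place of tictoc, and the row count n and the helper names method_manual and method_across are hypothetical, chosen only for illustration. rename_with() is used here as the current replacement for rename_at().

```r
library(dplyr)
library(stringr)

# Hypothetical size; raise it toward the real row count as memory allows.
n <- 1e5

set.seed(1)
big <- tibble(
  jan_mean = runif(n), feb_mean = runif(n), mar_mean = runif(n),
  jan_sd   = runif(n), feb_sd   = runif(n), mar_sd   = runif(n)
)

# Method 1: every ratio written out by hand.
method_manual <- function(d) {
  d %>%
    mutate(jan_cv = jan_mean / jan_sd,
           feb_cv = feb_mean / feb_sd,
           mar_cv = mar_mean / mar_sd)
}

# Method 2: across() over the *_mean columns, then strip "_mean" from the new names.
method_across <- function(d) {
  d %>%
    mutate(across(ends_with("_mean"),
                  ~ .x / get(str_replace(cur_column(), "mean$", "sd")),
                  .names = "{.col}_cv")) %>%
    rename_with(~ str_remove(.x, "_mean"), ends_with("_cv"))
}

# Confirm both methods produce the same table before comparing speed.
stopifnot(isTRUE(all.equal(method_manual(big), method_across(big))))

system.time(method_manual(big))
system.time(method_across(big))
```

Running each call several times, or with a larger n, should give a steadier picture than a single measurement.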
In this case, we can use across together with some string manipulation from stringr:
library(dplyr)
library(stringr)
df %>%
mutate(across(ends_with('_mean'), ~ . /
get(str_replace(cur_column(), "mean$", "sd")), .names = "{.col}_cv")) %>%
rename_at(vars(ends_with('cv')), ~ str_remove(., "\\_mean"))
jan_mean feb_mean mar_mean jan_sd feb_sd mar_sd jan_cv feb_cv mar_cv
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 0.838 0.401 0.131 0.329 0.0292 0.911 2.55 13.7 0.144
2 0.595 0.173 0.0935 0.313 0.105 0.247 1.90 1.64 0.378
3 0.0546 0.934 0.983 0.536 0.618 0.292 0.102 1.51 3.36
4 0.543 0.802 0.569 0.585 0.901 0.742 0.928 0.891 0.766
5 0.899 0.761 0.245 0.932 0.506 0.526 0.965 1.50 0.466
6 0.832 0.875 0.947 0.390 0.613 0.607 2.13 1.43 1.56
7 0.268 0.421 0.930 0.869 0.873 0.612 0.308 0.483 1.52
8 0.475 0.217 0.330 0.0473 0.826 0.903 10.0 0.262 0.366
9 0.379 0.425 0.479 0.931 0.381 0.223 0.407 1.12 2.15
10 0.616 0.922 0.707 0.976 0.241 0.619 0.631 3.82 1.14
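How this works: across() visits each *_mean column, cur_column() returns the name of the column currently being processed, and get() looks up the matching *_sd column by name. As a possible variant, the rename step can be folded into .names, because that argument is a glue specification and glue evaluates R expressions inside the braces. This is a minimal sketch that assumes a reasonably recent dplyr. Calling str_remove() inside .names is my own choice here and is not part of the answer above.

```r
library(dplyr)
library(stringr)

set.seed(42)
df <- tibble(
  jan_mean = runif(10), feb_mean = runif(10), mar_mean = runif(10),
  jan_sd   = runif(10), feb_sd   = runif(10), mar_sd   = runif(10)
)

df2 <- df %>%
  mutate(across(
    ends_with("_mean"),
    # Divide each *_mean column by the *_sd column with the same prefix.
    ~ .x / get(str_replace(cur_column(), "_mean$", "_sd")),
    # Build the final name directly, e.g. "jan_mean" becomes "jan_cv".
    .names = "{str_remove(.col, '_mean$')}_cv"
  ))

names(df2)
```

This keeps the computation and the naming in one across() call, at the cost of a slightly denser .names string.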