我需要按列字符串前缀分组更长的时间。下面的玩具示例有两个组“A”和“B”,但我需要一个针对任意数量的前缀组的通用 tidyverse 解决方案。
#toy df
set.seed(1)
df <- data.table(
date = rep(seq(as.Date("2020-01-01"),as.Date("2020-01-05"),by="day"),each=6),
k = rep(c("A.mean","A.median","A.min","B.mean","B.median","B.min"),5),
v = runif(30,0,50)
) %>%
pivot_wider(names_from = k, values_from = v)
df %>% head
date A.mean A.median A.min B.mean B.median B.min
<date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 2020-01-01 13.3 18.6 28.6 45.4 10.1 44.9
2 2020-01-02 47.2 33.0 31.5 3.09 10.3 8.83
3 2020-01-03 34.4 19.2 38.5 24.9 35.9 49.6
4 2020-01-04 19.0 38.9 46.7 10.6 32.6 6.28
5 2020-01-05 13.4 19.3 0.670 19.1 43.5 17.0
#pivot longer by group prefix
df %>%
select(date,matches("A\\.")) %>%
rename_with(~str_replace(.x,"A\\.","")) %>%
mutate( k = "A") %>%
bind_rows(
df %>%
select(date,matches("B\\.")) %>%
rename_with(~str_replace(.x,"B\\.","")) %>%
mutate( k = "B")
)
date mean median min k
<date> <dbl> <dbl> <dbl> <chr>
1 2020-01-01 13.3 18.6 28.6 A
2 2020-01-02 47.2 33.0 31.5 A
3 2020-01-03 34.4 19.2 38.5 A
4 2020-01-04 19.0 38.9 46.7 A
5 2020-01-05 13.4 19.3 0.670 A
6 2020-01-01 45.4 10.1 44.9 B
7 2020-01-02 3.09 10.3 8.83 B
8 2020-01-03 24.9 35.9 49.6 B
9 2020-01-04 10.6 32.6 6.28 B
10 2020-01-05 19.1 43.5 17.0 B
Run Code Online (Sandbox Code Playgroud)
这是一个两步过程(出于演示目的,以两行显示)。首先旋转更长的时间来创建 k、统计名称和值的列,然后旋转更宽的位置来创建所需的结果。
下面编辑的代码,通过在“names_to”说明符中使用“.value”通配符选项,只需一步即可获得答案。
library(tidyr)
set.seed(1)
df <- data.frame(
date = rep(seq(as.Date("2020-01-01"),as.Date("2020-01-05"),by="day"),each=6),
k = rep(c("A.mean","A.median","A.min","B.mean","B.median","B.min"),5),
v = runif(30,0,50)
) %>%
pivot_wider(names_from = k, values_from = v)
#temp <- pivot_longer(df, -date, names_sep = "\\.", names_to = c("k", "stat"))
#answer <- pivot_wider(temp, id_cols = c("date", "k"), names_from= "stat", values_from="value")
#updated answer simplified down to just the pivot longer function
answer <- pivot_longer(df, -date, names_sep = "\\.", names_to = c("k", ".value"))
print(head(answer))
# A tibble: 6 x 5
date k mean median min
<date> <chr> <dbl> <dbl> <dbl>
1 2020-01-01 A 13.3 18.6 28.6
2 2020-01-01 B 45.4 10.1 44.9
3 2020-01-02 A 47.2 33.0 31.5
4 2020-01-02 B 3.09 10.3 8.83
5 2020-01-03 A 34.4 19.2 38.5
6 2020-01-03 B 24.9 35.9 49.6
Run Code Online (Sandbox Code Playgroud)