Gee*_*eet 1 r dplyr tidyr purrr
Here is my toy data.
df <- tibble::tribble(
  ~date1, ~A_Equity, ~date2, ~B_Equity, ~date3, ~C_Equity,
"1/29/2016", 35, "10/31/2017", 67, NA_character_, NA_real_,
"2/29/2016", 40, "11/30/2017", 31, NA_character_, NA_real_,
NA_character_,NA_real_, "12/29/2017", 56, NA_character_, NA_real_)
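The real data has more than 1000 columns and many more dates.
I want to lengthen the data so that the desired output has only date, variable, and value columns, like this: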
desired_df <- tibble::tribble(
~date, ~var, ~value,
"1/29/2016", "A", 35,
"2/29/2016", "A", 40,
"10/31/2017", "B", 67,
"11/30/2017", "B", 31,
"12/29/2017", "B", 56)
I tried this, but did not get the desired result:
df2 <- df %>%
  pivot_longer(cols = contains("date"), names_to = "dates", values_to = "date") %>%
  pivot_longer(cols = contains("Equity"), names_to = "var", values_to = "value") %>%
  select(-dates) %>%
  distinct() %>%
  filter(!is.na(date))
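If `names_to` is a character vector containing the special `.value` sentinel, the `values_to` argument is ignored and the names of the value columns are derived from part of the existing column names. Here, renaming the columns to `A_date`, `A_value`, `B_date`, and so on lets `names_sep = "_"` split each name into the stock (`var`) and the output column (`date` or `value`).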
library(tidyverse)
## extract stock names
stock <- sub("_Equity", "", grep("Equity$", names(df), value = TRUE))
df %>%
  rename_at(vars(starts_with("date")), ~ str_c(stock, "_date")) %>%
  rename_at(vars(ends_with("Equity")), ~ str_c(stock, "_value")) %>%
  pivot_longer(everything(),
               names_to = c("var", ".value"),
               names_sep = "_",
               values_drop_na = TRUE)
# # A tibble: 5 x 3
# var date value
# <chr> <chr> <dbl>
# 1 A 1/29/2016 35
# 2 B 10/31/2017 67
# 3 A 2/29/2016 40
# 4 B 11/30/2017 31
# 5 B 12/29/2017 56
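For reference, the same `.value` approach can be sketched with `rename_with()`, which dplyr documents as the replacement for the superseded `rename_at()`. This is only a sketch under two assumptions that are not stated in the question: the i-th `date` column belongs to the i-th `_Equity` column, and the stock names themselves contain no underscore (otherwise `names_sep = "_"` would split them in the wrong place). The final `arrange()` and `select()` are added here only to mirror the row and column order of `desired_df`.

```r
library(dplyr)
library(tidyr)
library(tibble)

df <- tribble(
  ~date1, ~A_Equity, ~date2, ~B_Equity, ~date3, ~C_Equity,
  "1/29/2016", 35, "10/31/2017", 67, NA_character_, NA_real_,
  "2/29/2016", 40, "11/30/2017", 31, NA_character_, NA_real_,
  NA_character_, NA_real_, "12/29/2017", 56, NA_character_, NA_real_
)

# Stock prefixes in column order, taken from the *_Equity column names
stock <- sub("_Equity$", "", grep("_Equity$", names(df), value = TRUE))

result <- df %>%
  # Assumption: date columns and Equity columns appear in the same stock order
  rename_with(~ paste0(stock, "_date"), starts_with("date")) %>%
  rename_with(~ paste0(stock, "_value"), ends_with("_Equity")) %>%
  pivot_longer(
    everything(),
    names_to = c("var", ".value"),  # first part -> var, second part -> column name
    names_sep = "_",
    values_drop_na = TRUE           # drops rows where both date and value are NA
  ) %>%
  arrange(var) %>%                  # group rows by stock, as in desired_df
  select(date, var, value)          # column order of desired_df

print(result)
```

With the toy data this should give the five rows of `desired_df`; stock C disappears entirely because all of its dates and values are `NA`.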
Data
df <- structure(list(date1 = c("1/29/2016", "2/29/2016", NA), A_Equity = c(35,
40, NA), date2 = c("10/31/2017", "11/30/2017", "12/29/2017"),
B_Equity = c(67, 31, 56), date3 = c(NA_character_, NA_character_,
NA_character_), C_Equity = c(NA_real_, NA_real_, NA_real_
)), row.names = c(NA, -3L), class = c("tbl_df", "tbl", "data.frame"))