How to use pivot_longer on date-variable column pairs whose dates don't match?

Gee*_*eet 1 r dplyr tidyr purrr

Here is my toy data:

    df <- tibble::tribble(
      ~date1,        ~`A Equity`, ~date2,       ~`B Equity`, ~date3,        ~`C Equity`,
      "1/29/2016",   35,          "10/31/2017", 67,          NA_character_, NA_real_,
      "2/29/2016",   40,          "11/30/2017", 31,          NA_character_, NA_real_,
      NA_character_, NA_real_,    "12/29/2017", 56,          NA_character_, NA_real_
    )

The real data has more than 1,000 columns and many more dates.

I want to lengthen the data so that the output has only `date`, `var` and `value` columns, like this:

    desired_df <- tibble::tribble(
      ~date,        ~var, ~value,
      "1/29/2016",  "A",  35,
      "2/29/2016",  "A",  40,
      "10/31/2017", "B",  67,
      "11/30/2017", "B",  31,
      "12/29/2017", "B",  56
    )

I tried the following, but it doesn't give the desired result:

    df2 <- df %>%
      pivot_longer(cols = contains("date"), names_to = "dates", values_to = "date") %>%
      pivot_longer(cols = contains("Equity"), names_to = "var", values_to = "value") %>%
      select(-dates) %>%
      distinct() %>%
      filter(!is.na(date))
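
To see where this attempt goes wrong, here is a sketch that runs the two pivots step by step on the toy data (column names backticked because they contain a space). The first `pivot_longer()` leaves all three Equity columns on every row, so the second one pairs each date with every Equity column rather than only its own; `distinct()` and `filter()` cannot undo that pairing.

```r
library(dplyr)
library(tidyr)

# Toy data from the question (names with spaces need backticks)
df <- tibble::tribble(
  ~date1,        ~`A Equity`, ~date2,       ~`B Equity`, ~date3,        ~`C Equity`,
  "1/29/2016",   35,          "10/31/2017", 67,          NA_character_, NA_real_,
  "2/29/2016",   40,          "11/30/2017", 31,          NA_character_, NA_real_,
  NA_character_, NA_real_,    "12/29/2017", 56,          NA_character_, NA_real_
)

# First pivot: 3 rows x 3 date columns = 9 rows, each still holding all Equity columns
step1 <- df %>%
  pivot_longer(cols = contains("date"), names_to = "dates", values_to = "date")
nrow(step1)

# Second pivot: 9 rows x 3 Equity columns = 27 rows (every date meets every stock)
step2 <- step1 %>%
  pivot_longer(cols = contains("Equity"), names_to = "var", values_to = "value")
nrow(step2)

# A B date now appears next to A's value, which is the wrong pairing
mismatch <- step2 %>% filter(date == "10/31/2017", var == "A Equity")
mismatch
```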

Dar*_*sai 6

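From the `pivot_longer()` documentation for `values_to`: if `names_to` is a character vector containing the special `.value` sentinel, the `values_to` value is ignored, and the name of the value column is derived from part of the existing column names.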
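
A minimal illustration of the sentinel, using a hypothetical tibble whose columns are already named `<series>_date` / `<series>_value` (the data here is made up for the example):

```r
library(tidyr)

# Hypothetical wide data: one date/value column pair per series
wide <- tibble::tibble(
  A_date  = c("1/29/2016", "2/29/2016"),
  A_value = c(35, 40),
  B_date  = c("10/31/2017", "11/30/2017"),
  B_value = c(67, 31)
)

long <- pivot_longer(
  wide,
  cols = everything(),
  names_to = c("var", ".value"),  # before "_" -> var column; after "_" -> output column name
  names_sep = "_"
)
long
```

The part of each name before `_` becomes the `var` column, and the part after it (`date` or `value`) names an output column, which is why no `values_to` is needed.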

    library(tidyverse)

    ## extract stock names: "A_Equity" -> "A", etc.
    stock <- sub("_Equity", "", grep("Equity$", names(df), value = TRUE))

    df %>%
      # date1, date2, ... -> A_date, B_date, ... (pairs columns by position)
      rename_at(vars(starts_with("date")), ~ str_c(stock, "_date")) %>%
      # A_Equity, B_Equity, ... -> A_value, B_value, ...
      rename_at(vars(ends_with("Equity")), ~ str_c(stock, "_value")) %>%
      # split "A_date" into var = "A" and the output column "date"
      pivot_longer(everything(),
                   names_to = c("var", ".value"),
                   names_sep = "_",
                   values_drop_na = TRUE)

    # # A tibble: 5 x 3
    #   var   date       value
    #   <chr> <chr>      <dbl>
    # 1 A     1/29/2016     35
    # 2 B     10/31/2017    67
    # 3 A     2/29/2016     40
    # 4 B     11/30/2017    31
    # 5 B     12/29/2017    56
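
In dplyr 1.0.0 and later, `rename_at()` is superseded by `rename_with()`. A sketch of the same approach written with it, assuming the `A_Equity`-style names from the Data section, with `relocate(date)` added so the columns come out in the order of `desired_df`. Like the code above, it assumes the i-th date column belongs to the i-th stock.

```r
library(dplyr)
library(tidyr)
library(stringr)

# Toy data with underscore names, as in the Data section
df <- tibble::tibble(
  date1    = c("1/29/2016", "2/29/2016", NA),
  A_Equity = c(35, 40, NA),
  date2    = c("10/31/2017", "11/30/2017", "12/29/2017"),
  B_Equity = c(67, 31, 56),
  date3    = NA_character_,
  C_Equity = NA_real_
)

stock <- sub("_Equity$", "", grep("Equity$", names(df), value = TRUE))

out <- df %>%
  # positional pairing: the i-th date column is renamed after the i-th stock
  rename_with(~ str_c(stock, "_date"), starts_with("date")) %>%
  rename_with(~ str_c(stock, "_value"), ends_with("Equity")) %>%
  pivot_longer(everything(),
               names_to = c("var", ".value"),
               names_sep = "_",
               values_drop_na = TRUE) %>%
  relocate(date)   # columns: date, var, value
out
```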

Data

    df <- structure(list(
      date1 = c("1/29/2016", "2/29/2016", NA),
      A_Equity = c(35, 40, NA),
      date2 = c("10/31/2017", "11/30/2017", "12/29/2017"),
      B_Equity = c(67, 31, 56),
      date3 = c(NA_character_, NA_character_, NA_character_),
      C_Equity = c(NA_real_, NA_real_, NA_real_)
    ), row.names = c(NA, -3L), class = c("tbl_df", "tbl", "data.frame"))
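
Since the question is also tagged purrr, here is an alternative sketch that avoids renaming altogether. It walks over the columns two at a time, assuming (as the answer does) that each date column is immediately followed by its Equity column, and stacks one small tibble per pair. The regex `[ _]Equity$` is an assumption meant to accept both the `A Equity` and `A_Equity` spellings.

```r
library(dplyr)
library(purrr)

# Toy data with the question's space-separated names
df <- tibble::tibble(
  date1      = c("1/29/2016", "2/29/2016", NA),
  `A Equity` = c(35, 40, NA),
  date2      = c("10/31/2017", "11/30/2017", "12/29/2017"),
  `B Equity` = c(67, 31, 56),
  date3      = NA_character_,
  `C Equity` = NA_real_
)

# Positions of the date columns: 1, 3, 5, ...
date_idx <- seq(1, ncol(df), by = 2)

result <- map(date_idx, function(i) {
  tibble::tibble(
    date  = df[[i]],                                   # the i-th date column
    var   = sub("[ _]Equity$", "", names(df)[i + 1]),  # "A Equity" -> "A"
    value = df[[i + 1]]                                # the Equity column right after it
  )
}) %>%
  bind_rows() %>%
  filter(!is.na(date))  # drop padding rows where a series has no date
result
```

Unlike the `pivot_longer()` route, this keeps each series' rows together, so the result has the same row and column order as `desired_df`.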