How do I merge two data frames subject to some conditions?

Sta*_*taq 12 r dataframe

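I have two data frames, df1 and df2. I want to join them, subject to the following conditions:

  1. Merge df1 and df2 on Gender and Test
  2. TestDate from df1 must fall between Date1 and Date2 from df2
  3. all.x = TRUE (keep the df1 records)

How should I handle the second part?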


df1 <- structure(list(ID = c(1, 2, 3, 5, 4), Gender = c("F", "M", "M", 
"F", "F"), TestDate = structure(c(17897, 17898, 18630, 18262, 
17900), class = "Date"), Test = c("Weight", "Weight", "ELA", 
"ELA", "Math")), row.names = c(NA, -5L), class = c("tbl_df", 
"tbl", "data.frame"))


df2 <- structure(list(Test = c("Weight", "Weight", "ELA", "ELA", "ELA", 
"ELA", "Math", "Math"), Gender = c("F", "M", "F", "M", "F", "M", 
"F", "M"), Date1 = structure(c(17532, 17534, 17536, 17537, 18266, 
18267, 17897, 17539), class = "Date"), Ave = c(97, 99, 85, 84, 
83, 82, 88, 89), Date2 = structure(c(18993, 18995, 18266, 18267, 
18997, 18998, 18999, 19000), class = "Date")), row.names = c(NA, 
-8L), class = c("tbl_df", "tbl", "data.frame"))

Vvd*_*vdL 7

Does this work for you?

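library(dplyr)       # %>% and filter()
library(data.table)  # %between%
# Inner join on the shared key columns, then keep rows where
# TestDate lies within [Date1, Date2]
merge(x = df1, 
      y = df2,
      by = c("Gender", "Test")) %>% 
  filter(TestDate %between% list(Date1, Date2))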
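Note that an inner merge followed by a filter drops any df1 row that has no qualifying interval, so on its own it does not meet the all.x = TRUE requirement from the question. Below is a minimal base R sketch of one way to keep those rows, assuming the rows of df1 are unique across their columns. The small data frames are stand-ins, and the ID 9 row is hypothetical, added only to show an unmatched record.

```r
# Stand-in data; ID 9 is a hypothetical row whose TestDate fits no interval
df1 <- data.frame(
  ID = c(1, 2, 9),
  Gender = c("F", "M", "M"),
  TestDate = as.Date(c("2019-01-01", "2019-01-02", "2025-01-01")),
  Test = c("Weight", "Weight", "Weight")
)
df2 <- data.frame(
  Test = c("Weight", "Weight"),
  Gender = c("F", "M"),
  Date1 = as.Date(c("2018-01-01", "2018-01-03")),
  Ave = c(97, 99),
  Date2 = as.Date(c("2022-01-01", "2022-01-03"))
)

# Step 1: equi-join on the shared keys, then keep rows inside the interval
matched <- merge(df1, df2, by = c("Gender", "Test"))
matched <- matched[matched$TestDate >= matched$Date1 &
                     matched$TestDate <= matched$Date2, ]

# Step 2: left-join the matches back onto df1, so rows without a
# qualifying interval come back with NA in Date1, Ave and Date2
result <- merge(df1, matched, by = names(df1), all.x = TRUE)
print(result)
```

The second merge is what restores the all.x = TRUE behaviour; if df1 could contain exact duplicate rows, a dedicated row identifier would be a safer key than all of its columns.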


Rui*_*das 7

Here is a tidyverse solution.

library(tidyverse)

inner_join(df1, df2) %>%
  filter(TestDate >= Date1 & TestDate <= Date2)
#> Joining, by = c("Gender", "Test")
#> # A tibble: 5 x 7
#>      ID Gender TestDate   Test   Date1        Ave Date2
#>   <dbl> <chr>  <date>     <chr>  <date>     <dbl> <date>
#> 1     1 F      2019-01-01 Weight 2018-01-01    97 2022-01-01
#> 2     2 M      2019-01-02 Weight 2018-01-03    99 2022-01-03
#> 3     3 M      2021-01-03 ELA    2020-01-06    82 2022-01-06
#> 4     5 F      2020-01-01 ELA    2018-01-05    85 2020-01-05
#> 5     4 F      2019-01-04 Math   2019-01-01    88 2022-01-07

Created on 2022-03-21 by the reprex package.
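If the all.x = TRUE requirement matters, one option is to move the range condition into the join itself, so that df1 rows without a qualifying interval are kept with NA values. The sketch below assumes dplyr 1.1.0 or later, where join_by() accepts a between() helper. The data is a small stand-in taken from the question's ELA/M rows, plus a hypothetical ID 9 row that fits no interval.

```r
library(dplyr)  # join_by() is assumed available (dplyr >= 1.1.0)

# Stand-in data; ID 9 is a hypothetical row with no qualifying interval
df1 <- data.frame(
  ID = c(3, 9),
  Gender = c("M", "M"),
  TestDate = as.Date(c("2021-01-03", "2025-01-01")),
  Test = c("ELA", "ELA")
)
df2 <- data.frame(
  Test = c("ELA", "ELA"),
  Gender = c("M", "M"),
  Date1 = as.Date(c("2018-01-06", "2020-01-06")),
  Ave = c(84, 82),
  Date2 = as.Date(c("2020-01-06", "2022-01-06"))
)

# Equality on Gender and Test, plus Date1 <= TestDate <= Date2;
# left_join() keeps every df1 row, filling NA where nothing matches
result <- left_join(
  df1, df2,
  by = join_by(Gender, Test, between(TestDate, Date1, Date2))
)
print(result)
```

Here ID 3 picks up the second ELA interval (Ave 82), while ID 9 is retained with NA in Date1, Ave and Date2.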


akr*_*run 7

We can use a non-equi join

library(data.table)
setDT(df2)[df1, on = .(Gender, Test,  Date1 <= TestDate,  Date2 >= TestDate)]

-output

     Test Gender      Date1   Ave      Date2    ID
   <char> <char>     <Date> <num>     <Date> <num>
1: Weight      F 2019-01-01    97 2019-01-01     1
2: Weight      M 2019-01-02    99 2019-01-02     2
3:    ELA      M 2021-01-03    82 2021-01-03     3
4:    ELA      F 2020-01-01    85 2020-01-01     5
5:   Math      F 2019-01-04    88 2019-01-04     4
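In the output above, Date1 and Date2 both show the TestDate value from df1 rather than the interval bounds from df2, which is how data.table reports the columns used in a non-equi join. The sketch below shows one way to return the original bounds instead, using the x. and i. prefixes in j. The data is a small stand-in, and the ID 9 row is hypothetical, added to show that this form of join keeps every df1 row.

```r
library(data.table)

# Stand-in data; ID 9 is a hypothetical row with no qualifying interval
df1 <- data.table(
  ID = c(1, 9),
  Gender = c("F", "F"),
  TestDate = as.Date(c("2019-01-01", "2025-01-01")),
  Test = c("Weight", "Weight")
)
df2 <- data.table(
  Test = "Weight",
  Gender = "F",
  Date1 = as.Date("2018-01-01"),
  Ave = 97,
  Date2 = as.Date("2022-01-01")
)

# x. refers to columns of df2 (the outer table), i. to columns of df1
result <- df2[df1,
  on = .(Gender, Test, Date1 <= TestDate, Date2 >= TestDate),
  .(ID = i.ID, Gender, Test,
    TestDate = i.TestDate,   # the test date from df1
    Date1 = x.Date1,         # original interval start from df2
    Date2 = x.Date2,         # original interval end from df2
    Ave)]
print(result)
```

Because df1 sits in the i position, unmatched rows such as ID 9 are returned with NA for the df2 columns, which corresponds to the all.x = TRUE condition in the question.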

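  • @Stataq Thanks. If you check `?data.table`, the documentation says "+Inf (or TRUE) rolls the prior value in x forward. It is also known as last observation carried forward (LOCF)." (2 upvotes)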
  • Thanks a lot! Very helpful, as always. `data.table` offers some neat ways to merge data frames. I rarely use it, but I will try this route more. :D (2 upvotes)