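I am trying to work out how to create a subsample of a data frame that keeps only the first positive test for each person, based on date. Here is a toy example. Suppose I have the following: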
df = data.frame(guy = c("A", "B", "A", 'B', "C", "C"),
                test1 = c(1, 1, 0, 0, 1, 0),
                test2 = c(0, 1, 0, 1, 0, 0),
                test3 = c(0, 0, 1, 0, 0, 1),
                date = as.Date(c('1999-10-20', '1999-10-21', '1999-10-22', '1999-10-23', '1999-10-24', '1999-10-25')))
df
#guy test1 test2 test3 date
#1 A 1 0 0 1999-10-20
#2 B 1 1 0 1999-10-21
#3 A 0 0 1 1999-10-22
#4 B 0 1 0 1999-10-23
#5 C 1 0 0 1999-10-24
#6 C 0 0 1 1999-10-25
Now I want to filter the data so that only the first positive test for each guy is kept (i.e. test1 | test2 | test3 = 1), based on the oldest date. In my example, I would get the following:
#guy test1 test2 test3 date
#1 A 1 0 0 1999-10-20
#2 B 1 1 0 1999-10-21
#3 C 1 0 0 1999-10-24
Any hints on how I could do this?
Another option is to use dplyr::top_n:
df = data.frame(guy = c("A", "B", "A", 'B', "C", "C"),
                test1 = c(1, 1, 0, 0, 1, 0),
                test2 = c(0, 1, 0, 1, 0, 0),
                test3 = c(0, 0, 1, 0, 0, 1),
                date = as.Date(c('1999-10-20', '1999-10-21', '1999-10-22', '1999-10-23', '1999-10-24', '1999-10-25')))
library(dplyr)
df %>%
  filter(test1 | test2 | test3) %>%
  group_by(guy) %>%
  top_n(-1, date)
#> # A tibble: 3 x 5
#> # Groups: guy [3]
#> guy test1 test2 test3 date
#> <chr> <dbl> <dbl> <dbl> <date>
#> 1 A 1 0 0 1999-10-20
#> 2 B 1 1 0 1999-10-21
#> 3 C 1 0 0 1999-10-24
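For what it's worth, the dplyr documentation marks top_n() as superseded in favour of slice_min() and slice_max(). The following is a minimal sketch of the same filter-then-group approach, assuming dplyr 1.0.0 or later is installed. The name first_positive is only illustrative, and with_ties = FALSE is an assumption that you want exactly one row per guy even if two positive tests fall on the same date.

```r
library(dplyr)

df <- data.frame(
  guy   = c("A", "B", "A", "B", "C", "C"),
  test1 = c(1, 1, 0, 0, 1, 0),
  test2 = c(0, 1, 0, 1, 0, 0),
  test3 = c(0, 0, 1, 0, 0, 1),
  date  = as.Date(c("1999-10-20", "1999-10-21", "1999-10-22",
                    "1999-10-23", "1999-10-24", "1999-10-25"))
)

# first_positive is a hypothetical name used for illustration only.
first_positive <- df %>%
  # Keep rows where at least one of the three tests is positive.
  filter(test1 == 1 | test2 == 1 | test3 == 1) %>%
  group_by(guy) %>%
  # Earliest date within each guy; with_ties = FALSE returns a single row
  # per group even when several rows share the minimum date.
  slice_min(date, n = 1, with_ties = FALSE) %>%
  ungroup()

first_positive
```

On the toy data this should keep the same three rows as the top_n() version. The difference only shows up with tied dates: top_n(-1, date) keeps every tied row, whereas slice_min() with with_ties = FALSE keeps one.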