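I want to keep all observations from the last measured year before the cut-off year, plus all observations from the cut-off year onward.
Here is an example: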
d <- data.frame(group = c(1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2),
                cut_off = c(2017,2017,2017,2017,2017,2017,2017,2017,2017,2016,2016,2016,2016,2016,2016,2016),
                year = c(2000,2010,2010,2015,2015,2017,2017,2020,2024,2001,2009,2016,2017,2017,2021,2023),
                value = c(10,20,30,40,50,60,70,80,90,100,110,120,130,140,150,160))
> d
group cut_off year value
1 1 2017 2000 10
2 1 2017 2010 20
3 1 2017 2010 30
4 1 2017 2015 40
5 1 2017 2015 50
6 1 2017 2017 60
7 1 2017 2017 70
8 1 2017 2020 80
9 1 2017 2024 90
10 1 2016 2001 100
11 2 2016 2009 110
12 2 2016 2016 120
13 2 2016 2017 130
14 2 2016 2017 140
15 2 2016 2021 150
16 2 2016 2023 160
This is my desired output:
desired <- data.frame(group = c(1,1,1,1,1,1,2,2,2,2,2,2),
                      cut_off = c(2017,2017,2017,2017,2017,2017,2016,2016,2016,2016,2016,2016),
                      year = c(2015,2015,2017,2017,2020,2024,2009,2016,2017,2017,2021,2023),
                      value = c(40,50,60,70,80,90,110,120,130,140,150,160))
> desired
group cut_off year value
1 1 2017 2015 40
2 1 2017 2015 50
3 1 2017 2017 60
4 1 2017 2017 70
5 1 2017 2020 80
6 1 2017 2024 90
7 2 2016 2009 110
8 2 2016 2016 120
9 2 2016 2017 130
10 2 2016 2017 140
11 2 2016 2021 150
12 2 2016 2023 160
Selecting all years from the cut-off onward (cut-off year included) is easy:
library(dplyr)
d %>%
  filter(year >= cut_off)
group cut_off year value
1 1 2017 2017 60
2 1 2017 2017 70
3 1 2017 2020 80
4 1 2017 2024 90
5 2 2016 2016 120
6 2 2016 2017 130
7 2 2016 2017 140
8 2 2016 2021 150
9 2 2016 2023 160
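But I can't work out how to get the last year before the cut-off year. I tried various combinations of lag(), with and without group_by(), but couldn't get it to work. For example, this does not work:
d %>%
  filter(year >= cut_off | lag(year) < cut_off)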
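Since you are after the most recent year before cut_off, you don't need two filter expressions. Once you have found that year, you want every row whose year is greater than or equal to it (which will always include cut_off):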
library(dplyr)
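d |>
  mutate(diff = cut_off - year) |>
  # index the same subset that which.min() searched, so row order within a group does not matter
  filter(year >= year[diff > 0][which.min(diff[diff > 0])], .by = group) |>
  select(-diff)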
If row 10 is a typo (see my comment), then you can simply do the following (assuming your data is sorted by group and year):
d |>
  filter(year >= year[match(unique(cut_off), year) - 1], .by = group)
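A note on the `which.min()` approach above: the position returned by `which.min(diff[diff > 0])` is relative to the filtered vector, not to the full `year` column. Indexing the full column with that position only lands on the right row when the rows before the cut-off come first within each group, as they do in the example data. A minimal sketch with made-up, deliberately unsorted values (the names `pos`, `wrong`, and `right` are only for illustration):

```r
# Hypothetical toy vectors, not the question's data.
year    <- c(2020, 2015, 2010)
cut_off <- 2017
diff    <- cut_off - year            # -3, 2, 7

# which.min() searches diff[diff > 0], i.e. c(2, 7), and returns a position in that subset.
pos <- which.min(diff[diff > 0])

wrong <- year[pos]                   # indexes the full vector
right <- year[diff > 0][pos]         # indexes the same subset that was searched
```

Here `wrong` is 2020 while `right` is 2015, the latest year before the cut-off.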
d <- data.frame(group = c(1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2),
                cut_off = c(2017,2017,2017,2017,2017,2017,2017,2017,2017,2016,2016,2016,2016,2016,2016,2016),
                year = c(2000,2010,2010,2015,2015,2017,2017,2020,2024,2001,2009,2016,2017,2017,2021,2023),
                value = c(10,20,30,40,50,60,70,80,90,100,110,120,130,140,150,160))
library(tidyverse)
d |>
  mutate(year_prior_to_cutoff = year[year < cut_off] |> sort() |> last(), .by = group) |>
  filter(year >= cut_off | year == year_prior_to_cutoff)
#> group cut_off year value year_prior_to_cutoff
#> 1 1 2017 2015 40 2015
#> 2 1 2017 2015 50 2015
#> 3 1 2017 2017 60 2015
#> 4 1 2017 2017 70 2015
#> 5 1 2017 2020 80 2015
#> 6 1 2017 2024 90 2015
#> 7 2 2016 2009 110 2009
#> 8 2 2016 2016 120 2009
#> 9 2 2016 2017 130 2009
#> 10 2 2016 2017 140 2009
#> 11 2 2016 2021 150 2009
#> 12 2 2016 2023 160 2009
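For comparison, the same rule can probably be sketched in base R without any packages. This is not taken from the answers here; the names `before`, `threshold`, and `result` are made up for illustration, and the sketch assumes the data frame `d` from the question with row 10 left exactly as posted.

```r
# Example data from the question (row 10 kept as posted).
d <- data.frame(
  group   = c(1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2),
  cut_off = c(2017,2017,2017,2017,2017,2017,2017,2017,2017,2016,2016,2016,2016,2016,2016,2016),
  year    = c(2000,2010,2010,2015,2015,2017,2017,2020,2024,2001,2009,2016,2017,2017,2021,2023),
  value   = c(10,20,30,40,50,60,70,80,90,100,110,120,130,140,150,160)
)

# Keep only years strictly before each row's cut-off; mask the rest with NA.
before <- ifelse(d$year < d$cut_off, d$year, NA)

# Per group, the latest pre-cut-off year, recycled to every row of that group.
threshold <- ave(before, d$group, FUN = function(y) max(y, na.rm = TRUE))

# Rows from that year onward (this also covers every row at or after the cut-off).
result <- d[d$year >= threshold, ]
rownames(result) <- NULL
```

On this data `result` should have the same 12 rows as `desired`.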
library(dplyr)
d %>%
  mutate(helper = max(year[year < cut_off]), .by = group) %>%
  filter(year >= cut_off | year == helper) %>%
  select(-helper) %>%
  arrange(group, year)
group cut_off year value
1 1 2017 2015 40
2 1 2017 2015 50
3 1 2017 2017 60
4 1 2017 2017 70
5 1 2017 2020 80
6 1 2017 2024 90
7 2 2016 2009 110
8 2 2016 2016 120
9 2 2016 2017 130
10 2 2016 2017 140
11 2 2016 2021 150
12 2 2016 2023 160
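One edge case to be aware of with a `max()`-based helper: if a group has no year before its cut-off, `year[year < cut_off]` is empty, and base R's `max()` then returns `-Inf` with a warning. The filter condition still keeps the rows at or after the cut-off. A small sketch with made-up values (the vectors and the name `keep` are assumptions for illustration):

```r
# Hypothetical group whose observations all fall after the cut-off.
year    <- c(2018, 2020)
cut_off <- 2017

# max() of an empty vector warns and returns -Inf; the warning is silenced here.
helper <- suppressWarnings(max(year[year < cut_off]))

# No year equals -Inf, so only the first condition can select rows.
keep <- year >= cut_off | year == helper
```

If the warning is unwanted in real data, guarding the `max()` call for empty input is one option.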