Select the N rows before and the N rows after a given row that have the same value as that row

zaw*_*awa 12 r dataframe dplyr data.table

I build the following panel data, keyed by id and time:

library(dplyr)  # provides tibble() and the %>% pipe

pdata <- tibble(
  id = rep(1:10, each = 5),
  time = rep(2016:2020, times = 10),
  value = c(c(1,1,1,0,0), c(1,1,0,0,0), c(0,0,1,0,0), c(0,0,0,0,0), c(1,0,0,0,1), c(0,1,1,1,0), c(0,1,1,1,1), c(1,1,1,1,1), c(1,0,1,1,1), c(1,1,0,1,1))
)
pdata
# A tibble: 50 × 3
      id  time value
   <int> <int> <dbl>
 1     1  2016     1
 2     1  2017     1
 3     1  2018     1
 4     1  2019     0
 5     1  2020     0
 6     2  2016     1
 7     2  2017     1
 8     2  2018     0
 9     2  2019     0
10     2  2020     0
# … with 40 more rows

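Suppose a shock occurred in 2018. For each id, I want to slice the N rows before and the N rows after the shock row, keeping them only if their values are the same as the value in the shock row.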

Let me illustrate with a few examples. For id == 5, the data look like this:

pdata %>% filter(id == 5)
# A tibble: 5 × 3
     id  time value
  <int> <int> <dbl>
1     5  2016     1
2     5  2017     0
3     5  2018     0
4     5  2019     0
5     5  2020     1

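For id == 5, value is 0 in 2018, so I want to keep the 1 row before and the 1 row after it (plus the 2018 row itself), because all of these observations have the same value, 0. The 2016 and 2020 rows have value 1, so the window cannot be widened to 2 rows on each side: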

# A tibble: 3 × 3
     id  time value
  <int> <int> <dbl>
1     5  2017     0
2     5  2018     0
3     5  2019     0

For id == 8, I expect to get:

# A tibble: 5 × 3
     id  time value
  <int> <int> <dbl>
1     8  2016     1
2     8  2017     1
3     8  2018     1
4     8  2019     1
5     8  2020     1

For id == 1, I expect an empty data set, because the 2017 and 2019 observations do not have the same value (1 and 0, respectively).

The final data set should be:

# A tibble: 19 × 3
      id  time value
   <int> <int> <dbl>
 1     4  2016     0
 2     4  2017     0
 3     4  2018     0
 4     4  2019     0
 5     4  2020     0
 6     5  2017     0
 7     5  2018     0
 8     5  2019     0
 9     6  2017     1
10     6  2018     1
11     6  2019     1
12     7  2017     1
13     7  2018     1
14     7  2019     1
15     8  2016     1
16     8  2017     1
17     8  2018     1
18     8  2019     1
19     8  2020     1
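Read together, these examples suggest a rule: within each id, find the largest k (at most N) such that every value from 2018 - k to 2018 + k equals the 2018 value, keep that window, and drop the id when k is 0. A minimal base-R sketch of that rule for a single id, assuming each id has one row per year sorted by time and no missing values (max_half_width is a hypothetical helper, not part of any package):

```r
# Hypothetical helper: for one id's values (one per year, sorted by time),
# return the largest k <= n such that every value from position
# shock_pos - k to shock_pos + k equals the value at shock_pos.
# Assumes no NA values.
max_half_width <- function(values, shock_pos, n) {
  target <- values[shock_pos]
  k <- 0
  while (k < n) {
    lo <- shock_pos - (k + 1)
    hi <- shock_pos + (k + 1)
    # stop if widening would run past the data or hit a different value
    if (lo < 1 || hi > length(values)) break
    if (values[lo] != target || values[hi] != target) break
    k <- k + 1
  }
  k
}

# Years 2016:2020, so the 2018 shock is position 3; N = 2
max_half_width(c(1, 0, 0, 0, 1), shock_pos = 3, n = 2)  # id 5: 1, keep 2017-2019
max_half_width(c(0, 0, 0, 0, 0), shock_pos = 3, n = 2)  # id 4: 2, keep 2016-2020
max_half_width(c(1, 1, 1, 0, 0), shock_pos = 3, n = 2)  # id 1: 0, drop the id
```

Requiring the window to be symmetric is what excludes 2020 for id 7 (values 0, 1, 1, 1, 1): the 2020 value matches, but the 2016 value does not.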

Jaa*_*aap 8

A data.table solution:

# load the package & convert data to a data.table
library(data.table)
setDT(pdata)

# define shock-year and number of previous/next rows
shock <- 2018
n <- 2

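# filter: within each id, keep the rows that
#   - have the same value as the shock year,
#   - lie within n years of the shock year, and
#   - match their mirror image (value == rev(value)); rev() mirrors around the
#     middle row of each id, which is the shock year because every id covers 2016-2020;
# then keep an id only if more than one row survives and the years are consecutive
pdata[, .SD[value == value[time == shock] &
              between(time, shock - n, shock + n) &
              value == rev(value)][.N > 1 & all(diff(time) == 1)]
      , by = id]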

This gives:

    id time value
 1:  4 2016     0
 2:  4 2017     0
 3:  4 2018     0
 4:  4 2019     0
 5:  4 2020     0
 6:  5 2017     0
 7:  5 2018     0
 8:  5 2019     0
 9:  6 2017     1
10:  6 2018     1
11:  6 2019     1
12:  7 2017     1
13:  7 2018     1
14:  7 2019     1
15:  8 2016     1
16:  8 2017     1
17:  8 2018     1
18:  8 2019     1
19:  8 2020     1
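The answer's filter works in two steps, which can be replayed in base R on the values of id 7 and id 9 from the data. The mirror check value == rev(value) compares each row with the row at the same distance on the other side of the group's middle row; that middle row is 2018 only because every id covers exactly 2016 to 2020, an assumption this sketch keeps (keep_rows is a hypothetical name):

```r
time  <- 2016:2020
shock <- 2018
n     <- 2

# Hypothetical helper replaying the answer's row filter for one id's values
keep_rows <- function(value) {
  value == value[time == shock] &           # same value as the shock year
    time >= shock - n & time <= shock + n &  # same inclusive window as between()
    value == rev(value)                      # row matches its mirror image
}

id7 <- c(0, 1, 1, 1, 1)
time[keep_rows(id7)]                   # 2017 2018 2019
all(diff(time[keep_rows(id7)]) == 1)   # TRUE: consecutive years, id 7 is kept

id9 <- c(1, 0, 1, 1, 1)
time[keep_rows(id9)]                   # 2016 2018 2020
all(diff(time[keep_rows(id9)]) == 1)   # FALSE: gaps, id 9 is dropped
```

For id 9 the rows that survive the first step are not consecutive, which is exactly what the all(diff(time) == 1) condition catches.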

Data used:

pdata <- data.frame(
  id = rep(1:10, each = 5),
  time = rep(2016:2020, times = 10),
  value = c(c(1,1,1,0,0), c(1,1,0,0,0), c(0,0,1,0,0), c(0,0,0,0,0), c(1,0,0,0,1), c(0,1,1,1,0), c(0,1,1,1,1), c(1,1,1,1,1), c(1,0,1,1,1), c(1,1,0,1,1))
)
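Because the question is also tagged dplyr, here is a hedged dplyr sketch of the same selection. It is not taken from the answer: instead of mirroring with rev(), it computes, per id, the distance to the nearest year whose value differs from the shock-year value and keeps the symmetric window just inside it. It assumes each id has exactly one row per consecutive year, including the shock year; n_side and the helper columns are hypothetical names.

```r
library(dplyr)

pdata <- data.frame(
  id = rep(1:10, each = 5),
  time = rep(2016:2020, times = 10),
  value = c(c(1,1,1,0,0), c(1,1,0,0,0), c(0,0,1,0,0), c(0,0,0,0,0), c(1,0,0,0,1),
            c(0,1,1,1,0), c(0,1,1,1,1), c(1,1,1,1,1), c(1,0,1,1,1), c(1,1,0,1,1))
)

shock  <- 2018
n_side <- 2  # hypothetical name for N, the maximum number of rows on each side

result <- pdata %>%
  group_by(id) %>%
  mutate(
    shock_value = value[time == shock],  # assumes exactly one shock-year row per id
    dist = abs(time - shock),            # years away from the shock
    # distance of the nearest differing row; n_side + 1 if no row differs
    first_break = min(c(dist[value != shock_value], n_side + 1)),
    # years available on both sides of the shock within this id
    edge = min(shock - min(time), max(time) - shock),
    half_width = pmin(n_side, first_break - 1, edge)
  ) %>%
  filter(half_width >= 1, dist <= half_width) %>%  # drop ids with no matching neighbors
  ungroup() %>%
  select(id, time, value)

result
```

On this data it keeps the same 19 rows (ids 4 to 8) as the expected output in the question; unlike the rev() check, it does not depend on the shock year being the middle row of each id.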