zaw*_*awa 12 r dataframe dplyr data.table
I construct the following panel data with keys id and time:
pdata <- tibble(
  id = rep(1:10, each = 5),
  time = rep(2016:2020, times = 10),
  value = c(c(1,1,1,0,0), c(1,1,0,0,0), c(0,0,1,0,0), c(0,0,0,0,0), c(1,0,0,0,1), c(0,1,1,1,0), c(0,1,1,1,1), c(1,1,1,1,1), c(1,0,1,1,1), c(1,1,0,1,1))
)
pdata
# A tibble: 50 × 3
      id  time value
   <int> <int> <dbl>
 1     1  2016     1
 2     1  2017     1
 3     1  2018     1
 4     1  2019     0
 5     1  2020     0
 6     2  2016     1
 7     2  2017     1
 8     2  2018     0
 9     2  2019     0
10     2  2020     0
# … with 40 more rows

Suppose a shock occurs in 2018. Within each id, I want to slice the N rows before and the N rows after the shock, keeping them only if their value equals the value in the shock row.
Let me illustrate with a few examples. For id == 5, the data looks like this:
pdata %>% filter(id == 5)
# A tibble: 5 × 3
     id  time value
  <int> <int> <dbl>
1     5  2016     1
2     5  2017     0
3     5  2018     0
4     5  2019     0
5     5  2020     1

For id == 5, value in 2018 is 0. I want to keep the 1 row before and the 1 row after, together with the shock row itself, because all of these observations share the same value, 0:
# A tibble: 3 × 3
     id  time value
  <int> <int> <dbl>
1     5  2017     0
2     5  2018     0
3     5  2019     0

For id == 8, I want to get:
# A tibble: 5 × 3
     id  time value
  <int> <int> <dbl>
1     8  2016     1
2     8  2017     1
3     8  2018     1
4     8  2019     1
5     8  2020     1

For id == 1, I want an empty dataset, because the observations for 2017 and 2019 do not have the same value (1 and 0).
The final dataset should be:
# A tibble: 19 × 3
      id  time value
   <int> <int> <dbl>
 1     4  2016     0
 2     4  2017     0
 3     4  2018     0
 4     4  2019     0
 5     4  2020     0
 6     5  2017     0
 7     5  2018     0
 8     5  2019     0
 9     6  2017     1
10     6  2018     1
11     6  2019     1
12     7  2017     1
13     7  2018     1
14     7  2019     1
15     8  2016     1
16     8  2017     1
17     8  2018     1
18     8  2019     1
19     8  2020     1
A data.table solution:
# load the package & convert data to a data.table
library(data.table)
setDT(pdata)
# define shock-year and number of previous/next rows
shock <- 2018
n <- 2
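# filter: within each id, keep rows that match the shock-year value, fall
# inside the window, and equal the value at the mirrored position in the group
# (the shock year is the middle row here); then keep the group only if more
# than one row survives and the surviving years are consecutive
pdata[, .SD[value == value[time == shock] &
              between(time, shock - n, shock + n) &
              value == rev(value)][.N > 1 & all(diff(time) == 1)],
      by = id]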
This gives:
    id time value
 1:  4 2016     0
 2:  4 2017     0
 3:  4 2018     0
 4:  4 2019     0
 5:  4 2020     0
 6:  5 2017     0
 7:  5 2018     0
 8:  5 2019     0
 9:  6 2017     1
10:  6 2018     1
11:  6 2019     1
12:  7 2017     1
13:  7 2018     1
14:  7 2019     1
15:  8 2016     1
16:  8 2017     1
17:  8 2018     1
18:  8 2019     1
19:  8 2020     1
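To see why a group such as id == 9 is dropped, it may help to evaluate the three conditions by hand on that group's values. The following is only a base R walk-through of the same logic, not part of the original answer; the intermediate names (same_as_shock, in_window, symmetric, kept_time, is_contiguous) are made up for illustration.

```r
# Values for id == 9 in the example data, one per year 2016..2020
time  <- 2016:2020
value <- c(1, 0, 1, 1, 1)
shock <- 2018
n     <- 2

# Condition 1: same value as the shock year (value is 1 in 2018)
same_as_shock <- value == value[time == shock]

# Condition 2: inside the window of n years around the shock
in_window <- time >= shock - n & time <= shock + n

# Condition 3: equal to the value at the mirrored position in the group
# (this assumes the shock year is the middle row of the group)
symmetric <- value == rev(value)

# Years that pass all three conditions: 2016, 2018, 2020
kept_time <- time[same_as_shock & in_window & symmetric]

# The surviving years are 2 apart, not consecutive, so the group is dropped
is_contiguous <- length(kept_time) > 1 && all(diff(kept_time) == 1)
is_contiguous
# [1] FALSE
```

The final consecutive-years check is what removes id == 9: its 2017 value breaks the run, so the rows that survive the first three conditions are not adjacent.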
Data used:
pdata <- data.frame(
id = rep(1:10, each = 5),
time = rep(2016:2020, times = 10),
value = c(c(1,1,1,0,0), c(1,1,0,0,0), c(0,0,1,0,0), c(0,0,0,0,0), c(1,0,0,0,1), c(0,1,1,1,0), c(0,1,1,1,1), c(1,1,1,1,1), c(1,0,1,1,1), c(1,1,0,1,1))
)
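Since the question is also tagged dplyr, the same filter could presumably be written with grouped filter() calls. This is a sketch under the same assumptions as the data.table answer (every id has a row for the shock year, and that row sits in the middle of the group); the names window and result are hypothetical. The window is called window rather than n so it does not share a name with dplyr's n().

```r
library(dplyr)

pdata <- tibble(
  id = rep(1:10, each = 5),
  time = rep(2016:2020, times = 10),
  value = c(c(1,1,1,0,0), c(1,1,0,0,0), c(0,0,1,0,0), c(0,0,0,0,0), c(1,0,0,0,1),
            c(0,1,1,1,0), c(0,1,1,1,1), c(1,1,1,1,1), c(1,0,1,1,1), c(1,1,0,1,1))
)

shock  <- 2018
window <- 2   # number of previous/next rows to consider (hypothetical name)

result <- pdata %>%
  group_by(id) %>%
  # row-level conditions: match the shock value, lie in the window,
  # and equal the value at the mirrored position in the group
  filter(
    value == value[time == shock],
    time >= shock - window,
    time <= shock + window,
    value == rev(value)
  ) %>%
  # group-level conditions: more than one row left, and consecutive years
  filter(n() > 1, all(diff(time) == 1)) %>%
  ungroup()

result   # 19 rows, covering ids 4 to 8
```

The two filter() calls mirror the two bracketed steps of the data.table version: the first selects rows within each id, the second keeps or drops the whole group.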