Rolling conditional count in R

jim*_*y87 5 r dplyr

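I want to create a rolling function that conditionally counts occurrences across two columns in the preceding rows.

For example, I have a dataset like the one below.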

# Generate data
set.seed(123)
test <- data.frame(
  Round = rep(1:5, times = 3),
  Team = rep(c("Team 1", "Team 2", "Team 3"), each = 5),
  Venue = sample(sample(c("Venue A", "Venue B"), 15, replace = T))
)

   Round   Team   Venue
1      1 Team 1 Venue B
2      2 Team 1 Venue B
3      3 Team 1 Venue A
4      4 Team 1 Venue A
5      5 Team 1 Venue B
6      1 Team 2 Venue B
7      2 Team 2 Venue B
8      3 Team 2 Venue A
9      4 Team 2 Venue A
10     5 Team 2 Venue A
11     1 Team 3 Venue B
12     2 Team 3 Venue A
13     3 Team 3 Venue B
14     4 Team 3 Venue B
15     5 Team 3 Venue B

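I'd like a new column that shows, for each row, how many times that row's team has played at that row's venue over the past 3 rounds.

I can do this quite easily with a for loop.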

window <- 3

# Pre-allocate the result column
test$VenueCount <- NA_integer_

for (i in seq_len(nrow(test))) {
  # Rows to search: the current row plus the `window` rows before it (clamped at row 1)
  index <- max(i - window, 1):i

  # Count rows in that range that match the current row's team and venue
  test$VenueCount[i] <- sum(test$Team[i] == test$Team[index] & test$Venue[i] == test$Venue[index])
}

   Round   Team   Venue VenueCount
1      1 Team 1 Venue B          1
2      2 Team 1 Venue B          2
3      3 Team 1 Venue A          1
4      4 Team 1 Venue A          2
5      5 Team 1 Venue B          2
6      1 Team 2 Venue B          1
7      2 Team 2 Venue B          2
8      3 Team 2 Venue A          1
9      4 Team 2 Venue A          2
10     5 Team 2 Venue A          3
11     1 Team 3 Venue B          1
12     2 Team 3 Venue A          1
13     3 Team 3 Venue B          2
14     4 Team 3 Venue B          3
15     5 Team 3 Venue B          3

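However, I'd like to avoid the for loop (mainly because my actual dataset is fairly large, around 30,000 rows). I imagine it should be doable with one of zoo, dplyr, purrr or apply, but I haven't been able to work it out.

Thanks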

jim*_*y87 2

I actually found an answer using rollify from the tibbletime package together with dplyr::mutate. Posting it here, but other answers are still welcome!

library(dplyr)
library(tibbletime)

# Create data
set.seed(123)
test <- data.frame(
  Round = rep(1:5, times = 3),
  Team = rep(c("Team 1", "Team 2", "Team 3"), each = 5),
  Venue = sample(sample(c("Venue A", "Venue B"), 15, replace = T))
)

Use rollify to create a custom function.

last_n_games = 3
count_games <- rollify(function(x) sum(last(x) == x), window = last_n_games)
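
To make it clearer what the rolled function computes on each window, here is a minimal base-R sketch of the same idea. The helper name roll_count_last is hypothetical and this is not how tibbletime implements rollify; it only mimics the behaviour described here (count how many values in a fixed-size window equal the window's last value, with NA until the window is full).

```r
# Hypothetical stand-in for the rollified function, base R only.
# For each position i >= window, count the values in the last `window`
# elements (ending at i) that equal x[i]. Earlier positions stay NA.
roll_count_last <- function(x, window = 3) {
  x <- as.character(x)
  n <- length(x)
  out <- rep(NA_integer_, n)
  if (n >= window) {
    out[window:n] <- vapply(
      window:n,
      function(i) sum(x[(i - window + 1):i] == x[i]),
      integer(1)
    )
  }
  out
}

# Team 1's venues from the example data, in round order
print(roll_count_last(c("B", "B", "A", "A", "B")))
# NA NA 1 2 1
```

The two leading NA values are the ones that the next step fills in.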

Now use mutate to run the function. This returns NA for the first 2 rows (i.e. last_n_games - 1). I can then use group_by and row_number to count those first occurrences.

test <- test %>%
  group_by(Team) %>%
  mutate(VenueCount = count_games(Venue)) %>%
  group_by(Team, Venue) %>%
  mutate(VenueCount = ifelse(is.na(VenueCount), row_number(Team), VenueCount))

This returns the following:

# A tibble: 15 x 4
# Groups:   Team, Venue [6]
   Round Team   Venue   VenueCount
   <int> <fct>  <fct>        <int>
 1     1 Team 1 Venue B          1
 2     2 Team 1 Venue B          2
 3     3 Team 1 Venue A          1
 4     4 Team 1 Venue A          2
 5     5 Team 1 Venue B          1
 6     1 Team 2 Venue B          1
 7     2 Team 2 Venue B          2
 8     3 Team 2 Venue A          1
 9     4 Team 2 Venue A          2
10     5 Team 2 Venue A          3
11     1 Team 3 Venue B          1
12     2 Team 3 Venue A          1
13     3 Team 3 Venue B          2
14     4 Team 3 Venue B          2
15     5 Team 3 Venue B          3
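
Note that this output differs from the for loop in the question at rows 5 and 14. The loop searches the current row plus the 3 rows before it (4 rows in total), whereas rollify with window = 3 covers 3 rows including the current one. As a rough sketch of an alternative that needs no extra packages and handles partial windows directly (so no NA fill-in step), the following uses base R only. The helper name count_same_in_window and its lookback parameter are hypothetical, the data is typed in from the table printed in the question rather than regenerated with sample(), and the rows are assumed to be sorted by Team and then Round.

```r
# Example data copied from the table in the question
test <- data.frame(
  Round = rep(1:5, times = 3),
  Team  = rep(c("Team 1", "Team 2", "Team 3"), each = 5),
  Venue = paste("Venue", c("B", "B", "A", "A", "B",
                           "B", "B", "A", "A", "A",
                           "B", "A", "B", "B", "B")),
  stringsAsFactors = FALSE
)

# Hypothetical helper: for each position i, count how many values among
# the current one and the `lookback` values before it equal venue[i].
count_same_in_window <- function(venue, lookback) {
  venue <- as.character(venue)
  vapply(seq_along(venue), function(i) {
    idx <- max(i - lookback, 1):i  # clamp the window at the first row
    sum(venue[idx] == venue[i])
  }, integer(1))
}

# Apply the helper within each team; ave() passes each team's row indices
rolling_count <- function(data, lookback) {
  ave(seq_len(nrow(data)), data$Team,
      FUN = function(rows) count_same_in_window(data$Venue[rows], lookback))
}

# lookback = 3: current row plus 3 previous rows, as in the question's loop
test$VenueCountLoop <- rolling_count(test, lookback = 3)
# lookback = 2: 3 rows including the current one, as in rollify(window = 3)
test$VenueCountRoll <- rolling_count(test, lookback = 2)

print(test)
```

With lookback = 3 the column should match the loop's VenueCount, and with lookback = 2 it should match the rollify result above. The inner vapply is still a loop under the hood, so for much larger data a dedicated rolling-window package may be faster.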