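I want to create a rolling function that conditionally counts occurrences across two columns in the preceding rows.
For example, I have a dataset like the one below.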
# Generate data
set.seed(123)
test <- data.frame(
Round = rep(1:5, times = 3),
Team = rep(c("Team 1", "Team 2", "Team 3"), each = 5),
Venue = sample(sample(c("Venue A", "Venue B"), 15, replace = TRUE))
)
Round Team Venue
1 1 Team 1 Venue B
2 2 Team 1 Venue B
3 3 Team 1 Venue A
4 4 Team 1 Venue A
5 5 Team 1 Venue B
6 1 Team 2 Venue B
7 2 Team 2 Venue B
8 3 Team 2 Venue A
9 4 Team 2 Venue A
10 5 Team 2 Venue A
11 1 Team 3 Venue B
12 2 Team 3 Venue A
13 3 Team 3 Venue B
14 4 Team 3 Venue B
15 5 Team 3 Venue B
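I want a new column that shows, for each row, how many times that row's team has played at that row's venue in the last 3 rounds.
I can do this quite easily with a for loop.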
window <- 3
test$VenueCount <- NA_integer_
for (i in seq_len(nrow(test))) {
  # Rows to search: the current row plus up to `window` rows before it (never below row 1)
  index <- max(i - window, 1):i
  # Count rows in that range that match the current row's team and venue
  test$VenueCount[i] <- sum(test$Team[i] == test$Team[index] & test$Venue[i] == test$Venue[index])
}
Round Team Venue VenueCount
1 1 Team 1 Venue B 1
2 2 Team 1 Venue B 2
3 3 Team 1 Venue A 1
4 4 Team 1 Venue A 2
5 5 Team 1 Venue B 2
6 1 Team 2 Venue B 1
7 2 Team 2 Venue B 2
8 3 Team 2 Venue A 1
9 4 Team 2 Venue A 2
10 5 Team 2 Venue A 3
11 1 Team 3 Venue B 1
12 2 Team 3 Venue A 1
13 3 Team 3 Venue B 2
14 4 Team 3 Venue B 3
15 5 Team 3 Venue B 3
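However, I would like to avoid the for loop (mainly because my real dataset is relatively large, about 30,000 rows). I imagine it should be doable with zoo, dplyr, purrr, or apply, but I haven't been able to work it out.
Thanks.
I actually found an answer using rollify from the tibbletime package together with dplyr::mutate. I'm posting it here, but other answers are still welcome!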
library(dplyr)
library(tibbletime)
# Create data
set.seed(123)
test <- data.frame(
Round = rep(1:5, times = 3),
Team = rep(c("Team 1", "Team 2", "Team 3"), each = 5),
Venue = sample(sample(c("Venue A", "Venue B"), 15, replace = TRUE))
)
Use rollify to create a custom rolling function.
last_n_games <- 3
# Rolling version: within each window, count the values equal to the most recent one
count_games <- rollify(function(x) sum(last(x) == x), window = last_n_games)
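To make the rolled function concrete, here is a minimal base-R sketch of what it computes for a single team's venues. `count_last_n` is a hypothetical helper written only for illustration; it is not part of tibbletime, and it assumes the window always ends at the current row.

```r
# Hypothetical helper (not from tibbletime): for each position i, look at the
# n most recent values ending at i and count how many equal the current one.
# Positions without a full window of n values are left as NA.
count_last_n <- function(x, n) {
  vapply(seq_along(x), function(i) {
    if (i < n) return(NA_integer_)
    win <- x[(i - n + 1):i]
    sum(win == x[i])
  }, integer(1))
}

# Team 1's venues, copied from the example data above
venues <- c("Venue B", "Venue B", "Venue A", "Venue A", "Venue B")
count_last_n(venues, 3)
# NA NA 1 2 1
```

The two leading NA values are the incomplete windows, which is why a separate step is needed to fill them in.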
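Now use mutate to run the function. This returns NA for the first 2 rows of each team (i.e. last_n_games - 1). I can then use group_by and row_number to fill in the counts for those initial rows.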
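test <- test %>%
  # Roll over each team's games separately
  group_by(Team) %>%
  mutate(VenueCount = count_games(Venue)) %>%
  # For the leading NA rows, the running row number within Team + Venue is the count so far
  group_by(Team, Venue) %>%
  mutate(VenueCount = ifelse(is.na(VenueCount), row_number(), VenueCount))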
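This returns the following. Note that rows 5 and 14 differ from the for-loop output in the question: a window of 3 here covers the current row and the two before it, whereas the loop's index spans the current row plus the three before it.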
# A tibble: 15 x 4
# Groups: Team, Venue [6]
Round Team Venue VenueCount
<int> <fct> <fct> <int>
1 1 Team 1 Venue B 1
2 2 Team 1 Venue B 2
3 3 Team 1 Venue A 1
4 4 Team 1 Venue A 2
5 5 Team 1 Venue B 1
6 1 Team 2 Venue B 1
7 2 Team 2 Venue B 2
8 3 Team 2 Venue A 1
9 4 Team 2 Venue A 2
10 5 Team 2 Venue A 3
11 1 Team 3 Venue B 1
12 2 Team 3 Venue A 1
13 3 Team 3 Venue B 2
14 4 Team 3 Venue B 2
15 5 Team 3 Venue B 3
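For comparison, the same counts can be sketched without tibbletime by letting the window shrink at the start of each team, so no NA fill step is needed. This is only an illustrative alternative using base R; `rolling_match_count` and the `games` data frame are hypothetical names, and the venues are typed in from the table above rather than regenerated with set.seed.

```r
# Hypothetical helper: count matches of the current value within the last n
# values, using a shorter window when fewer than n values are available.
rolling_match_count <- function(x, n) {
  vapply(seq_along(x), function(i) {
    idx <- max(i - n + 1, 1):i
    sum(x[idx] == x[i])
  }, integer(1))
}

# Example data typed in from the printed table (assumed, not re-sampled)
games <- data.frame(
  Round = rep(1:5, times = 3),
  Team  = rep(c("Team 1", "Team 2", "Team 3"), each = 5),
  Venue = paste("Venue", c("B", "B", "A", "A", "B",
                           "B", "B", "A", "A", "A",
                           "B", "A", "B", "B", "B")),
  stringsAsFactors = FALSE
)

# ave() applies the function to the row indices of each team and puts the
# results back in the original row order.
games$VenueCount <- ave(
  seq_len(nrow(games)), games$Team,
  FUN = function(rows) rolling_match_count(games$Venue[rows], 3)
)
games$VenueCount
# 1 2 1 2 1 1 2 1 2 3 1 1 2 2 3
```

On this data the result matches the VenueCount column above. It still evaluates one small window per row, so whether it is fast enough for around 30,000 rows would need to be measured.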