I have a dataset of football teams that looks like this:
Home_team Away_team Home_score Away_score
Arsenal Chelsea 1 3
Manchester U Blackburn 2 9
Liverpool Leeds 0 8
Chelsea Arsenal 4 1
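I want to group the matches by the teams involved, regardless of which team played at home and which played away. For example, if Chelsea play Arsenal, I want a new column "teams_involved" to read Arsenal - Chelsea, whether the match was played at Chelsea or at Arsenal. My guess is that the way to do this is to put the two teams into the new column in alphabetical order, but I don't know how to do that.
Desired output: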
Home_team Away_team Home_score Away_score teams_involved
Arsenal Chelsea 1 3 Arsenal - Chelsea
Manchester U Blackburn 2 9 Blackburn - Manchester U
Liverpool Leeds 0 8 Leeds - Liverpool
Chelsea Arsenal 4 1 Arsenal - Chelsea
The reason I want to do this is so that I can see how many wins each team has against a particular opponent, regardless of where the match was played.
df = read.table(text = "
Home_team Away_team Home_score Away_score
Arsenal Chelsea 1 3
ManchesterU Blackburn 2 9
Liverpool Leeds 0 8
Chelsea Arsenal 4 1
", header = TRUE, stringsAsFactors = FALSE)
library(dplyr)
df %>%
  rowwise() %>%   # process each row separately
  mutate(Teams = paste(sort(c(Home_team, Away_team)), collapse = " - ")) %>%   # sort the two teams alphabetically, then join them with " - "
  ungroup()       # drop the row-wise grouping
# # A tibble: 4 x 5
# Home_team Away_team Home_score Away_score Teams
# <chr> <chr> <int> <int> <chr>
# 1 Arsenal Chelsea 1 3 Arsenal - Chelsea
# 2 ManchesterU Blackburn 2 9 Blackburn - ManchesterU
# 3 Liverpool Leeds 0 8 Leeds - Liverpool
# 4 Chelsea Arsenal 4 1 Arsenal - Chelsea
An alternative solution without rowwise:
# create function and vectorize it
f = function(x,y) {paste(sort(c(x, y)), collapse = " - ")}
f = Vectorize(f)
# apply function to your dataset
df %>% mutate(Teams = f(Home_team, Away_team))
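Another way to avoid row-by-row work, not part of the original answer, is base R's pmin() and pmax(), which compare two vectors element by element and also accept character vectors. The following is a minimal sketch under that assumption; it rebuilds the sample data so it runs on its own, and the resulting order follows the string collation of the current locale.

```r
# Same sample data as in the answer above
df <- read.table(text = "
Home_team Away_team Home_score Away_score
Arsenal Chelsea 1 3
ManchesterU Blackburn 2 9
Liverpool Leeds 0 8
Chelsea Arsenal 4 1
", header = TRUE, stringsAsFactors = FALSE)

# pmin() picks the alphabetically earlier name of each pair,
# pmax() the later one; paste() then joins them with " - "
df$Teams <- paste(pmin(df$Home_team, df$Away_team),
                  pmax(df$Home_team, df$Away_team),
                  sep = " - ")

print(df)
```

Because pmin(), pmax() and paste() are all vectorized, this needs neither rowwise() nor Vectorize().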
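Once the pairing column exists, the goal stated in the question (wins per team against a particular opponent, regardless of venue) could be approached roughly as follows. This is only a sketch in base R: the Winner column and the wins table are names introduced here for illustration, and draws are assumed to count for neither side.

```r
# Same sample data as in the answer above
df <- read.table(text = "
Home_team Away_team Home_score Away_score
Arsenal Chelsea 1 3
ManchesterU Blackburn 2 9
Liverpool Leeds 0 8
Chelsea Arsenal 4 1
", header = TRUE, stringsAsFactors = FALSE)

# Build the pairing label the same way as the answer's function f
df$Teams <- mapply(function(x, y) paste(sort(c(x, y)), collapse = " - "),
                   df$Home_team, df$Away_team, USE.NAMES = FALSE)

# Winner of each match; NA marks a draw (hypothetical helper column)
df$Winner <- ifelse(df$Home_score > df$Away_score, df$Home_team,
             ifelse(df$Away_score > df$Home_score, df$Away_team,
                    NA_character_))

# Count wins per pairing and winner; table() drops the NA draws by default
wins <- as.data.frame(table(Teams = df$Teams, Winner = df$Winner),
                      stringsAsFactors = FALSE)
wins <- wins[wins$Freq > 0, ]   # keep only combinations that occurred

print(wins)
```

Each row of wins then gives one pairing, one team, and the number of times that team won that pairing.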