pss*_*guy 3 r data-manipulation
I have a data frame, df, of football team information by game (MATCHID), with these initial values:
TEAMID Venue LEAGUEPOS MATCHID
WHU A 5 1
COV H 12 1
EVE H 15 2
MNU A 2 2
ARS A 3 3
LEI H 4 3
I would like to create one row per game, so that the result ends up looking like:
MATCHID HomeTeam AwayTeam HomePos AwayPos
1 COV WHU 12 5 etc.
So I want to create some new columns, drop others, and remove the duplicated rows.
My attempt at the first stage ran into trouble:
df$HomeTeam <- df$TEAMID[df$Venue == "H"]
because this produces
TEAMID Venue LEAGUEPOS MATCHID HomeTeam
WHU A 5 1 COV
COV H 12 1 EVE
EVE H 15 2 LEI
MNU A 2 2 STH
ARS A 3 3 TOT
LEI H 4 3 WIM
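HomeTeam is simply filled, in sequence, with the TEAMID of each record where Venue = H, so the values do not line up with the MATCHID of the row they land on.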
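The misalignment comes from the subset holding one element per match while the data frame holds two rows per match, so the values are filled in by position rather than by MATCHID. One possible repair of the original attempt is sketched below, assuming every MATCHID has exactly one row with Venue "H". The helper name home is illustrative and does not appear in the original post.

```r
# Sample data from the question (six rows, three matches)
df <- data.frame(
  TEAMID    = c("WHU", "COV", "EVE", "MNU", "ARS", "LEI"),
  Venue     = c("A", "H", "H", "A", "A", "H"),
  LEAGUEPOS = c(5, 12, 15, 2, 3, 4),
  MATCHID   = c(1, 1, 2, 2, 3, 3),
  stringsAsFactors = FALSE
)

# Keep only the home rows: one row per match (assumption: exactly one "H" per MATCHID)
home <- df[df$Venue == "H", ]

# Look up each row's home team by MATCHID instead of relying on position
df$HomeTeam <- home$TEAMID[match(df$MATCHID, home$MATCHID)]
df
```

This only repairs the first stage. The away team, both positions and the removal of duplicate rows would still need similar steps, which is why a reshape is the shorter route.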
The one-row-per-game layout can be obtained easily with the reshape function, which ships with base R.
# READ DATA
mydf = read.table(textConnection("
TEAMID Venue LEAGUEPOS MATCHID
WHU A 5 1
COV H 12 1
EVE H 15 2
MNU A 2 2
ARS A 3 3
LEI H 4 3"),
sep = "", header = TRUE, colClasses = rep('character', 4))
# RESHAPE DATA
reshape(mydf, idvar = 'MATCHID', timevar = 'Venue', direction = 'wide')
This is the output it produces:
MATCHID TEAMID.A LEAGUEPOS.A TEAMID.H LEAGUEPOS.H
1 1 WHU 5 COV 12
3 2 MNU 2 EVE 15
5 3 ARS 3 LEI 4
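The reshaped columns still carry the generated names (TEAMID.H, LEAGUEPOS.A and so on) rather than the ones asked for in the question. A minimal sketch of the remaining step is shown below, assuming the column names printed above. The object name wide is illustrative, and the data is rebuilt with data.frame only to keep the snippet self-contained.

```r
# Same sample data as above, all columns as character to mirror colClasses
mydf <- data.frame(
  TEAMID    = c("WHU", "COV", "EVE", "MNU", "ARS", "LEI"),
  Venue     = c("A", "H", "H", "A", "A", "H"),
  LEAGUEPOS = c("5", "12", "15", "2", "3", "4"),
  MATCHID   = c("1", "1", "2", "2", "3", "3"),
  stringsAsFactors = FALSE
)

# One row per MATCHID, with one set of columns per Venue value
wide <- reshape(mydf, idvar = "MATCHID", timevar = "Venue", direction = "wide")

# Put the home columns first, then rename to the layout requested in the question
wide <- wide[, c("MATCHID", "TEAMID.H", "TEAMID.A", "LEAGUEPOS.H", "LEAGUEPOS.A")]
names(wide) <- c("MATCHID", "HomeTeam", "AwayTeam", "HomePos", "AwayPos")
rownames(wide) <- NULL
wide
```

Selecting the columns by name rather than by position keeps the rename correct even if reshape emits them in a different order.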
Note: another approach is to use the melt and cast functions from the reshape package.
library(reshape)
mydf_m = melt(mydf, id = c('MATCHID', 'Venue'))
cast(mydf_m, MATCHID ~ Venue + variable)