Creating new columns in an R data frame based on existing columns and grouping

pss*_*guy 3 r data-manipulation

I have a data frame df of football team information, organised by game (MATCHID), with these initial values

 TEAMID Venue LEAGUEPOS MATCHID
 WHU     A         5       1
 COV     H        12       1
 EVE     H        15       2
 MNU     A         2       2
 ARS     A         3       3
 LEI     H         4       3

I want to end up with one row per game, so that the final result looks like

MATCHID HomeTeam AwayTeam HomePos AwayPos
   1       COV      WHU     12      5      etc.

So I want to create some new columns, drop others, and remove the duplicate rows.

My attempt at the first stage ran into trouble

df$HomeTeam <- df$TEAMID[df$Venue == "H"]

because this produces

 TEAMID Venue LEAGUEPOS MATCHID HomeTeam
   WHU     A         5       1      COV
   COV     H        12       1      EVE
   EVE     H        15       2      LEI
   MNU     A         2       2      STH
   ARS     A         3       3      TOT
   LEI     H         4       3      WIM

HomeTeam simply lists, in order, the TEAMID of every record where Venue = H, rather than the home team of the match on that row.
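The likely cause can be sketched on the six sample rows alone: the subset `df$TEAMID[df$Venue == "H"]` has one element per home team, so it is shorter than the data frame, and R recycles it down the rows instead of aligning it by MATCHID. The data frame below is rebuilt by hand from the sample; the variable name `home` is only for illustration. (The STH, TOT and WIM values above presumably come from rows of the full data set that are not shown.)

```r
# Rebuild the six sample rows from the question
df <- data.frame(
  TEAMID    = c("WHU", "COV", "EVE", "MNU", "ARS", "LEI"),
  Venue     = c("A", "H", "H", "A", "A", "H"),
  LEAGUEPOS = c(5, 12, 15, 2, 3, 4),
  MATCHID   = c(1, 1, 2, 2, 3, 3),
  stringsAsFactors = FALSE
)

# The subset holds only the three home teams
home <- df$TEAMID[df$Venue == "H"]
length(home)  # 3, while nrow(df) is 6

# Assigning the shorter vector recycles it row by row,
# with no regard to MATCHID
df$HomeTeam <- home
df
```

On this sample the new column repeats COV, EVE, LEI twice, so only by coincidence would a row receive the home team of its own match.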

Ram*_*ath 5

This can be done easily with the reshape function, which is part of base R.

# READ DATA (every column is read as character)
mydf <- read.table(textConnection("
TEAMID Venue LEAGUEPOS MATCHID
 WHU     A         5       1
 COV     H        12       1
 EVE     H        15       2
 MNU     A         2       2
 ARS     A         3       3
 LEI     H         4       3"),
 sep = "", header = TRUE, colClasses = rep("character", 4))

# RESHAPE DATA: one row per MATCHID, one set of columns per Venue
reshape(mydf, idvar = "MATCHID", timevar = "Venue", direction = "wide")

This is the output it produces

  MATCHID TEAMID.A LEAGUEPOS.A TEAMID.H LEAGUEPOS.H
1       1      WHU           5      COV          12
3       2      MNU           2      EVE          15
5       3      ARS           3      LEI           4
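To get from this wide output to the exact layout asked for in the question, the columns can be reordered and renamed. The following is a minimal sketch under the assumption that every match has exactly one H row and one A row; the names `wide` and `result` are only for illustration.

```r
# Same sample data, read with read.table's text argument
mydf <- read.table(text = "
TEAMID Venue LEAGUEPOS MATCHID
WHU A  5 1
COV H 12 1
EVE H 15 2
MNU A  2 2
ARS A  3 3
LEI H  4 3", header = TRUE, colClasses = "character")

# One row per match, with .A and .H suffixes on the value columns
wide <- reshape(mydf, idvar = "MATCHID", timevar = "Venue", direction = "wide")

# Pick the columns in the order the question asks for, then rename them
result <- wide[, c("MATCHID", "TEAMID.H", "TEAMID.A", "LEAGUEPOS.H", "LEAGUEPOS.A")]
names(result) <- c("MATCHID", "HomeTeam", "AwayTeam", "HomePos", "AwayPos")
rownames(result) <- NULL  # drop the 1, 3, 5 row names left over from reshape
result
```

Because the columns were read as character, HomePos and AwayPos are strings here; wrapping them in `as.integer()` would be one way to make them numeric if needed.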

Note: another approach is to use the melt and cast functions from the reshape package.

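library(reshape)
# Melt to long form, keeping MATCHID and Venue as identifiers
mydf_m <- melt(mydf, id.vars = c("MATCHID", "Venue"))
# Cast back to wide form: one row per MATCHID, one column per Venue/variable pair
cast(mydf_m, MATCHID ~ Venue + variable)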