Aggregate linear regression

han*_*sta 3 aggregate r linear-regression

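Sorry, I'm quite new to R, but I have a data frame containing game logs for multiple players. I'm trying to get the slope coefficient of each player's points across all of their games. I've seen that aggregate can apply functions such as sum and average, and that pulling the coefficients off a linear regression is pretty simple as well. How do I combine the two?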

a <- c("player1","player1","player1","player2","player2","player2")
b <- c(1,2,3,4,5,6)
c <- c(15,12,13,4,15,9)
gamelogs <- data.frame(name=a, game=b, pts=c)

I would like this to become:

   name    pts slope
player1       -.4286
player2       .08242    

the*_*ail 6

You can also do some magic with base lm to do it all in one go (either of the two formulas below gives the same result):

coef(lm(game ~ pts*name - pts, data=gamelogs))[3:4]
coef(lm(game ~ pts:name + name, data=gamelogs))[3:4]
#pts:nameplayer1 pts:nameplayer2 
#    -0.42857143      0.08241758 

As a data.frame:

data.frame(slope=coef(lm(game ~ pts*name - pts, data=gamelogs))[3:4])
#                      slope
#pts:nameplayer1 -0.42857143
#pts:nameplayer2  0.08241758
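The row names above still carry the `pts:name` prefix. As a minimal sketch (the `slopes` data frame and the regular expression are illustrative choices, not part of the original answer), the prefix can be stripped so that the result has a plain `name` column, closer to the layout asked for in the question:

```r
# Sample data from the question
gamelogs <- data.frame(
  name = c("player1", "player1", "player1", "player2", "player2", "player2"),
  game = c(1, 2, 3, 4, 5, 6),
  pts  = c(15, 12, 13, 4, 15, 9)
)

# One model with a separate intercept and a separate pts slope per player
fit <- lm(game ~ pts:name + name, data = gamelogs)
cf  <- coef(fit)

# Keep only the per-player slope terms, i.e. names starting with "pts:name"
is_slope <- grepl("^pts:name", names(cf))

# Strip the prefix so that only the player label remains
slopes <- data.frame(
  name  = sub("^pts:name", "", names(cf)[is_slope]),
  slope = unname(cf[is_slope])
)
slopes
```

Selecting the terms by name rather than by position (`[3:4]`) should keep working if more players are added, since the number of slope terms then changes.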

See here for further explanation of the formula notation used in the lm call:

https://stat.ethz.ch/R-manual/R-devel/library/stats/html/formula.html
http://faculty.chicagobooth.edu/richard.hahn/teaching/FormulaNotation.pdf#2

In this case pts*name expands to pts + name + pts:name, so removing the main effect with - pts leaves name + pts:name, which is equivalent to pts:name + name.
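The equivalence can be checked directly. The sketch below (variable names such as `f_star`, `f_colon` and `per_player` are made up for illustration) compares the expanded term labels of the two formulas, the fitted coefficients, and the slopes from fitting each player separately:

```r
# Sample data from the question
gamelogs <- data.frame(
  name = c("player1", "player1", "player1", "player2", "player2", "player2"),
  game = c(1, 2, 3, 4, 5, 6),
  pts  = c(15, 12, 13, 4, 15, 9)
)

f_star  <- game ~ pts*name - pts    # full interaction minus the pts main effect
f_colon <- game ~ pts:name + name   # the same terms written out explicitly

# terms() shows what each formula expands to
labels_star  <- attr(terms(f_star),  "term.labels")
labels_colon <- attr(terms(f_colon), "term.labels")
print(labels_star)
print(labels_colon)

# Both formulas fit the same model
fit_star  <- lm(f_star,  data = gamelogs)
fit_colon <- lm(f_colon, data = gamelogs)
print(all.equal(coef(fit_star), coef(fit_colon)))

# The interaction coefficients are the per-player slopes: compare them with
# a separate regression for each player
per_player <- sapply(
  split(gamelogs, gamelogs$name),
  function(d) coef(lm(game ~ pts, data = d))[["pts"]]
)
print(all.equal(unname(coef(fit_star)[3:4]), unname(per_player)))
```

Because the combined model gives every player an own intercept and an own slope, its coefficient estimates match those of the separate per-player fits; only the residual variance estimate (and hence the standard errors) is pooled across players.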


Ric*_*ven 5

You could do

s <- split(gamelogs, gamelogs$name)

vapply(s, function(x) lm(game ~ pts, x)[[1]][2], 1)
#     player1     player2 
# -0.42857143  0.08241758 

Or

do.call(rbind, lapply(s, function(x) coef(lm(game ~ pts, x))[2]))
#                 pts
# player1 -0.42857143
# player2  0.08241758

Or if you want to use dplyr, you can do

library(dplyr)

models <- group_by(gamelogs, name) %>% 
    do(mod = lm(game ~ pts, data = .))

cbind(
    name = models$name, 
    do(models, data.frame(slope = coef(.$mod)[2]))
)
#      name       slope
# 1 player1 -0.42857143
# 2 player2  0.08241758

  • Similarly with `by`: `by(gamelogs, gamelogs$name, function(x) lm(game ~ pts, x)[[1]][2])` (5 upvotes)
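For reference, the `by` call from the comment can be turned into a plain data frame along these lines. This is only a sketch: the names `slopes_by` and `result` are made up here, and `coef(...)[["pts"]]` is used in place of `[[1]][2]` so that the slope is picked by name.

```r
# Sample data from the question
gamelogs <- data.frame(
  name = c("player1", "player1", "player1", "player2", "player2", "player2"),
  game = c(1, 2, 3, 4, 5, 6),
  pts  = c(15, 12, 13, 4, 15, 9)
)

# Fit one regression per player; each call returns a single slope
slopes_by <- by(
  gamelogs, gamelogs$name,
  function(d) coef(lm(game ~ pts, data = d))[["pts"]]
)

# by() returns an array-like object indexed by player name;
# flatten it into a two-column data frame
result <- data.frame(
  name  = dimnames(slopes_by)[[1]],
  slope = as.numeric(slopes_by)
)
result
```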