R:如何在保留其他列的同时聚合某些列

Question

R:如何在保留其他列的同时聚合某些列

我有一个类似这里描述的问题,但没有一个我尝试过的解决方案.

给出这样的表:

Date    Exercise    Category    Weight  Reps    EstMax  RepxWeight  Note
4/2/16  Deadlift    Legs    135 7   166.4685    7x135   easy
4/2/16  Deadlift    Legs    135 7   166.4685    7x135   kinda easy
4/2/16  Deadlift    Legs    135 7   166.4685    7x135   tired
4/2/16  Bench Press Chest   95  5   110.8175    5x95    hard
4/2/16  Bench Press Chest   135 2   143.991 2x135   not hard
4/9/16  Bench Press Chest   135 2   143.991 2x135   a little hard
4/9/16  Bench Press Chest   135 2   143.991 2x135   super tired
4/18/16 Deadlift    Legs    155 8   196.292 8x155   …
4/18/16 Deadlift    Legs    155 5   180.8075    5x155   bad day
5/8/16  Deadlift    Legs    185 3   203.4815    3x185   good day
5/8/16  Deadlift    Legs    185 3   203.4815    3x185   felt easy
5/8/16  Bench Press Chest   115 4   130.318 4x115   easy
5/8/16  Bench Press Chest   115 4   130.318 4x115   hard

Run Code Online (Sandbox Code Playgroud)

我想基于多个其他列(例如和)获取具有某个列(例如)aggregate的max值的行,但是还要保留行中的所有其他列.并且在具有相同最大值的多个条目的情况下,取第一个条目.EstMaxDateExercise

预期的输出看起来像这样:

Date    Exercise    Category    Weight  Reps    EstMax  RepxWeight  Note
4/2/16  Deadlift    Legs    135 7   166.4685    7x135   easy
4/2/16  Bench Press Chest   135 2   143.991 2x135   not hard
4/9/16  Bench Press Chest   135 2   143.991 2x135   a little hard
4/18/16 Deadlift    Legs    155 8   196.292 8x155   …
5/8/16  Deadlift    Legs    185 3   203.4815    3x185   good day
5/8/16  Bench Press Chest   115 4   130.318 4x115   hard

Run Code Online (Sandbox Code Playgroud)

我试过的一些方法的例子; 在每种情况下,'额外列'最终都被用作聚合的因素,这不是我想要的.

data <- structure(list(Date = structure(c(2L, 2L, 2L, 2L, 2L, 3L, 3L, 
1L, 1L, 4L, 4L, 4L, 4L), .Label = c("4/18/16", "4/2/16", "4/9/16", 
"5/8/16"), class = "factor"), Exercise = structure(c(2L, 2L, 
2L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L), .Label = c("Bench Press", 
"Deadlift"), class = "factor"), Category = structure(c(2L, 2L, 
2L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L), .Label = c("Chest", 
"Legs"), class = "factor"), Weight = c(135L, 135L, 135L, 95L, 
135L, 135L, 135L, 155L, 155L, 185L, 185L, 115L, 115L), Reps = c(7L, 
7L, 7L, 5L, 2L, 2L, 2L, 8L, 5L, 3L, 3L, 4L, 4L), EstMax = c(166.4685, 
166.4685, 166.4685, 110.8175, 143.991, 143.991, 143.991, 196.292, 
180.8075, 203.4815, 203.4815, 130.318, 130.318), RepxWeight = structure(c(6L, 
6L, 6L, 5L, 1L, 1L, 1L, 7L, 4L, 2L, 2L, 3L, 3L), .Label = c("2x135", 
"3x185", "4x115", "5x155", "5x95", "7x135", "8x155"), class = "factor"), 
    Note = structure(c(4L, 8L, 11L, 7L, 9L, 2L, 10L, 1L, 3L, 
    6L, 5L, 4L, 7L), .Label = c("…", "a little hard", "bad day", 
    "easy", "felt easy", "good day", "hard", "kinda easy", "not hard", 
    "super tired", "tired"), class = "factor")), .Names = c("Date", 
"Exercise", "Category", "Weight", "Reps", "EstMax", "RepxWeight", 
"Note"), class = "data.frame", row.names = c(NA, -13L))

# base R
aggregate(EstMax ~ Date + Exercise, data = data, FUN = max)
# Date    Exercise   EstMax
# 1  4/2/16 Bench Press 143.9910
# 2  4/9/16 Bench Press 143.9910
# 3  5/8/16 Bench Press 130.3180
# 4 4/18/16    Deadlift 196.2920
# 5  4/2/16    Deadlift 166.4685
# 6  5/8/16    Deadlift 203.4815

aggregate(EstMax ~ Date + Exercise + RepxWeight + Note, data = data, FUN = max)
# Date    Exercise RepxWeight          Note   EstMax
# 1  4/18/16    Deadlift      8x155             … 196.2920
# 2   4/9/16 Bench Press      2x135 a little hard 143.9910
# 3  4/18/16    Deadlift      5x155       bad day 180.8075
# 4   5/8/16 Bench Press      4x115          easy 130.3180
# 5   4/2/16    Deadlift      7x135          easy 166.4685
# 6   5/8/16    Deadlift      3x185     felt easy 203.4815
# 7   5/8/16    Deadlift      3x185      good day 203.4815
# 8   5/8/16 Bench Press      4x115          hard 130.3180
# 9   4/2/16 Bench Press       5x95          hard 110.8175
# 10  4/2/16    Deadlift      7x135    kinda easy 166.4685
# 11  4/2/16 Bench Press      2x135      not hard 143.9910
# 12  4/9/16 Bench Press      2x135   super tired 143.9910
# 13  4/2/16    Deadlift      7x135         tired 166.4685


# data table
library("data.table")
data_dt <- data.table(data)
data_dt[ , max(EstMax), by = c("Date", "Exercise")]
# Date    Exercise       V1
# 1:  4/2/16    Deadlift 166.4685
# 2:  4/2/16 Bench Press 143.9910
# 3:  4/9/16 Bench Press 143.9910
# 4: 4/18/16    Deadlift 196.2920
# 5:  5/8/16    Deadlift 203.4815
# 6:  5/8/16 Bench Press 130.3180

data_dt[, max(EstMax), .(Date, Exercise, Weight, Reps, RepxWeight, Note)]
# Date    Exercise Weight Reps RepxWeight          Note       V1
# 1:  4/2/16    Deadlift    135    7      7x135          easy 166.4685
# 2:  4/2/16    Deadlift    135    7      7x135    kinda easy 166.4685
# 3:  4/2/16    Deadlift    135    7      7x135         tired 166.4685
# 4:  4/2/16 Bench Press     95    5       5x95          hard 110.8175
# 5:  4/2/16 Bench Press    135    2      2x135      not hard 143.9910
# 6:  4/9/16 Bench Press    135    2      2x135 a little hard 143.9910
# 7:  4/9/16 Bench Press    135    2      2x135   super tired 143.9910
# 8: 4/18/16    Deadlift    155    8      8x155             … 196.2920
# 9: 4/18/16    Deadlift    155    5      5x155       bad day 180.8075
# 10:  5/8/16    Deadlift    185    3      3x185      good day 203.4815
# 11:  5/8/16    Deadlift    185    3      3x185     felt easy 203.4815
# 12:  5/8/16 Bench Press    115    4      4x115          easy 130.3180
# 13:  5/8/16 Bench Press    115    4      4x115          hard 130.3180

Run Code Online (Sandbox Code Playgroud)

特别喜欢碱R溶液.还看到了which.max()可能有用的功能,但无法弄清楚如何将其应用于此.

我看过的其他相关问题却没有解决这个问题:

根据另一列的聚合将非聚合列添加到聚合数据集

Answer 1

Aur*_*èle 8

我知道您寻求基础R解决方案,但与此同时,这是dplyr一个:

library(dplyr)

data %>% 
  group_by(Date, Exercise) %>% 
  slice(which.max(EstMax))

# # A tibble: 6 x 8
# # Groups:   Date, Exercise [6]
#      Date    Exercise Category Weight  Reps   EstMax RepxWeight          Note
#    <fctr>      <fctr>   <fctr>  <int> <int>    <dbl>     <fctr>        <fctr>
# 1 4/18/16    Deadlift     Legs    155     8 196.2920      8x155             …
# 2  4/2/16 Bench Press    Chest    135     2 143.9910      2x135      not hard
# 3  4/2/16    Deadlift     Legs    135     7 166.4685      7x135          easy
# 4  4/9/16 Bench Press    Chest    135     2 143.9910      2x135 a little hard
# 5  5/8/16 Bench Press    Chest    115     4 130.3180      4x115          easy
# 6  5/8/16    Deadlift     Legs    185     3 203.4815      3x185      good day

Run Code Online (Sandbox Code Playgroud)

编辑

data.table不是我的强项,但为了完整起见,这是我的尝试:

library(data.table)

setDT(data)[, .SD[which.max(EstMax)], by = .(Date, Exercise)]

#       Date    Exercise Category Weight Reps   EstMax RepxWeight          Note
# 1:  4/2/16    Deadlift     Legs    135    7 166.4685      7x135          easy
# 2:  4/2/16 Bench Press    Chest    135    2 143.9910      2x135      not hard
# 3:  4/9/16 Bench Press    Chest    135    2 143.9910      2x135 a little hard
# 4: 4/18/16    Deadlift     Legs    155    8 196.2920      8x155             …
# 5:  5/8/16    Deadlift     Legs    185    3 203.4815      3x185      good day
# 6:  5/8/16 Bench Press    Chest    115    4 130.3180      4x115          easy

Run Code Online (Sandbox Code Playgroud)

归档时间：	8 年，5 月前
查看次数：	5528 次
最近记录：	8 年，5 月前