How to group by in base R

Mas*_*het 4 group-by r

I want to express the following SQL query in base R (without any additional packages):

select month, day, count(*) as count, avg(dep_delay) as avg_delay
from flights
group by month, day
having count > 1000

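It selects the average departure delay and the number of flights per day for busy days (days with more than 1000 flights). The dataset is nycflights13, which contains information on flights that departed from New York City in 2013.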

Note that I can easily write this in dplyr as:

flights %>%
  group_by(month, day) %>%
  summarise(count = n(), avg_delay = mean(dep_delay, na.rm = TRUE)) %>%
  filter(count > 1000)

Mau*_*ers 6

Since I was recently reminded of the elegance of by() (hat tip to @Parfait), here is a solution using by():

res <- by(flights, list(flights$month, flights$day), function(x)
    if (nrow(x) > 1000) {
        c(
            month = unique(x$month),
            day = unique(x$day),
            count = nrow(x),
            avg_delay = mean(x$dep_delay, na.rm = TRUE))
        })

# Bind the per-group vectors into a matrix (NULL groups are dropped) and order by month, day
df <- do.call(rbind, res)
df <- df[order(df[, 1], df[, 2]), ]
#      month day count avg_delay
# [1,]     7   8  1004 37.296646
# [2,]     7   9  1001 30.711499
# [3,]     7  10  1004 52.860702
# [4,]     7  11  1006 23.609392
# [5,]     7  12  1002 25.096154
# [6,]     7  17  1001 13.670707
# [7,]     7  18  1003 20.626789
# [8,]     7  25  1003 19.674134
# [9,]     7  31  1001  6.280843
#[10,]     8   7  1001  8.680402
#[11,]     8   8  1001 43.349947
#[12,]     8  12  1001  8.308157
#[13,]    11  27  1014 16.697651
#[14,]    12   2  1004  9.021978
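To see why the NULL groups disappear from the result, here is a minimal sketch of the same by() plus rbind() pattern on a tiny data frame. The `toy` data and the threshold of 1 are hypothetical stand-ins for `flights` and 1000, chosen only so the result can be checked by hand.

```r
# Hypothetical toy data standing in for nycflights13::flights
toy <- data.frame(
  month     = c(1, 1, 1, 2, 2, 3),
  day       = c(1, 1, 1, 5, 5, 9),
  dep_delay = c(10, NA, 20, 5, 15, 0)
)

# Per-group summary; groups at or below the threshold return NULL
res <- by(toy, list(toy$month, toy$day), function(x) {
  if (nrow(x) > 1) {
    c(month = unique(x$month),
      day = unique(x$day),
      count = nrow(x),
      avg_delay = mean(x$dep_delay, na.rm = TRUE))
  }
})

# rbind() ignores the NULL entries, leaving one matrix row per kept group
out <- do.call(rbind, res)
out <- out[order(out[, "month"], out[, "day"]), , drop = FALSE]
```

Only the (1, 1) and (2, 5) groups survive here; the single-row (3, 9) group and all empty month/day combinations contribute nothing to `out`. The `drop = FALSE` keeps `out` a matrix even if only one group passes the filter.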


小智 0

This is not a particularly elegant solution, but it does what you want using base R:

flights_split <- split(flights, f = list(flights$month, flights$day))

result <- lapply(flights_split, function(x) {
  if(nrow(x) > 1000) {
    data.frame(month = unique(x$month),
               day = unique(x$day),
               avg_delay = mean(x$dep_delay, na.rm = TRUE),
               count = nrow(x))
  } else {
    NULL
  }
}
)

do.call(rbind, result)

#        month day avg_delay count
#  12.2     12   2  9.021978  1004
#  8.7       8   7  8.680402  1001
#  7.8       7   8 37.296646  1004
#  8.8       8   8 43.349947  1001
#  7.9       7   9 30.711499  1001
#  7.10      7  10 52.860702  1004
#  7.11      7  11 23.609392  1006
#  7.12      7  12 25.096154  1002
#  8.12      8  12  8.308157  1001
#  7.17      7  17 13.670707  1001
#  7.18      7  18 20.626789  1003
#  7.25      7  25 19.674134  1003
#  11.27    11  27 16.697651  1014
#  7.31      7  31  6.280843  1001
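For comparison with the split()/lapply() approach, the following is a rough sketch of how the same GROUP BY / HAVING logic might be written with aggregate() and merge(), both from base R. It is not taken from either answer above; the `toy` data frame and the threshold of 1 are hypothetical stand-ins for `flights` and 1000.

```r
# Hypothetical toy data standing in for nycflights13::flights
toy <- data.frame(
  month     = c(1, 1, 1, 2, 2, 3),
  day       = c(1, 1, 1, 5, 5, 9),
  dep_delay = c(10, NA, 20, 5, 15, 0)
)

# GROUP BY month, day: one aggregate() call per summary column.
# length() counts every row in the group, including rows with NA delays.
counts <- aggregate(toy["dep_delay"], by = toy[c("month", "day")], FUN = length)
names(counts)[3] <- "count"

delays <- aggregate(toy["dep_delay"], by = toy[c("month", "day")],
                    FUN = mean, na.rm = TRUE)
names(delays)[3] <- "avg_delay"

# Join the two summaries on the grouping keys, then apply the HAVING condition
summary_df <- merge(counts, delays, by = c("month", "day"))
busy <- summary_df[summary_df$count > 1, ]
```

Unlike the by() and split() variants, this returns a data frame directly and never creates empty month/day combinations, at the cost of one aggregate() pass per summary column.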