Select a value based on the highest value in another column

Mik*_*kko · score 8 · tags: r, reshape2

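I don't understand why I can't find a solution, because this feels like a very basic question, so I need to ask for help. I want to rearrange the airquality dataset by month, with the maximum temperature value for each month. I also want to find the day on which each month's maximum temperature occurred. What is the laziest (code-wise) way to do this?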

I tried the following without success:

require(reshape2)
names(airquality) <- tolower(names(airquality))
mm <- melt(airquality, id.vars = c("month", "day"), measure.vars = "temp")

dcast(mm, month + day ~ variable, max)
aggregate(formula = temp ~ month + day, data = airquality, FUN = max)

I'm after something like this:

month day temp
5     7    89
...
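A minimal base R sketch of why the `aggregate()` attempt returns every row: grouping by `month + day` makes each day its own group, so `max()` just returns that day's temperature. Grouping by month alone gives the monthly maximum, and merging back onto the data recovers the matching day(s). This assumes the built-in `datasets::airquality` with its original capitalised column names; `aq`, `monthly_max` and `res` are illustrative names only.

```r
# Work on an untouched copy of the built-in dataset (original column names)
aq <- datasets::airquality

# Group by Month only, so each group is a whole month rather than a single row
monthly_max <- aggregate(Temp ~ Month, data = aq, FUN = max)

# Join back on the shared columns (Month and Temp) to recover the Day(s)
# that reached the maximum; ties are kept, so a month can appear twice
res <- merge(monthly_max, aq[, c("Month", "Day", "Temp")])
res <- res[order(res$Month, res$Day), c("Month", "Day", "Temp")]
res
```

Because `merge()` joins on every column name the two data frames share, no `by =` argument is needed here.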

Mat*_*wle · score 8

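There has been some debate for a while about whether laziness is a good thing. Anyway, here is a version that is short and natural to write and read (and fast on large data, so you shouldn't need to change or optimize it later):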

require(data.table)
DT <- as.data.table(airquality)  # uses the original column names: Ozone, Solar.R, Wind, Temp, Month, Day

DT[, .SD[which.max(Temp)], by = Month]

     Month Ozone Solar.R Wind Temp Day
[1,]     5    45     252 14.9   81  29
[2,]     6    NA     259 10.9   93  11
[3,]     7    97     267  6.3   92   8
[4,]     8    76     203  9.7   97  28
[5,]     9    73     183  2.8   93   3

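.SD is the subset of the data for each group, and iiuc (if I understand correctly) you just want the row with the maximum Temp. If you need the row numbers as well, they can be added.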
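In data.table itself the `.I` special symbol (row numbers of the whole table) is the usual route to row numbers. The base R sketch below shows the same "first row with the maximum per group" logic while keeping the original row number. It assumes the built-in `datasets::airquality`, which is sorted by month and day; `row_idx` and `res` are illustrative names.

```r
aq <- datasets::airquality

# For each month, take that month's row indices and keep the one whose Temp
# is largest. which.max() returns the first maximum, so ties resolve to the
# earliest day in the month, like .SD[which.max(Temp)]
row_idx <- tapply(seq_len(nrow(aq)), aq$Month,
                  function(i) i[which.max(aq$Temp[i])])
row_idx <- as.integer(row_idx)

# Attach the original row number as an extra column
res <- cbind(row = row_idx, aq[row_idx, c("Month", "Day", "Temp")])
res
```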

Or, to get all rows tied for the maximum:

DT[,.SD[Temp==max(Temp)],by=Month]

     Month Ozone Solar.R Wind Temp Day
[1,]     5    45     252 14.9   81  29
[2,]     6    NA     259 10.9   93  11
[3,]     7    97     267  6.3   92   8
[4,]     7    97     272  5.7   92   9
[5,]     8    76     203  9.7   97  28
[6,]     9    73     183  2.8   93   3
[7,]     9    91     189  4.6   93   4
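For comparison, a base R sketch of the same "keep every tie" filter as `.SD[Temp==max(Temp)]`, assuming the built-in `datasets::airquality` with original column names. `ave()` returns, for each row, the value of `FUN` computed over that row's group, so comparing it with `Temp` flags every row that equals its month's maximum. `is_max` and `res` are illustrative names.

```r
aq <- datasets::airquality

# ave() broadcasts each month's max Temp back onto every row of that month
is_max <- aq$Temp == ave(aq$Temp, aq$Month, FUN = max)

# Keep the flagged rows; the original month/day order is preserved
res <- aq[is_max, c("Month", "Day", "Temp")]
res
```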


mne*_*nel · score 5

Another approach, using plyr:

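require(reshape2)
names(airquality) <- tolower(names(airquality))
# value.name = "temp" names the molten value column, so subset() can refer to it as temp
mm <- melt(airquality, id.vars = c("month", "day"), measure.vars = "temp", value.name = "temp")

library(plyr)

# For each month, keep the rows whose temp equals that month's maximum and drop the "variable" column
ddply(mm, .(month), subset, subset = temp == max(temp), select = -variable)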

which gives:

  month day temp
1     5  29   81
2     6  11   93
3     7   8   92
4     7   9   92
5     8  28   97
6     9   3   93
7     9   4   93

Or, more simply:

require(plyr)
names(airquality) <- tolower(names(airquality))
# No melt() needed: subset each month directly and keep only month, day and temp
ddply(airquality, .(month), subset,
  subset = temp == max(temp), select = c(month, day, temp))
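Both `ddply()` calls follow the split-apply-combine pattern: split the data by month, apply `subset()` to each piece, and stack the results. A base R sketch of the same steps, with no packages, assuming the built-in `datasets::airquality` with original column names (`by_month`, `top_rows` and `res` are illustrative names):

```r
aq <- datasets::airquality

# Split: one data frame per month
by_month <- split(aq, aq$Month)

# Apply: within each month keep the rows tied for the max Temp
top_rows <- lapply(by_month, function(d) {
  d[d$Temp == max(d$Temp), c("Month", "Day", "Temp")]
})

# Combine: stack the pieces and reset the "5.29"-style row names
res <- do.call(rbind, top_rows)
rownames(res) <- NULL
res
```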