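I can't work out why I'm failing to find a solution, since this feels like a very basic problem, so I'm asking for help. I want to rearrange the airquality dataset by month, keeping the maximum temp value for each month. In addition, I want to find the corresponding day on which each monthly maximum occurred. What is the laziest (code-wise) way to do this?
I have tried the following without success: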
require(reshape2)
names(airquality) <- tolower(names(airquality))
mm <- melt(airquality, id.vars = c("month", "day"), meas = c("temp"))
dcast(mm, month + day ~ variable, max)
aggregate(formula = temp ~ month + day, data = airquality, FUN = max)
I'm after something like this:
month day temp
5 7 89
...
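There has been quite some discussion lately about whether being lazy is good or not. Anyway, this is short and natural to write and read (and it is fast on large data, so you won't need to change or optimize it later):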
require(data.table)
DT=as.data.table(airquality)
DT[,.SD[which.max(Temp)],by=Month]
Month Ozone Solar.R Wind Temp Day
[1,] 5 45 252 14.9 81 29
[2,] 6 NA 259 10.9 93 11
[3,] 7 97 267 6.3 92 8
[4,] 8 76 203 9.7 97 28
[5,] 9 73 183 2.8 93 3
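.SD is the subset of the data for each group, and you just want the row from it that has the largest Temp, if I understand correctly. If you need the row number as well, that can be added.
Or, to get all the rows that are tied for the max: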
DT[,.SD[Temp==max(Temp)],by=Month]
Month Ozone Solar.R Wind Temp Day
[1,] 5 45 252 14.9 81 29
[2,] 6 NA 259 10.9 93 11
[3,] 7 97 267 6.3 92 8
[4,] 7 97 272 5.7 92 9
[5,] 8 76 203 9.7 97 28
[6,] 9 73 183 2.8 93 3
[7,] 9 91 189 4.6 93 4
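On the remark above that the row number can be added: one possible sketch uses data.table's `.I` symbol, which holds the row numbers of the original table. The variable names `idx` and `idx_ties` are made up for illustration, and this assumes the original capitalised column names of airquality.

```r
library(data.table)

DT <- as.data.table(airquality)

# .I holds row numbers of DT; per Month, keep the row number of the first maximum Temp
idx <- DT[, .I[which.max(Temp)], by = Month]$V1
idx

# Same idea, but keep every row that ties for the monthly maximum
idx_ties <- DT[, .I[Temp == max(Temp)], by = Month]$V1

# The row numbers can then be used to pull out the matching rows
DT[idx_ties, .(Month, Day, Temp)]
```

Collecting row numbers first and subsetting once is a common alternative to `.SD[...]` and may be quicker when there are many groups, though for a dataset of this size either form is fine.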
Another approach, using plyr:
require(reshape2)
names(airquality) <- tolower(names(airquality))
mm <- melt(airquality, id.vars = c("month", "day"), meas = c("temp"), value.name = 'temp')
library(plyr)
ddply(mm, .(month), subset, subset = temp == max(temp), select = -variable)
which gives
month day temp
1 5 29 81
2 6 11 93
3 7 8 92
4 7 9 92
5 8 28 97
6 9 3 93
7 9 4 93
Or, even simpler:
require(plyr)
names(airquality) <- tolower(names(airquality))
# reshape2 is not needed here: subset each month's rows directly
ddply(airquality, .(month), subset,
      subset = temp == max(temp), select = c(month, day, temp))
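For comparison, the same result can probably be had without any extra packages. The following is a minimal base R sketch, not taken from the answers above; it assumes the original capitalised column names, and `aq`, `is_max` and `res` are names chosen for illustration.

```r
# Work on a copy so the built-in dataset is left untouched
aq <- airquality

# ave() returns, for every row, the maximum Temp within that row's Month
is_max <- aq$Temp == ave(aq$Temp, aq$Month, FUN = max)

# Keep the rows that reach their monthly maximum (ties included)
res <- aq[is_max, c("Month", "Day", "Temp")]
rownames(res) <- NULL
res
```

Like the `Temp == max(Temp)` variants, this keeps tied days, so July and September each contribute two rows.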