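I want to aggregate a data frame by time interval, applying a different function to each column. I think I almost have `aggregate` figured out, and I have split my data into intervals with the `chron` package, which was easy enough.
But I am not sure how to process the subsets. All of the mapping functions (`*apply`, `*ply`) take a single function. I was hoping for something that takes a vector of functions to apply per column or per variable, but I have not found one. So I am writing a function that takes a subset of my data frame and gives me the mean of every variable, except for "time", which is the index, and "Runoff", which should be the sum.
I tried this: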
aggregate(d., list(Time=trunc(d.$time, "00:10:00")), function (dat) with(dat,
list(Time=time[1], mean(Port.1), mean(Port.1.1), mean(Port.2), mean(Port.2.1),
mean(Port.3), mean(Port.3.1), mean(Port.4), mean(Port.4.1), Runoff=sum(Port.5))))
Even if it did not give me this error, it would be ugly:
Error in eval(substitute(expr), data, enclos = parent.frame()) :
not that many frames on the stack
This tells me I am doing something really wrong. From what I have seen of R, I think there must be an elegant way to do this, but what is it?
dput:
d. <- structure(list(time = structure(c(15030.5520833333, 15030.5555555556,
15030.5590277778, 15030.5625, 15030.5659722222), format = structure(c("m/d/y",
"h:m:s"), .Names = c("dates", "times")), origin = structure(c(1,
1, 1970), .Names = c("month", "day", "year")), class = c("chron",
"dates", "times")), Port.1 = c(0.359747, 0.418139, 0.417459,
0.418139, 0.417459), Port.1.1 = c(1.3, 11.8, 11.9, 12, 12.1),
Port.2 = c(0.288837, 0.335544, 0.335544, 0.335544, 0.335544
), Port.2.1 = c(2.3, 13, 13.2, 13.3, 13.4), Port.3 = c(0.253942,
0.358257, 0.358257, 0.358257, 0.359002), Port.3.1 = c(2,
12.6, 12.7, 12.9, 13.1), Port.4 = c(0.352269, 0.410609, 0.410609,
0.410609, 0.410609), Port.4.1 = c(5.9, 17.5, 17.6, 17.7,
17.9), Port.5 = c(0L, 0L, 0L, 0L, 0L)), .Names = c("time",
"Port.1", "Port.1.1", "Port.2", "Port.2.1", "Port.3", "Port.3.1",
"Port.4", "Port.4.1", "Port.5"), row.names = c(NA, 5L), class = "data.frame")
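There are many things wrong with your approach. As general advice, do not jump straight to what you think the final statement should look like; work in increments instead. Otherwise debugging (understanding and fixing errors) becomes very hard.
For example, you could have started with: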
aggregate(d., list(Time=trunc(d.$time, "00:10:00")), identity)
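Note that something is wrong with your split variable. Apparently `aggregate` does not like working with this class of data. You can work around the problem by converting `Time` to numeric: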
aggregate(d., list(Time=as.numeric(trunc(d.$time, "00:10:00"))), identity)
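As an aside, a numeric grouping key can also be computed without going through `trunc` at all. The sketch below assumes the times are stored as fractional days since the origin, as the numbers in the `dput` output suggest; the names `days`, `bins_per_day`, `bin_index` and `bin_start` are made up for illustration. It is not a claim about what `trunc` itself returns for these data.

```r
# Raw numeric values copied from the dput output (fractional days since 1970-01-01)
days <- c(15030.5520833333, 15030.5555555556, 15030.5590277778,
          15030.5625, 15030.5659722222)

# Number of 10-minute bins in one day: 24 hours * 6 bins per hour
bins_per_day <- 24 * 6

# Integer index of the 10-minute bin each observation falls into
bin_index <- floor(days * bins_per_day)

# Start of each bin, back on the fractional-day scale
bin_start <- bin_index / bins_per_day

print(bin_index)
```

Because this relies on floating-point arithmetic, a value sitting exactly on a bin boundary may need a small tolerance before `floor` is applied.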
Then you can try
aggregate(d., list(Time=as.numeric(trunc(d.$time, "00:10:00"))), apply.fun)
where `apply.fun` is your user-defined function. This fails with a rather cryptic message, but running
aggregate(d., list(Time=as.numeric(trunc(d.$time, "00:10:00"))), print)
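helps you realize that `FUN` inside `aggregate` is not called once per piece of your data (receiving a data.frame), but once per column of each piece (receiving an unnamed vector). So there is no way to get the result you want using `aggregate`.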
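The per-column behaviour can be seen on a small example. This is only a sketch on made-up data (`toy` and `seen` are illustrative names, not part of the question): the function passed to `aggregate` reports the class and length of whatever it receives.

```r
# Hypothetical toy data: one grouping column and two value columns
toy <- data.frame(g = c(1, 1, 2), a = c(1, 2, 3), b = c(10, 20, 30))

# Instead of summarising, report what FUN is actually handed
seen <- aggregate(toy[c("a", "b")], list(g = toy$g),
                  function(x) paste(class(x), length(x)))

print(seen)
```

Each cell of the result describes a plain numeric vector (one column of one group), never a data.frame, which is why a function written with `with(dat, ...)` over several columns cannot work here.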
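Instead, you can use the `ddply` function from the `plyr` package. There, the function applied to each piece does receive a data.frame, so you can do something like this: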
apply.fun <- function(dat) with(dat, data.frame(Time=time[1],
                                                mean(Port.1),
                                                mean(Port.1.1),
                                                mean(Port.2),
                                                mean(Port.2.1),
                                                mean(Port.3),
                                                mean(Port.3.1),
                                                mean(Port.4),
                                                mean(Port.4.1),
                                                Runoff=sum(Port.5)))
d.$Time <- as.numeric(trunc(d.$time, "00:10:00"))
library(plyr)
ddply(d., "Time", apply.fun)
#            Time mean.Port.1. mean.Port.1.1. mean.Port.2. mean.Port.2.1.
# 1 15030.5520833    0.4061886           9.82    0.3262026          11.04
#   mean.Port.3. mean.Port.3.1. mean.Port.4. mean.Port.4.1. Runoff
# 1     0.337543          10.66     0.398941          15.32      0
Edit: following up on @roysc's question in the first comment below, you can do:
apply.fun <- function(dat) {
  out <- as.data.frame(lapply(dat, mean))
  out$Time <- dat$time[1]
  out$Runoff <- sum(dat$Port.5)
  return(out)
}
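The question also asked for something that takes a set of functions, one per column. A minimal base R sketch of that idea follows, on made-up data; `summarise_cols`, `toy` and `funs` are hypothetical names introduced here, not functions from `plyr` or any other package.

```r
# Hypothetical helper: apply a named list of functions, one per column.
# The names of `funs` select the columns and name the output.
summarise_cols <- function(dat, funs) {
  as.data.frame(Map(function(f, col) f(col), funs, dat[names(funs)]))
}

# Toy data: g is the grouping key, Port.1 is averaged, Port.5 is summed
toy <- data.frame(g = c(1, 1, 2, 2),
                  Port.1 = c(1, 3, 5, 7),
                  Port.5 = c(2L, 4L, 6L, 8L))

funs <- list(g = function(x) x[1], Port.1 = mean, Port.5 = sum)

# Split into pieces, summarise each piece, and bind the rows together
pieces <- lapply(split(toy, toy$g), summarise_cols, funs = funs)
result <- do.call(rbind, pieces)
rownames(result) <- NULL

print(result)
```

The same helper could presumably be passed to `ddply` in place of `apply.fun`, since all it needs is the data.frame for one piece.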
Use `by` instead of `aggregate`.
If `f` is the same as your anonymous function except that `list` is replaced with `data.frame` in it, that is, `f <- function(dat) with(dat, data.frame(...whatever...))`, then:
d.by <- by(d., list(Time = trunc(d.$time, "00:10:00")), f)
d.rbind <- do.call("rbind", d.by) # bind rows together
# fix up row and column names
rownames(d.rbind) <- NULL
colnames(d.rbind) <- colnames(d.)
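If `f` added the column names itself, rather than only adding the `Time` name, we could drop the last statement, which assigns the column names.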
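To illustrate that last point, here is a small self-contained sketch on made-up data, where `f` names every column itself so no `colnames<-` step is needed afterwards. The data frame `toy` and its columns are assumptions for the example, and a plain numeric key stands in for the truncated `chron` times.

```r
# Hypothetical toy data with a numeric grouping key
toy <- data.frame(g = c(1, 1, 2, 2),
                  level = c(1, 3, 5, 7),
                  runoff = c(2L, 4L, 6L, 8L))

# f receives the data.frame for one group and names all of its outputs
f <- function(dat) with(dat, data.frame(g = g[1],
                                        level = mean(level),
                                        runoff = sum(runoff)))

toy.by <- by(toy, toy$g, f)            # one data.frame per group
toy.rbind <- do.call("rbind", toy.by)  # bind rows together
rownames(toy.rbind) <- NULL            # drop the group labels used as row names

print(toy.rbind)
```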