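I have long-term hourly rainfall and temperature data, and I want to derive daily values from it. By "day" I mean the period from 07:00:00 to 07:00:00 the next day.
Could you tell me how to convert hourly data to daily data over a specific time interval?
(Example: 07:00:00 to 07:00:00, or 12:00:00 to 12:00:00.)
The rainfall data looks like this: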
1970-01-05 00:00:00 1.0
1970-01-05 01:00:00 1.0
1970-01-05 02:00:00 1.0
1970-01-05 03:00:00 1.0
1970-01-05 04:00:00 1.0
1970-01-05 05:00:00 3.6
1970-01-05 06:00:00 3.6
1970-01-05 07:00:00 2.2
1970-01-05 08:00:00 2.2
1970-01-05 09:00:00 2.2
1970-01-05 10:00:00 2.2
1970-01-05 11:00:00 2.2
1970-01-05 12:00:00 2.2
1970-01-05 13:00:00 2.2
1970-01-05 14:00:00 2.2
1970-01-05 15:00:00 2.2
1970-01-05 16:00:00 0.0
1970-01-05 17:00:00 0.0
1970-01-05 18:00:00 0.0
1970-01-05 19:00:00 0.0
1970-01-05 20:00:00 0.0
1970-01-05 21:00:00 0.0
1970-01-05 22:00:00 0.0
1970-01-05 23:00:00 0.0
1970-01-06 00:00:00 0.0
First, let's create some reproducible data so that we can help you better:
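library(xts)
set.seed(1)
# Keep the timestamps as POSIXct so that the hour of day is preserved.
# Note: the outputs shown below appear to come from R < 3.6.0; newer versions
# draw different values unless RNGkind(sample.kind = "Rounding") is set first.
X = data.frame(When = seq(from = ISOdatetime(2012, 01, 01, 00, 00, 00),
                          length.out = 100, by = "1 hour"),
               Measurements = sample(1:20, 100, replace = TRUE))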
We now have a data frame of 100 hourly observations, starting at 2012-01-01 00:00:00 and ending at 2012-01-05 03:00:00 (times are in 24-hour format).
Second, convert it to an xts object.
X2 = xts(X$Measurements, order.by=X$When)
Third, see how to subset a specific time-of-day window.
X2['T04:00/T08:00']
# [,1]
# 2012-01-01 04:00:00 5
# 2012-01-01 05:00:00 18
# 2012-01-01 06:00:00 19
# 2012-01-01 07:00:00 14
# 2012-01-01 08:00:00 13
# 2012-01-02 04:00:00 18
# 2012-01-02 05:00:00 7
# 2012-01-02 06:00:00 10
# 2012-01-02 07:00:00 12
# 2012-01-02 08:00:00 10
# 2012-01-03 04:00:00 9
# 2012-01-03 05:00:00 5
# 2012-01-03 06:00:00 2
# 2012-01-03 07:00:00 2
# 2012-01-03 08:00:00 7
# 2012-01-04 04:00:00 18
# 2012-01-04 05:00:00 8
# 2012-01-04 06:00:00 16
# 2012-01-04 07:00:00 20
# 2012-01-04 08:00:00 9
Fourth, pass that subset to apply.daily together with whatever function you want, like this:
apply.daily(X2['T04:00/T08:00'], mean)
# [,1]
# 2012-01-01 08:00:00 13.8
# 2012-01-02 08:00:00 11.4
# 2012-01-03 08:00:00 5.0
# 2012-01-04 08:00:00 14.2
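After re-reading your question, I see that I misunderstood what you wanted.
It looks like you want the mean over a 24-hour period that does not necessarily run from midnight to midnight.
For that, you should drop apply.daily and use period.apply with custom endpoints instead, like this: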
# You want each day to start at 7AM. Find the first record stamped 7AM.
A = which(as.character(index(X2)) == "2012-01-01 07:00:00")
# Use that to create your endpoints.
# The endpoints must start at 0 and end at the total number of records.
# Each window ends at (and includes) a 07:00 record; use A - 1 instead of A
# if the 07:00 record should open the next window.
ep = c(0, seq(A, nrow(X2), by = 24), nrow(X2))
period.apply(X2, INDEX = ep, FUN = mean)
# [,1]
# 2012-01-01 07:00:00 12.62500
# 2012-01-02 07:00:00 10.08333
# 2012-01-03 07:00:00 10.79167
# 2012-01-04 07:00:00 11.54167
# 2012-01-05 03:00:00 10.25000
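If you would rather not compute endpoints by hand, another option is to shift every timestamp back by the day-start offset and then group by calendar date. The following is only a minimal base R sketch under stated assumptions: the series `when`, `rain` and `temp` and the variable `day_start_hour` are made up for illustration (they are not the asker's data), timestamps are assumed to be in UTC with no gaps, and rainfall is summed while temperature is averaged, which may or may not be what you need.

```r
# Hypothetical hourly series: 72 hours starting at midnight UTC.
when <- seq(from = as.POSIXct("1970-01-05 00:00:00", tz = "UTC"),
            by = "1 hour", length.out = 72)
rain <- rep(1, 72)          # 1 unit every hour, so daily totals equal hour counts
temp <- rep(c(10, 20), 36)  # alternating values, so every complete day averages 15

# Shift each timestamp back by the day-start offset, then take the calendar date.
# A record at 07:00 therefore opens a new "day"; a record at 06:00 still
# belongs to the previous one.
day_start_hour <- 7
day <- as.Date(when - day_start_hour * 3600, tz = "UTC")

daily_rain <- tapply(rain, day, sum)   # totals for rainfall
daily_temp <- tapply(temp, day, mean)  # averages for temperature
print(daily_rain)
print(daily_temp)
```

Each result is labelled with the date on which its 07:00 window opens. The first and last groups only cover part of a day here (7 and 17 hours), so you may want to drop them before using the values. For a 12:00 to 12:00 day, set `day_start_hour` to 12.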