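I want to filter my time series using a variable time window. More specifically, consider the time of day t_i of a timestamp t. I want to filter my time series so that the only timestamps left are those whose time of day lies from t_i - 15 minutes up to and including t_i + 15 minutes, looking back mv days from t.
Here is what I tried: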
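Written without strings, the condition is a circular distance on the time of day. This is a minimal sketch that assumes UTC timestamps on whole minutes. The names `in_window()` and `half_width` are hypothetical and do not come from the question. Because the day is treated as 1440 minutes that wrap around, 23:45 and 00:00 count as 15 minutes apart, and the `mv`-day look-back is applied to the full timestamps.

```r
library(lubridate)

# Hypothetical helper (not from the question): TRUE where x lies in the
# mv days up to and including t AND its time of day is within
# half_width minutes of t's time of day, wrapping around midnight
in_window <- function(x, t, mv, half_width = 15) {
  mins_of_day <- function(p) hour(p) * 60 + minute(p)  # 0..1439
  d <- abs(mins_of_day(x) - mins_of_day(t))            # 0..1439
  d <- pmin(d, 1440 - d)                               # circular distance
  x >= t - days(mv) & x <= t & d <= half_width
}

t <- as.POSIXct("2020-06-20 23:45", tz = "UTC")
x <- as.POSIXct(c("2020-06-19 00:00", "2020-06-19 12:00", "2020-06-20 23:30"),
                tz = "UTC")
in_window(x, t, mv = 2)  # TRUE FALSE TRUE
```

With this formulation, 00:00 on the following day falls inside the window around 23:45 and needs no special case.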
library(lubridate)
library(dplyr)
mv <- 2 # moving window, in days
t <- as.POSIXct("2020-06-20 12:00", tz="UTC") # time stamp
time <- seq(ymd_hm('2020-01-01 00:00'),ymd_hm('2020-12-31 23:45'), by = '15 mins')
ts <- tibble(time=time, data=sin(seq(1,length(time),1)))
# What I did:
ts %>%
  filter(time >= t - mv*24*60*60) %>%
  filter(time <= t) %>%
  filter(strftime(time, format = "%H:%M", tz = "UTC") >= strftime(t-15*60, format = "%H:%M", tz = "UTC")) %>%
  filter(strftime(time, format = "%H:%M", tz = "UTC") <= strftime(t+15*60, format = "%H:%M", tz = "UTC"))
Output:
# A tibble: 7 x 2
time data
<dttm> <dbl>
1 2020-06-18 12:00:00 -0.435
2 2020-06-18 12:15:00 0.523
3 2020-06-19 11:45:00 0.298
4 2020-06-19 12:00:00 0.964
5 2020-06-19 12:15:00 0.744
6 2020-06-20 11:45:00 0.885
7 2020-06-20 12:00:00 0.0870
This is exactly what I want, but it breaks for t <- as.POSIXct("2020-06-20 23:45", tz="UTC") (and likewise for 00:00) and returns an empty tibble:
# A tibble: 0 x 2
# … with 2 variables: time <dttm>, data <dbl>
I added an if-else statement to work around this, but it is far from elegant and still doesn't give me what I want:
t <- as.POSIXct("2020-06-20 23:45", tz="UTC") # time stamp
if (strftime(t, format = "%H:%M", tz = "UTC") %in% c("23:45", "00:00")) {
  ts %>%
    filter(time >= t - mv*24*60*60) %>%
    filter(time <= t) %>%
    filter(strftime(time, format = "%H:%M", tz = "UTC") >= strftime(t-15*60, format = "%H:%M", tz = "UTC"))
} else {
  ts %>%
    filter(time >= t - mv*24*60*60) %>%
    filter(time <= t) %>%
    filter(strftime(time, format = "%H:%M", tz = "UTC") >= strftime(t-15*60, format = "%H:%M", tz = "UTC")) %>%
    filter(strftime(time, format = "%H:%M", tz = "UTC") <= strftime(t+15*60, format = "%H:%M", tz = "UTC"))
}
Output:
# A tibble: 5 x 2
time data
<dttm> <dbl>
1 2020-06-18 23:45:00 0.543
2 2020-06-19 23:30:00 -0.177
3 2020-06-19 23:45:00 -0.924
4 2020-06-20 23:30:00 -0.936
5 2020-06-20 23:45:00 -0.209
Desired output:
# A tibble: 7 x 2
time data
<dttm> <dbl>
1 2020-06-18 23:45:00 0.543
2 2020-06-19 00:00:00 -0.413
3 2020-06-19 23:30:00 -0.177
4 2020-06-19 23:45:00 -0.924
5 2020-06-20 00:00:00 -0.821
6 2020-06-20 23:30:00 -0.936
7 2020-06-20 23:45:00 -0.209
The problem seems to be the transition from one day to the next, but I don't know how to solve it and couldn't find a similar question. Is there an (elegant) way to achieve this?
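The problem is that the "%H:%M" strings are compared as text. For example, strftime(ts$time[1], format = "%H:%M", tz = "UTC") > strftime(t, format = "%H:%M", tz = "UTC") evaluates to FALSE because "00:00" sorts before "23:45". Whether that is correct depends on how you look at it. For t = 23:45, the lower bound "23:30" is therefore greater than the upper bound "00:00", so no row can satisfy both filter conditions.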
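This small sketch reproduces the failure for t = 23:45 using the question's own bounds. The `wanted` vector is only illustrative. Character comparison is lexicographic, so when t + 15 minutes wraps past midnight, the upper bound becomes "00:00" and sorts before the lower bound.

```r
t <- as.POSIXct("2020-06-20 23:45", tz = "UTC")

# Window bounds as "HH:MM" strings, exactly as in the question's filter
lower <- strftime(t - 15*60, format = "%H:%M", tz = "UTC")  # "23:30"
upper <- strftime(t + 15*60, format = "%H:%M", tz = "UTC")  # "00:00"

# Times of day that should be kept around t
wanted <- c("23:30", "23:45", "00:00")

# "23:30" sorts after "00:00", so no string is both >= lower and <= upper
kept <- wanted >= lower & wanted <= upper
kept  # FALSE FALSE FALSE
```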
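To get around this, you would need to compare the full YYYY-MM-DD HH:MM so that the comparison comes out "correctly", i.e. compare the whole timestamp rather than only the hours and minutes.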
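One way to compare full timestamps is to enumerate every target instant t - d days + k minutes, for d = 0, ..., mv and k in -15, 0, 15. You then keep the rows whose `time` is in that set and inside [t - mv days, t]. This sketch assumes that `time` lies on an exact 15-minute grid in UTC, as in the question's data. It is a different technique from the string matching used below, and `window_filter()` is a hypothetical name.

```r
library(dplyr)
library(lubridate)

time <- seq(ymd_hm("2020-01-01 00:00"), ymd_hm("2020-12-31 23:45"), by = "15 mins")
ts <- tibble(time = time, data = sin(seq(1, length(time), 1)))

# Hypothetical helper: keep rows whose full timestamp equals
# t - d days + k minutes, for d = 0..mv and k in -half_width..half_width
window_filter <- function(ts, t, mv, half_width = 15, step = 15) {
  k <- seq(-half_width, half_width, by = step)          # e.g. -15, 0, 15
  day_secs <- rep(0:mv, each = length(k)) * 24 * 60 * 60
  min_secs <- rep(k, times = mv + 1) * 60
  targets <- t - day_secs + min_secs                   # POSIXct vector
  ts %>%
    filter(time %in% targets,                           # exact grid match
           time >= t - mv * 24 * 60 * 60, time <= t)    # clip to [t - mv days, t]
}

window_filter(ts, ymd_hm("2020-06-20 23:45"), mv = 2)  # 7 rows, 18 Jun 23:45 to 20 Jun 23:45
window_filter(ts, ymd_hm("2020-06-20 12:00"), mv = 2)  # 7 rows, as in the first output
```

The day boundaries are handled by arithmetic on the whole timestamp, so midnight needs no special case. If the series were irregular, `%in%` would miss values that are close to but not exactly on the grid, and you would need a range test around each target instead.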
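Alternatively, we can get the intervals by adding a dummy variable, which we call time_, that holds the HH:MM part of each timestamp, and then treating those values as strings: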
library(stringr) # for str_detect()

# The problematic timestamp
t <- ymd_hm("2020-06-20 23:45", tz = "UTC")

ts %>%
  filter(
    between(
      time,
      left = t - mv*24*60*60 - 15*60,
      right = t
    )
  ) %>%
  mutate(
    time_ = strftime(time, format = "%H:%M", tz = "UTC") %>% as.character()
  ) %>%
  filter(
    str_detect(
      time_,
      pattern = seq(
        t - 15*60,
        t + 15*60,
        by = "15 mins"
      ) %>%
        strftime(format = "%H:%M", tz = "UTC") %>%
        paste(collapse = "|")
    )
  )
This gives the following output:
# A tibble: 8 x 3
time data time_
<dttm> <dbl> <chr>
1 2020-06-18 23:30:00 1.00 23:30
2 2020-06-18 23:45:00 0.543 23:45
3 2020-06-19 00:00:00 -0.413 00:00
4 2020-06-19 23:30:00 -0.177 23:30
5 2020-06-19 23:45:00 -0.924 23:45
6 2020-06-20 00:00:00 -0.821 00:00
7 2020-06-20 23:30:00 -0.936 23:30
8 2020-06-20 23:45:00 -0.209 23:45
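The `- 15*60` in `left` moves the lower bound to 2020-06-18 23:30. That is why this output has eight rows, while the desired output in the question has seven. The sketch below keeps the same string-matching idea but puts the lower bound at exactly `t - mv` days. It also uses `%in%` instead of `str_detect()`, which removes the stringr dependency. The name `allowed` is only illustrative.

```r
library(dplyr)
library(lubridate)

mv <- 2
t  <- ymd_hm("2020-06-20 23:45", tz = "UTC")
time <- seq(ymd_hm("2020-01-01 00:00"), ymd_hm("2020-12-31 23:45"), by = "15 mins")
ts <- tibble(time = time, data = sin(seq(1, length(time), 1)))

# Allowed times of day as "HH:MM": t - 15 min, t, t + 15 min
allowed <- strftime(seq(t - 15*60, t + 15*60, by = "15 mins"),
                    format = "%H:%M", tz = "UTC")

res <- ts %>%
  # Lower bound is exactly t - mv days (no extra - 15*60)
  filter(between(time, t - mv*24*60*60, t)) %>%
  # Exact set membership instead of a "|"-joined regular expression
  filter(strftime(time, format = "%H:%M", tz = "UTC") %in% allowed)
```

For t = 23:45 this returns the seven rows from 2020-06-18 23:45 to 2020-06-20 23:45 listed as the desired output.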