R: How to filter/subset a range of dates

Oma*_*les 10 r subset filter dplyr

I have this data (covering all of December):

      date     sessions
1   2014-12-01  1932
2   2014-12-02  1828
3   2014-12-03  2349
4   2014-12-04  8192
5   2014-12-05  3188
6   2014-12-06  3277

and I need to subset/filter it, for example from "2014-12-05" to "2014-12-25".

I know I can create a sequence with the `:` operator.

Example: b <- c(1:5)

But how do I filter by a sequence of dates? I tried this:

NewDate <- filter(Dates, date("2014-12-05":"2014-12-12"))

But it says:

Error: unexpected symbol in: "NewDate <- filter(Dates, date("2014-12-05":"2014-12-12") NewDate"
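The `:` operator works on numbers, so it cannot expand two date strings into a range. A minimal base R sketch of one alternative, assuming the column is converted to class `Date`: build the sequence with `seq()` and keep the rows that fall in it with `%in%`. The names `sessions_df`, `wanted`, and `in_range` are made up for illustration.

```r
# Sample data from the question; `sessions_df` is a hypothetical name.
sessions_df <- data.frame(
  date = as.Date(c("2014-12-01", "2014-12-02", "2014-12-03",
                   "2014-12-04", "2014-12-05", "2014-12-06")),
  sessions = c(1932L, 1828L, 2349L, 8192L, 3188L, 3277L)
)

# Build the date sequence with seq() on Date objects instead of `:`.
wanted <- seq(as.Date("2014-12-05"), as.Date("2014-12-25"), by = "day")

# Keep only the rows whose date appears in that sequence.
in_range <- sessions_df[sessions_df$date %in% wanted, ]
in_range
```

With only the six sample rows, this keeps the rows for 2014-12-05 and 2014-12-06. Comparing against the two end points, as the answers do, avoids materializing the whole sequence.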

jaz*_*rro 19

If you want to use dplyr, you could try something like this.

mydf <- structure(list(date = structure(c(16405, 16406, 16407, 16408, 
16409, 16410), class = "Date"), sessions = c(1932L, 1828L, 2349L, 
8192L, 3188L, 3277L)), .Names = c("date", "sessions"), row.names = c("1", 
"2", "3", "4", "5", "6"), class = "data.frame")

library(dplyr)

# Make sure the column is of class Date
mydf$date <- as.Date(mydf$date)

filter(mydf, between(date, as.Date("2014-12-02"), as.Date("2014-12-05")))

# Without `between()`, the code is simpler; both forms below give the same result.

filter(mydf, date >= "2014-12-02", date <= "2014-12-05")
filter(mydf, date >= "2014-12-02" & date <= "2014-12-05")

#        date sessions
#1 2014-12-02     1828
#2 2014-12-03     2349
#3 2014-12-04     8192
#4 2014-12-05     3188

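  • I thought the logical condition had to be `&`, but it looks like `,` works just as well. That is new to me. Thanks. (2 upvotes)
  • @akrun Both seem to work fine, don't they? I saw both versions in the PDF of Hadley's dplyr tutorial from useR! 2014. I will post both versions. I was puzzled by the behavior of `between()`. I don't know why `as.Date` is needed again. (2 upvotes)
  • For the same reason as in `seq.Date(as.Date(x1), as.Date(x2), by = "years")`: you need to work with `Date` objects so that the data match. (2 upvotes)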
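To illustrate the point about `Date` objects in the comments above, here is a small base R sketch. It shows that a `Date` is stored as a day count, and that R's comparison operators convert a character operand to `Date` for you. A function that receives the bounds as ordinary arguments may not do that conversion, which is presumably why `as.Date` is spelled out inside `between()`. The variable names are illustrative only.

```r
d <- as.Date("2014-12-02")

# A Date is a number of days since 1970-01-01 plus a class attribute.
days_since_epoch <- unclass(d)

# Comparison operators on a Date convert a character operand to Date first,
# so a string bound works with >= and <=.
cmp_string <- d >= "2014-12-01"

# Converting explicitly gives the same answer without relying on
# that implicit conversion.
cmp_date <- d >= as.Date("2014-12-01")
```

Passing `Date` bounds explicitly is the safer habit whenever it is unclear how a function treats its arguments.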

jal*_*pic 17

You can use `subset`.

Generate the sample data:

temp<-
read.table(text="date     sessions
2014-12-01  1932
2014-12-02  1828
2014-12-03  2349
2014-12-04  8192
2014-12-05  3188
2014-12-06  3277", header=TRUE)

Make sure the column is in Date format:

temp$date <- as.Date(temp$date, format= "%Y-%m-%d")

temp



 #        date sessions
 # 1 2014-12-01     1932
 # 2 2014-12-02     1828
 # 3 2014-12-03     2349
 # 4 2014-12-04     8192
 # 5 2014-12-05     3188
 # 6 2014-12-06     3277

Using `subset`:

subset(temp, date> "2014-12-03" & date < "2014-12-05")

This gives:

  #        date sessions
  # 4 2014-12-04     8192

You can also use `[]`:

temp[(temp$date> "2014-12-03" & temp$date < "2014-12-05"),]
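The strict `>` and `<` used above drop both end points, which is why only one row comes back. A minimal sketch of the inclusive variant, using the same sample data and the hypothetical names `from` and `to` for the bounds:

```r
temp <- data.frame(
  date = as.Date(c("2014-12-01", "2014-12-02", "2014-12-03",
                   "2014-12-04", "2014-12-05", "2014-12-06")),
  sessions = c(1932L, 1828L, 2349L, 8192L, 3188L, 3277L)
)

# Hypothetical bounds, stored as Date objects.
from <- as.Date("2014-12-03")
to   <- as.Date("2014-12-05")

# Strict comparison: both end points are excluded.
exclusive <- temp[temp$date > from & temp$date < to, ]

# Non-strict comparison: both end points are kept.
inclusive <- temp[temp$date >= from & temp$date <= to, ]
```

Here `exclusive` holds only 2014-12-04, while `inclusive` holds 2014-12-03 through 2014-12-05.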


akr*_*run 9

One option is to use data.table:

 library(data.table)
 setDT(df)[date %between% c('2014-12-02', '2014-12-05')]
 #         date sessions
 #1: 2014-12-02     1828
 #2: 2014-12-03     2349
 #3: 2014-12-04     8192
 #4: 2014-12-05     3188

This should work even if "date" is a "character" column:

 df$date <- as.character(df$date)
 setDT(df)[date %between% c('2014-12-02', '2014-12-05')]
 #       date sessions
 #1: 2014-12-02     1828
 #2: 2014-12-03     2349
 #3: 2014-12-04     8192
 #4: 2014-12-05     3188
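A likely reason the character version works is that zero-padded `YYYY-MM-DD` strings sort as text in the same order as the dates they represent. The base R sketch below illustrates that assumption and shows where it breaks; the variable names are made up for the example.

```r
dates_chr <- c("2014-12-01", "2014-12-02", "2014-12-03",
               "2014-12-04", "2014-12-05", "2014-12-06")

# Plain text comparison selects the right range for zero-padded ISO dates.
keep <- dates_chr >= "2014-12-02" & dates_chr <= "2014-12-05"
kept <- dates_chr[keep]

# Without zero padding the text order is wrong: "5" sorts after "1".
unpadded_text <- "2014-12-5" < "2014-12-10"

# Converting to Date first restores the chronological order.
unpadded_date <- as.Date("2014-12-5") < as.Date("2014-12-10")
```

If the strings are in any other layout (for example `05/12/2014`), convert them with `as.Date()` and an explicit `format` before filtering.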

If we want the subset to exclude the bounds of the range:

  setDT(df)[between(date, '2014-12-02', '2014-12-05', incbounds=FALSE)]
  #         date sessions
  #1:  2014-12-03     2349
  #2:  2014-12-04     8192

Data

 df <-  structure(list(date = structure(c(16405, 16406, 16407, 16408, 
 16409, 16410), class = "Date"), sessions = c(1932L, 1828L, 2349L, 
 8192L, 3188L, 3277L)), .Names = c("date", "sessions"), row.names = c("1", 
 "2", "3", "4", "5", "6"), class = "data.frame")

  • @Dan If you are using the `between` from data.table, i.e. `between(x, lower, upper, incbounds = TRUE)  # x %between% y` (2 upvotes)