R - Create a time series from file names

Question


Bar*_*ios (score: 0) · tags: r, time-series

I have 900 files with names like 20120412_bwDD2yYa.txt. The part before the _ is a date in year-month-day (YYYYMMDD) format. Some days have more than one file.

I want to use the dates extracted from the file names to build a time series, with the date on the x-axis and the number of files on the y-axis.

How can I do this?

Answer 1

Len*_*ski (score: 5)

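Here is a Base R solution. Since the question does not include a reproducible example, we simulate the file names, parse the dates, and count the files per date.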

# use list.files() to list the .txt files in the directory
# (pattern is a regular expression, not a shell glob)
files <- list.files(path = "./data", pattern = "\\.txt$", full.names = FALSE)

# simulate result from list.files()
files <- c("20120101_aaa.txt","20120101_bbb.txt","20120102_ccc.txt")

# extract dates from file names: the first 8 characters are YYYYMMDD
date <- as.Date(substr(files, 1, 8), "%Y%m%d")

# one row per file with count = 1, then sum the counts within each date
df <- data.frame(date, count = rep(1, length(date)))
aggregate(count ~ date, data = df, sum)

...and the output:

        date count
1 2012-01-01     2
2 2012-01-02     1
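
A shorter base R variant of the same count, sketched under the assumption that file names follow the YYYYMMDD_....txt pattern from the question: table() tallies the dates directly, so the helper column of 1s is not needed. The grepl() filter and the malformed "notes.txt" entry are illustrative additions, not part of the original answer; they drop names whose first eight characters are not a date, which as.Date() would otherwise turn into NA.

```r
# simulated file names; "notes.txt" is a hypothetical malformed name
files <- c("20120101_aaa.txt", "20120101_bbb.txt", "20120102_ccc.txt",
           "notes.txt")

# keep only names that start with eight digits followed by an underscore
ok <- grepl("^[0-9]{8}_", files)
dates <- as.Date(substr(files[ok], 1, 8), format = "%Y%m%d")

# table() counts files per date; turn it into a data frame with a Date column
counts <- as.data.frame(table(date = dates), responseName = "count",
                        stringsAsFactors = FALSE)
counts$date <- as.Date(counts$date)
counts
```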

dplyr solution

A solution using dplyr::summarise() looks like this:

files <- list.files(path = "./data", pattern = "\\.txt$", full.names = FALSE)
# simulate result from list.files() 
files <- c("20120101_aaa.txt","20120101_bbb.txt","20120102_ccc.txt")
library(dplyr)
# parse the dates, then count the files in each date group
data.frame(date = as.Date(substr(files, 1, 8), "%Y%m%d")) %>% 
     group_by(date) %>% summarise(count = n())


# A tibble: 2 x 2
  date       count
  <date>     <int>
1 2012-01-01     2
2 2012-01-02     1
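
dplyr also offers count(), a shorthand for group_by() followed by summarise(n = n()); in recent dplyr versions its name argument sets the name of the result column. A minimal sketch using the same simulated file names (the fileCounts name is illustrative):

```r
library(dplyr)

# simulated result from list.files(), as above
files <- c("20120101_aaa.txt", "20120101_bbb.txt", "20120102_ccc.txt")

# count() groups by date and counts the rows in each group in one step
fileCounts <- data.frame(date = as.Date(substr(files, 1, 8), "%Y%m%d")) %>%
  count(date, name = "count")
fileCounts
```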

Accounting for dates with no files

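In response to a comment on my answer, here is a solution that fills in the days that have 0 files in the file list. We take the minimum and maximum dates from the file list and create a data frame containing every date in that range. We then left_join() the previously aggregated counts onto it and recode the resulting NA values in count to 0.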

# create a gap in dates with files
files <- c("20120101_aaa.txt","20120101_bbb.txt","20120102_ccc.txt",
           "20120104_aaa.txt","20120104_aab.txt","20120104_aac.txt")
library(dplyr)
data.frame(date = as.Date(substr(files, 1, 8), "%Y%m%d")) %>% 
     group_by(date) %>% summarise(count = n()) -> fileCounts
# create df with all dates from first to last, left_join() and recode NA to 0
data.frame(date = seq(min(fileCounts$date), max(fileCounts$date), by = "day")) %>%
     left_join(fileCounts) %>% 
     mutate(count = if_else(is.na(count), 0, as.numeric(count)))

...and the output:

Joining, by = "date"
        date count
1 2012-01-01     2
2 2012-01-02     1
3 2012-01-03     0
4 2012-01-04     3
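
The question also asks for a plot with the date on the x-axis and the number of files on the y-axis, which the code above does not draw. A base R sketch, assuming the same gapped file names: using the full daily sequence as factor levels makes table() report 0 for days without files (a swapped-in alternative to the left_join() approach), and plot(type = "h") draws one vertical bar per day. The names all_days and fileCounts are illustrative.

```r
# same gapped file names as in the example above
files <- c("20120101_aaa.txt", "20120101_bbb.txt", "20120102_ccc.txt",
           "20120104_aaa.txt", "20120104_aab.txt", "20120104_aac.txt")
dates <- as.Date(substr(files, 1, 8), format = "%Y%m%d")

# every calendar day between the first and last file date
all_days <- seq(min(dates), max(dates), by = "day")

# with all days as factor levels, table() also reports days that have 0 files
counts <- as.vector(table(factor(as.character(dates),
                                 levels = as.character(all_days))))
fileCounts <- data.frame(date = all_days, count = counts)

# time series plot: date on the x-axis, number of files on the y-axis
plot(fileCounts$date, fileCounts$count, type = "h", lwd = 4,
     xlab = "Date", ylab = "Number of files")
```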

Archived: 5 years, 7 months ago
Views: 41
Last activity: 5 years, 7 months ago