How to import files from subdirectories and name them with the subdirectory name in R

HCA*_*CAI 9 r dplyr

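I want to recursively import files (of different lengths) from subdirectories and put them into a single data.frame, with one column holding the subdirectory name and another holding the file name (minus the extension):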

e.g. folder structure
IsolatedData
  00
    tap-4.out
    cl_pressure.out
  15
    tap-4.out
    cl_pressure.out

So far I have:

setwd("~/Documents/IsolatedData")
l <- list.files(pattern = ".out$",recursive = TRUE)
p <- bind_rows(lapply(1:length(l), function(i) {chars <- strsplit(l[i], "/");
cbind(data.frame(Pressure = read.table(l[i],header = FALSE,skip=2, nrow =length(readLines(l[i])))),
      Angle = chars[[1]][1], Location = chars[[1]][1])}), .id = "id")

But I get an error saying that line 43 did not have 2 elements.

I also saw this approach using dplyr, which looks tidy, but I couldn't get it to work: http://www.machinegurning.com/rstats/map_df/

tbl <-
  list.files(recursive=T,pattern=".out$")%>% 
  map_df(~data_frame(x=.x),.id="id")

cam*_*lle 7

Here is a tidyverse workflow that uses the map functions from purrr.

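I generated a bunch of csv files to mimic your file structure, with some simple data. I put two lines of junk data at the start of each file, since you said you were trying to skip the first two lines.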

library(tidyverse)

setwd("~/_R/SO/nested")

walk(paste0("folder", 1:3), dir.create)

list.files() %>%
    walk(function(folderpath) {
        map(1:4, function(i) {
            df <- tibble(
                x1 = sample(letters[1:3], 10, replace = T),
                x2 = rnorm(10)
            )
            dummy <- tibble(
                x1 = c("junk line 1", "junk line 2"),
                x2 = c(0)
            )
            bind_rows(dummy, df) %>%
                write_csv(sprintf("%s/file%s.out", folderpath, i))
        })
    })

This produces the following file structure:

├── folder1
│   ├── file1.out
│   ├── file2.out
│   ├── file3.out
│   └── file4.out
├── folder2
│   ├── file1.out
│   ├── file2.out
│   ├── file3.out
│   └── file4.out
└── folder3
    ├── file1.out
    ├── file2.out
    ├── file3.out
    └── file4.out

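Then I used list.files(recursive = T) to get a list of the paths to these files, used str_extract to pull the folder name and the file name out of each path, read each csv file while skipping the dummy text, and added the folder and file names as columns of the data frame.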

Because I did this with map_dfr, I get back a single tibble in which the data frames from all iterations are rbind-ed together.

all_data <- list.files(recursive = T) %>%
    map_dfr(function(path) {
        # any characters from beginning of path until /
        foldername <- str_extract(path, "^.+(?=/)")
        # any characters between / and .out at end
        filename <- str_extract(path, "(?<=/).+(?=\\.out$)")

        # skip = 3 to skip over names and first 2 lines
        # could instead use col_names = c("x1", "x2")
        read_csv(path, skip = 3, col_names = F) %>%
            mutate(folder = foldername, file = filename)
    })

head(all_data)
#> # A tibble: 6 x 4
#>   X1        X2 folder  file 
#>   <chr>  <dbl> <chr>   <chr>
#> 1 b      0.858 folder1 file1
#> 2 b      0.544 folder1 file1
#> 3 a     -0.180 folder1 file1
#> 4 b      1.14  folder1 file1
#> 5 b      0.725 folder1 file1
#> 6 c      1.05  folder1 file1
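One caveat about the regular expressions above: they assume exactly one folder level. With deeper nesting, the file-name pattern would also capture the inner folder names. As a hedged alternative (not part of the workflow above), base R's dirname(), basename() and tools::file_path_sans_ext() split a path without any regex. The paths below are made up for illustration:

```r
# Hypothetical relative paths, shaped like the output of list.files(recursive = TRUE)
paths <- c("folder1/file1.out", "folder2/file3.out", "outer/inner/file2.out")

# dirname() keeps everything before the last "/"
folder <- dirname(paths)

# basename() keeps the last path component; file_path_sans_ext() drops ".out"
file <- tools::file_path_sans_ext(basename(paths))

parts <- data.frame(path = paths, folder = folder, file = file,
                    stringsAsFactors = FALSE)
print(parts)
```

If only the innermost folder name is wanted, basename(dirname(paths)) can be used in place of dirname(paths).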

Created on 2018-04-21 by the reprex package (v0.2.0).
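For the folder layout in the question, the same idea can be sketched in base R with read.table. This is a minimal sketch, not a drop-in solution: the question does not show the real file format, so it assumes each .out file holds two header lines followed by one numeric value per line, and it builds a throwaway copy of the layout in a temporary directory. The column names Pressure, Angle and Location are taken from the question; everything else is illustrative.

```r
# Build a throwaway copy of the question's layout in a temp directory.
# The file contents (two header lines, then one number per line) are made up.
root <- file.path(tempdir(), "IsolatedData")
for (angle in c("00", "15")) {
  dir.create(file.path(root, angle), recursive = TRUE, showWarnings = FALSE)
  for (loc in c("tap-4", "cl_pressure")) {
    n <- if (angle == "00") 3 else 5  # files of different lengths
    writeLines(c("header 1", "header 2", format(seq_len(n) / 10)),
               file.path(root, angle, paste0(loc, ".out")))
  }
}

# "\\.out$" escapes the dot so that only a literal ".out" suffix matches
files <- list.files(root, pattern = "\\.out$", recursive = TRUE)

# Read each file, then tag its rows with the folder and file name
pieces <- lapply(files, function(f) {
  d <- read.table(file.path(root, f), header = FALSE, skip = 2,
                  col.names = "Pressure")
  d$Angle <- basename(dirname(f))
  d$Location <- tools::file_path_sans_ext(basename(f))
  d
})

# Stack the per-file data frames into one
p <- do.call(rbind, pieces)
str(p)
```

Passing root to list.files and file.path avoids the need for setwd, and do.call(rbind, ...) plays the role that map_dfr plays in the tidyverse version.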