Skipping empty files when importing text files

wal*_*wer 6 for-loop r read.table

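I have a folder containing about 700 text files that I want to import, adding a column to each. I have figured out how to do this with the following code: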

files = list.files(pattern = "*c.txt")
DF <- NULL
for (f in files) {
  data <- read.table(f, header = F, sep=",")
  data$species <- strsplit(f, split = "c.txt")  # species is taken from the file name
  DF <- rbind(DF, data)
}
write.xlsx(DF,"B:/trends.xlsx")

The problem is that about 100 of the files are empty, so the code stops at the first empty file and I get this error message:

Error in read.table(f, header = F, sep = ",") : 
  no lines available in input

Is there a way to skip these empty files?

nru*_*ell 6

You can skip empty files by checking that file.size(some_file) > 0:

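# full.names = TRUE returns paths that file.size() and read.csv() can use directly
files <- list.files("~/tmp/tmpdir", pattern = "\\.csv$", full.names = TRUE)
##
df_list <- lapply(files, function(x) {
    # empty files yield NULL, which rbind() ignores
    if (file.size(x) > 0) {
        read.csv(x)
    }
})
##
dim(do.call("rbind", df_list))
#[1] 50  2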

This skips the 10 empty files and reads the other 10, which are not empty.


Data:

for (i in 1:10) {
    df <- data.frame(x = 1:5, y = 6:10)
    write.csv(df, sprintf("~/tmp/tmpdir/file%i.csv", i), row.names = FALSE)
    ## empty file
    system(sprintf("touch ~/tmp/tmpdir/emptyfile%i.csv", i))
}
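
Applied to the setup in the question, the same size check might look like the sketch below. The temporary directory, the file names foxc.txt and owlc.txt, and the helper read_one are hypothetical stand-ins for illustration, and sub() is used here in place of strsplit() so that species is a plain character column.

```r
# Hypothetical demo data in a temporary directory (not the asker's real files).
dir <- file.path(tempdir(), "species_demo")
dir.create(dir, showWarnings = FALSE)
write.table(data.frame(2001:2003, c(1.5, 2.5, 3.5)),
            file.path(dir, "foxc.txt"), sep = ",",
            row.names = FALSE, col.names = FALSE)
file.create(file.path(dir, "owlc.txt"))  # zero-byte file

# Full paths, so file.size() does not depend on the working directory.
files <- list.files(dir, pattern = "c\\.txt$", full.names = TRUE)

# file.size() is vectorised: keep only files with at least one byte.
non_empty <- files[file.size(files) > 0]

# Hypothetical helper: read one file and tag it with its species name.
read_one <- function(f) {
    data <- read.table(f, header = FALSE, sep = ",")
    # Species name = file name without the trailing "c.txt".
    data$species <- sub("c\\.txt$", "", basename(f))
    data
}

DF <- do.call(rbind, lapply(non_empty, read_one))
print(DF)
```

Filtering the file list up front keeps the reading function free of special cases, and collecting the pieces in a list before a single rbind avoids growing DF inside a loop.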


Sha*_*han 3

For a different approach that introduces explicit error handling, consider wrapping the read.table call in tryCatch.

DF <- NULL
for (f in files) {
    data <- tryCatch({
        if (file.size(f) > 0) {
            read.table(f, header = FALSE, sep = ",")
        }  # an empty file yields NULL
    }, error = function(err) {
        # the handler runs when read.table fails; report it and return NULL
        message("read.table didn't work for ", f, ": ", conditionMessage(err))
        NULL
    })
    if (is.null(data)) next  # skip empty or unreadable files
    data$species <- sub("c\\.txt$", "", f)
    DF <- rbind(DF, data)
}
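
The tryCatch idea can also be packaged as a small helper and checked in isolation. This is a minimal sketch: safe_read is a hypothetical name, and the two temporary files exist only to show the behaviour on an empty and a non-empty input.

```r
# Hypothetical helper: returns a data frame, or NULL if the file cannot be read.
safe_read <- function(f) {
    tryCatch(
        read.table(f, header = FALSE, sep = ","),
        error = function(err) {
            # Report which file failed and why, then signal "skip" with NULL.
            message("Skipping ", basename(f), ": ", conditionMessage(err))
            NULL
        }
    )
}

# One empty and one non-empty temporary file for the demonstration.
empty_file <- tempfile(fileext = ".txt")
file.create(empty_file)
full_file <- tempfile(fileext = ".txt")
writeLines(c("1,2", "3,4"), full_file)

empty_result <- safe_read(empty_file)  # read.table errors, handler returns NULL
full_result <- safe_read(full_file)    # ordinary data frame with 2 rows, 2 columns

print(is.null(empty_result))
print(dim(full_result))
```

Unlike the size check alone, catching the error should also cover files that read.table cannot parse for other reasons, at the cost of hiding problems you might prefer to see, so logging the message is worth keeping.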