Reading a text file line by line in R

Lay*_*yla 52 file-io text r

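I want to read a text file line by line in R, using a for loop and the length of the file. The problem is that it only prints character(0). Here is the code: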

fileName="up_down.txt"
con=file(fileName,open="r")
line=readLines(con) 
long=length(line)
for (i in 1:long){
    linn=readLines(con,1)
    print(linn)
}
close(con)

小智 112

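You should be careful with readLines(...) and big files. Reading all lines into memory can be risky. Below is an example of how to read a file and process just one line at a time: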

processFile = function(filepath) {
  con = file(filepath, "r")
  while ( TRUE ) {
    line = readLines(con, n = 1)
    if ( length(line) == 0 ) {
      break
    }
    print(line)
  }

  close(con)
}

Be aware that reading even a single line into memory carries a risk: a big file without line breaks can fill up your memory too.
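One way to handle the no-line-break case is to read a fixed number of bytes at a time instead of a whole line. The following is only a minimal sketch: the helper name `countBytesInChunks` and the chunk size are assumptions made for illustration, not part of the answer above. It relies on base R's `readBin` on a binary-mode connection, which returns a zero-length raw vector at end of file.

```r
# Hypothetical helper: walk a file in fixed-size chunks so that memory use
# stays bounded even if the file contains no newline at all.
countBytesInChunks <- function(filepath, chunkSize = 65536L) {
  con <- file(filepath, open = "rb")   # binary mode: no line splitting
  on.exit(close(con))                  # close the connection even on error
  total <- 0
  repeat {
    chunk <- readBin(con, what = "raw", n = chunkSize)
    if (length(chunk) == 0) break      # raw(0) signals end of file
    total <- total + length(chunk)     # replace with real per-chunk processing
  }
  total
}

# Demo on a temporary file holding one long line with no newline.
tmp <- tempfile()
writeBin(charToRaw(strrep("a", 100000)), tmp)
nBytes <- countBytesInChunks(tmp, chunkSize = 4096L)
unlink(tmp)
```

Here the per-chunk work is just counting bytes; real processing would have to cope with records that straddle a chunk boundary.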

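  • This should be the accepted answer, since others will run into problems with big files. (8 upvotes)
  • Agreed, this is the right way to parse a large file line by line. The other answers read all lines into memory and then loop over that in-memory object, which is quite different from this. (3 upvotes)
  • From the readLines documentation: **"If the connection is open it is read from its current position."** That is why the loop works. (2 upvotes)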
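The "current position" behaviour quoted from the documentation also explains the `character(0)` output in the question: the first `readLines(con)` call there consumes every line, so each later call on the same open connection finds nothing left to read. A small sketch of this, using a temporary file with made-up contents (the file and its three lines are assumptions for illustration):

```r
# Write a small three-line file to demonstrate the behaviour.
tmp <- tempfile()
writeLines(c("up", "down", "up"), tmp)

con <- file(tmp, open = "r")
first <- readLines(con, n = 1)   # reads line 1; the position advances
rest  <- readLines(con)          # reads the remaining lines 2 and 3
after <- readLines(con, n = 1)   # nothing left to read: character(0)
close(con)
unlink(tmp)

print(first)   # "up"
print(rest)    # "down" "up"
print(after)   # character(0)
```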

Dir*_*tel 40

Just use readLines on your file:

R> res <- readLines(system.file("DESCRIPTION", package="MASS"))
R> length(res)
[1] 27
R> res
 [1] "Package: MASS"                                                                  
 [2] "Priority: recommended"                                                          
 [3] "Version: 7.3-18"                                                                
 [4] "Date: 2012-05-28"                                                               
 [5] "Revision: $Rev: 3167 $"                                                         
 [6] "Depends: R (>= 2.14.0), grDevices, graphics, stats, utils"                      
 [7] "Suggests: lattice, nlme, nnet, survival"                                        
 [8] "Authors@R: c(person(\"Brian\", \"Ripley\", role = c(\"aut\", \"cre\", \"cph\"),"
 [9] "        email = \"ripley@stats.ox.ac.uk\"), person(\"Kurt\", \"Hornik\", role"  
[10] "        = \"trl\", comment = \"partial port ca 1998\"), person(\"Albrecht\","   
[11] "        \"Gebhardt\", role = \"trl\", comment = \"partial port ca 1998\"),"     
[12] "        person(\"David\", \"Firth\", role = \"ctb\"))"                          
[13] "Description: Functions and datasets to support Venables and Ripley,"            
[14] "        'Modern Applied Statistics with S' (4th edition, 2002)."                
[15] "Title: Support Functions and Datasets for Venables and Ripley's MASS"           
[16] "License: GPL-2 | GPL-3"                                                         
[17] "URL: http://www.stats.ox.ac.uk/pub/MASS4/"                                      
[18] "LazyData: yes"                                                                  
[19] "Packaged: 2012-05-28 08:47:38 UTC; ripley"                                      
[20] "Author: Brian Ripley [aut, cre, cph], Kurt Hornik [trl] (partial port"          
[21] "        ca 1998), Albrecht Gebhardt [trl] (partial port ca 1998), David"        
[22] "        Firth [ctb]"                                                            
[23] "Maintainer: Brian Ripley <ripley@stats.ox.ac.uk>"                               
[24] "Repository: CRAN"                                                               
[25] "Date/Publication: 2012-05-28 08:53:03"                                          
[26] "Built: R 2.15.1; x86_64-pc-mingw32; 2012-06-22 14:16:09 UTC; windows"           
[27] "Archs: i386, x64"                                                               
R> 

There is an entire manual devoted to this...

  • When you say there is an entire manual devoted to this, you should also tell us which manual it is. (6 upvotes)

Lay*_*yla 35

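Here is the solution with a for loop. Importantly, it moves the single call to readLines out of the for loop so that it is not called again and again. Here it is: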

fileName <- "up_down.txt"
conn <- file(fileName, open = "r")
linn <- readLines(conn)        # read all lines once, outside the loop
for (i in seq_along(linn)) {   # seq_along() is safe when the file is empty
   print(linn[i])
}
close(conn)

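  • So what happens if you have a 30 gig file? (4 upvotes)
  • Since you print the whole vector anyway, there is no need for a for loop at all. Just `print(linn)` is enough. (2 upvotes)
  • Very nice answer. In R, "<-" is conventionally used for assignment rather than "=". (2 upvotes)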
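For the large-file concern raised in the comments, a possible middle ground between reading everything at once and reading one line per call is to read a bounded batch of lines each time. This is a sketch only: `processInBatches`, its `batchSize` default, and the `handler` argument are hypothetical names chosen for illustration and do not come from any of the answers.

```r
# Hypothetical helper: process a file in batches of at most `batchSize` lines,
# so only one batch is held in memory at a time.
processInBatches <- function(filepath, batchSize = 1000L, handler = print) {
  con <- file(filepath, open = "r")
  on.exit(close(con))                         # always release the connection
  nLines <- 0L
  repeat {
    batch <- readLines(con, n = batchSize)    # continues from current position
    if (length(batch) == 0) break             # character(0) means end of file
    handler(batch)                            # user-supplied per-batch work
    nLines <- nLines + length(batch)
  }
  invisible(nLines)
}

# Demo: 2500 lines read in batches of 1000, recording each batch's size.
tmp <- tempfile()
writeLines(as.character(1:2500), tmp)
batchSizes <- integer(0)
total <- processInBatches(
  tmp, batchSize = 1000L,
  handler = function(b) batchSizes <<- c(batchSizes, length(b))
)
unlink(tmp)
```

Batching cuts the number of `readLines` calls while keeping memory bounded, but like any line-based reader it does not help when a single line is itself huge.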