I want to split large CSV files (larger than available RAM) into chunks and either use them directly or save each chunk to disk for later use. Which R package is best suited for this with big files?
You can use read.csv.ffdf from the ff package, with parameters like the following, to read a large file:
library(ff)
a <- read.csv.ffdf(file="big.csv", header=TRUE, VERBOSE=TRUE, first.rows=1000000, next.rows=1000000, colClasses=NA)
Once the big file has been read into an ff object, a subset of the ffdf object can be pulled into an ordinary data frame with: a[1000:1000000,]
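As a quick sanity check (a minimal sketch using a small stand-in ffdf, since the real big.csv is not available here), subsetting an ffdf with row indices returns a regular in-memory data.frame, so only the requested rows occupy RAM:

```r
library(ff)
# small stand-in for the big ffdf created by read.csv.ffdf (illustration only)
a <- as.ffdf(data.frame(x = 1:5000, y = runif(5000)))
df <- a[10:1000, ]   # pulls just these rows into an ordinary data frame
class(df)            # "data.frame"
nrow(df)             # 991
```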
The rest of the code subsets the ff object and saves the chunked data frames:
totalrows = dim(a)[1]
row.size = as.integer(object.size(a[1:10000,])) / 10000  # bytes per row, estimated from the first 10000 rows
block.size = 200000000  # target chunk size in bytes (200 MB)
#rows.block is rows per block
rows.block = ceiling(block.size/row.size)
#nmaps is the number of full-size chunks of the big ffdf; the loop below runs nmaps+1 times, with the last iteration covering the remainder
nmaps = floor(totalrows/rows.block)
for (i in 0:nmaps) {
  if (i == nmaps) {
    # last chunk: from the start of this block to the final row
    df = a[(i*rows.block + 1):totalrows, ]
  } else {
    df = a[(i*rows.block + 1):((i+1)*rows.block), ]
  }
  # process df or save it
  write.csv(df, paste0("M", i+1, ".csv"))
  # remove df to free memory before the next chunk
  rm(df)
}
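The saved chunks can later be processed one at a time, so no more than one chunk is ever held in RAM. A self-contained sketch (two tiny demo files stand in for the M1.csv, M2.csv, ... written by the loop above):

```r
# demo: write two tiny chunks the way the loop does, then stream them back
write.csv(data.frame(x = 1:3), "M1.csv")
write.csv(data.frame(x = 4:5), "M2.csv")

total <- 0
for (f in paste0("M", 1:2, ".csv")) {
  df <- read.csv(f, row.names = 1)  # row.names = 1 drops the index column write.csv added
  total <- total + sum(df$x)        # stand-in for real per-chunk processing
  rm(df)
}
total  # 15
```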