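I have a CSV file whose first row contains the variable names and whose remaining rows contain the data. What is a good way to break it up in R into files that each contain just one variable? Will the solution be robust? For example, what if the input file is 100G in size?
The input file looks like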
var1,var2,var3
1,2,hello
2,5,yay
...
I want to create 3 files (or as many as there are variables), var1.csv, var2.csv, var3.csv, so that the files look like this:
File 1
var1
1
2
...
File 2
var2
2
5
...
File 3
var3
hello
yay
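I have a solution in Python (How to break a large CSV data file into individual data files?), but I wonder whether R can do the same thing. Essentially, the Python code reads the csv file line by line and writes the output one line at a time. Can R do that? The read.csv command reads the whole file at once, which can slow the whole process down. Also, because R tries to read the entire file into memory, it cannot read and process a 100G file. I can't find a command in R that lets you read a csv file line by line. Please help. Thanks!!
You can scan the input and then write to the output file(s) one line at a time.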
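# Read one line per iteration, skipping the i lines already processed.
# Note: each scan() call reopens file.csv and skips from the top.
i <- 0
while({x <- scan("file.csv", sep = ",", skip = i, nlines = 1, what = "character");
       length(x) > 1}) {
  # Append each field to its own per-column file.
  write(x[1], "file1.csv", sep = ",", append = TRUE)
  write(x[2], "file2.csv", sep = ",", append = TRUE)
  write(x[3], "file3.csv", sep = ",", append = TRUE)
  i <- i + 1
}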
Edit: I took the data above, copied over 1000 times, and compared the speed of that approach against a version that keeps the file connections open the whole time.
# Version 1: reopen the input on every iteration and skip to line i.
ver1 <- function() {
  i <- 0
  while({x <- scan("file.csv", sep = ",", skip = i, nlines = 1, what = "character");
         length(x) > 1}) {
    write(x[1], "file1.csv", sep = ",", append = TRUE)
    write(x[2], "file2.csv", sep = ",", append = TRUE)
    write(x[3], "file3.csv", sep = ",", append = TRUE)
    i <- i + 1
  }
}
system.time(ver1()) # w/ close to 3K lines of data, 3 columns
## user system elapsed
## 2.809 0.417 3.629
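# Version 2: open every connection once; scan() then continues from the
# current position of f, so no skip is needed.
ver2 <- function() {
  f <- file("file.csv", "r")
  f1 <- file("file1.csv", "w")
  f2 <- file("file2.csv", "w")
  f3 <- file("file3.csv", "w")
  while({x <- scan(f, sep = ",", nlines = 1, what = "character");
         length(x) > 1}) {
    # append is only used when file is a file name, so it is omitted here.
    write(x[1], file = f1, ncolumns = 1)
    write(x[2], file = f2, ncolumns = 1)
    write(x[3], file = f3, ncolumns = 1)
  }
  # Close only the connections opened here, not every open connection.
  close(f); close(f1); close(f2); close(f3)
}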
system.time(ver2())
## user system elapsed
## 0.257 0.098 0.409
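The open-connection idea generalizes to any number of columns if the output file names are taken from the header row, and reading several lines per call cuts down on per-line overhead. The following is only a rough sketch under stated assumptions: `split_columns`, `outdir`, and `chunk_lines` are hypothetical names that do not come from the answer above, the input is assumed to be non-empty with a header row, and fields are assumed to contain no quoted commas or embedded newlines (the split is a plain split on ",").

```r
# Sketch: write one file per column, named after the header row.
# Assumptions: non-empty input with a header line; no quoted fields
# containing commas or newlines. All names here are hypothetical.
split_columns <- function(infile, outdir = ".", chunk_lines = 10000L) {
  con <- file(infile, "r")
  on.exit(close(con), add = TRUE)

  # The header supplies the variable names and the output file names.
  header <- strsplit(readLines(con, n = 1L), ",", fixed = TRUE)[[1]]
  outs <- lapply(header, function(v) {
    file(file.path(outdir, paste0(v, ".csv")), "w")
  })
  on.exit(for (o in outs) close(o), add = TRUE)

  # Each output file starts with its own variable name.
  for (j in seq_along(header)) writeLines(header[j], outs[[j]])

  repeat {
    # readLines() continues from the current position of the connection.
    lines <- readLines(con, n = chunk_lines)
    if (length(lines) == 0L) break
    fields <- strsplit(lines, ",", fixed = TRUE)
    for (j in seq_along(header)) {
      col <- vapply(fields, `[`, character(1), j)
      col[is.na(col)] <- ""  # rows with fewer fields yield an empty value
      writeLines(col, outs[[j]])
    }
  }
  invisible(header)
}
```

Memory use is bounded by `chunk_lines` rather than by the size of the input, which is the point for a 100G file. One caveat: R limits how many connections can be open at once (128 by default, as far as I know), so a file with very many variables would need to be processed in batches of columns.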