相关疑难解决方法(0)

快速读取非常大的表作为数据帧

我有非常大的表(3000万行),我想加载为R中的数据帧 read.table()有很多方便的功能,但似乎实现中有很多逻辑会减慢速度.在我的情况下,我假设我提前知道列的类型,表不包含任何列标题或行名称,并且没有任何我必须担心的病态字符.

我知道在表格中阅读作为列表使用scan()可能非常快,例如:

datalist <- scan('myfile',sep='\t',list(url='',popularity=0,mintime=0,maxtime=0)))

Run Code Online (Sandbox Code Playgroud)

但是我将此转换为数据帧的一些尝试似乎将上述性能降低了6倍:

df <- as.data.frame(scan('myfile',sep='\t',list(url='',popularity=0,mintime=0,maxtime=0))))

Run Code Online (Sandbox Code Playgroud)

有没有更好的方法呢？或者很可能完全不同的方法来解决问题？

import r dataframe r-faq

eyt*_*tan

2018 06-03

489
推荐指数

9
解决办法

19万
查看次数

R阅读巨大的csv

我有一个巨大的csv文件.它的大小约为9 GB.我有16 gb的ram.我按照页面上的建议进行操作并在下面实现.

If you get the error that R cannot allocate a vector of length x, close out of R and add the following line to the ``Target'' field: 
--max-vsize=500M

Run Code Online (Sandbox Code Playgroud)

我仍然收到下面的错误和警告.我应该如何将9 gb的文件读入我的R？我有R 64位3.3.1,我在rstudio 0.99.903中运行命令.我有Windows Server 2012 r2标准,64位操作系统.

> memory.limit()
[1] 16383
> answer=read.csv("C:/Users/a-vs/results_20160291.csv")
Error: cannot allocate vector of size 500.0 Mb
In addition: There were 12 warnings (use warnings() to see them)
> warnings()
Warning messages:
1: In scan(file = file, what = what, sep = sep, …

Run Code Online (Sandbox Code Playgroud)

windows csv ram r

use*_*622

2017 05-23

17
推荐指数

3
解决办法

7374
查看次数