Why does an R dataset take more memory than the same data written from R as a Stata dataset and read in Stata?

use*_*710 2 memory r stata

Consider the following R dataset.

object.size(mtcars)
6736 bytes

# write this object as an RDS file (base R uses saveRDS/readRDS)

saveRDS(mtcars, "mt.rds")

# the file properties show it as 1.218 KB
# read the RDS file back

dataRDS <- readRDS("mt.rds")
object.size(dataRDS)
6736 bytes  # same as the original mtcars (not surprising)

# write this object as a Stata dataset

library(foreign)  # provides write.dta() and read.dta()
write.dta(mtcars, "mt.dta")
# the file properties show the size as 4.5 KB
# read the Stata dataset back into R

dataDTA <- read.dta("mt.dta")
object.size(dataDTA)
8656 bytes

# this is larger than the original object size

# reading the Stata dataset in Stata reports the size as 2.82 KB


 obs:            32                          Written by R.              
 vars:            11                          
 size:         2,816 

Why does the default R object, when read in R, take more memory than the same dataset converted from R to Stata format and read in Stata?

pic*_*ick 7

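Most of it seems to come from a difference in the size of the attributes; you can see that they are stored differently. Comparing the sizes, the attributes account for 1696 of the 1920 extra bytes: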

> object.size(attributes(dataDTA)) - object.size(attributes(dataRDS))
1696 bytes

> object.size(dataDTA) - object.size(dataRDS)
1920 bytes

The remaining difference may be because object.size only gives an estimate of the true size.
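To see which attributes are responsible, the two copies can be compared attribute by attribute. This is a minimal sketch, assuming the `foreign` package is installed; it writes to temporary files instead of `mt.rds` and `mt.dta`, and the names `extra_attrs`, `attr_bytes_dta` and `attr_bytes_rds` are only illustrative. Exact byte counts depend on the R version and platform, so none are shown here.

```r
library(foreign)  # provides write.dta() and read.dta()

# Round-trip mtcars through both formats using temporary files
tmp_rds <- tempfile(fileext = ".rds")
tmp_dta <- tempfile(fileext = ".dta")
saveRDS(mtcars, tmp_rds)
write.dta(mtcars, tmp_dta)

dataRDS <- readRDS(tmp_rds)
dataDTA <- read.dta(tmp_dta)

# Attributes that exist only on the copy read back from the Stata file
extra_attrs <- setdiff(names(attributes(dataDTA)), names(attributes(dataRDS)))
print(extra_attrs)

# Estimated size in bytes of each attribute, for both copies
attr_bytes_dta <- vapply(attributes(dataDTA),
                         function(a) as.numeric(object.size(a)), numeric(1))
attr_bytes_rds <- vapply(attributes(dataRDS),
                         function(a) as.numeric(object.size(a)), numeric(1))
print(attr_bytes_dta)
print(attr_bytes_rds)
```

Attributes that appear in `extra_attrs` are metadata attached by `read.dta`, which the copy read with `readRDS` does not carry.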

  • Actually, that is exactly what Stata is doing, and you can check it. `mtcars` has dimensions 32*11, so at 8 bytes per cell its size is 32*11*8 = 2816. (2 upvotes)
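The arithmetic in the comment above can be reproduced in R. This is a small sketch that assumes every column is stored as an 8-byte double, which holds for `mtcars`; the variable names are only illustrative.

```r
# Number of cells: 32 rows * 11 columns
n_cells <- prod(dim(mtcars))

# Check the assumption that every column is a double (8 bytes per value)
all_double <- all(vapply(mtcars, is.double, logical(1)))

# Raw data size without any R object overhead or attributes
raw_bytes <- n_cells * 8
print(all_double)
print(raw_bytes)  # 2816, matching the size Stata reports
```

The gap between this figure and the 6736 bytes reported by `object.size(mtcars)` is what R spends on vector headers and attributes such as the column and row names.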