What is a good way to efficiently transpose a data.table once the computation on the data is finished?
library(data.table)

nrow <- 500e3
ncol <- 2000
m <- matrix(rnorm(nrow * ncol), nrow = nrow)
colnames(m) <- c('foo', seq(ncol - 1))
dt <- data.table(m)
df <- as.data.frame(m)
dt <- t(dt)  # takes a long time and converts the data.table to a matrix
Timings
1. to transpose the matrix
system.time(mt <- t(m))
   user  system elapsed
 20.005   0.016  20.024
2. to transpose the dt
system.time(dt <- t(dt))
   user  system elapsed
 32.722  15.129  47.855
3. to transpose a df
system.time(df <- t(df))
   user  system elapsed
 32.414  15.357  47.775
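
The timings above suggest that t() on a data.table or data.frame pays an extra conversion cost compared with t() on a plain matrix. A minimal sketch of two alternatives follows, on a deliberately tiny table so it runs instantly. The names n_row, n_col, dt_t and m_t are made up for illustration, and whether either option is faster at the 500e3 x 2000 scale is an assumption that would need to be timed on the real data.

```r
library(data.table)

set.seed(1)
n_row <- 5L   # hypothetical small sizes, not the 500e3 x 2000 from the question
n_col <- 4L
m <- matrix(rnorm(n_row * n_col), nrow = n_row)
colnames(m) <- c("foo", seq_len(n_col - 1))
dt <- as.data.table(m)

# Option A: data.table::transpose() keeps the result as a data.table
dt_t <- transpose(dt)

# Option B: convert to a matrix explicitly, then use the matrix method of t()
m_t <- t(as.matrix(dt))

dim(dt_t)  # rows and columns are swapped relative to dt
dim(m_t)
```

If the data is purely numeric, as here, keeping it as a matrix for the whole computation and transposing the matrix directly avoids the conversion step altogether.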
I have a data frame that looks like this (this is only a subset; the actual dataset has 2724098 rows):
> head(dat)
chr start end enhancer motif
chr10 238000 238600 9_EnhA1 GATA6
chr10 238000 238600 9_EnhA1 GATA4
chr10 238000 238600 9_EnhA1 SRF
chr10 238000 238600 9_EnhA1 MEF2A
chr10 375200 375400 9_EnhA1 GATA6
chr10 375200 375400 9_EnhA1 GATA4
chr10 440400 441000 9_EnhA1 GATA6
chr10 440400 441000 9_EnhA1 GATA4
chr10 440400 441000 9_EnhA1 SRF
chr10 440400 441000 9_EnhA1 MEF2A
chr10 441600 442000 9_EnhA1 SRF
chr10 441600 442000 9_EnhA1 MEF2A
I was able to convert my dataset into the following format, where each group of chr, start, end and enhancer represents one ID:
> dat
id motif
1 GATA6
1 GATA4
1 SRF
1 …
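
The question does not show how the id column was produced. One possible way to build it, sketched here with data.table on the twelve rows shown above, is to number each distinct combination of chr, start, end and enhancer with .GRP. The object name long is hypothetical, and this is not necessarily the method the asker used.

```r
library(data.table)

# The twelve example rows from head(dat) above
dat <- data.table(
  chr      = "chr10",
  start    = c(238000, 238000, 238000, 238000, 375200, 375200,
               440400, 440400, 440400, 440400, 441600, 441600),
  end      = c(238600, 238600, 238600, 238600, 375400, 375400,
               441000, 441000, 441000, 441000, 442000, 442000),
  enhancer = "9_EnhA1",
  motif    = c("GATA6", "GATA4", "SRF", "MEF2A", "GATA6", "GATA4",
               "GATA6", "GATA4", "SRF", "MEF2A", "SRF", "MEF2A")
)

# .GRP is the running group counter; groups are numbered in order of first appearance
dat[, id := .GRP, by = .(chr, start, end, enhancer)]

# Keep only the two columns of the id/motif format
long <- dat[, .(id, motif)]
print(long)
```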