R - Faster alternative to hist(XX, plot=FALSE)$count

Question

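I am looking for a faster alternative to R's hist(x, breaks=XXX, plot=FALSE)$count, since I don't need any of the other output it produces. I want to use it inside an sapply call that requires 1 million iterations of this function, e.g.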

x = runif(100000000, 2.5, 2.6)
bincounts = hist(x, breaks=seq(0,3,length.out=100), plot=FALSE)$count

Any ideas?

Answer 1

lmo*_*lmo 5

A first attempt using table and cut:

table(cut(x, breaks=seq(0,3,length.out=100)))

It avoids the extra output, but takes about 34 seconds on my computer:

system.time(table(cut(x, breaks=seq(0,3,length.out=100))))
   user  system elapsed 
 34.148   0.532  34.696

compared with about 3.5 seconds for hist:

system.time(hist(x, breaks=seq(0,3,length.out=100), plot=FALSE)$count)
   user  system elapsed 
  3.448   0.156   3.605
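
Part of the cost of table(cut(...)) likely comes from building a labelled factor before counting. As a minimal sketch (the toy vectors x_small and breaks_small are made up for illustration, and no timing is claimed here), cut can be asked for plain integer bin codes with labels = FALSE, which tabulate can then count directly:

```r
# Hypothetical toy data, not the 1e8-element vector from the question.
breaks_small <- c(0, 1, 2, 3)
x_small <- c(0.5, 1, 1.5, 2.5, 3)

# labels = FALSE makes cut() return integer bin codes instead of a factor.
codes <- cut(x_small, breaks = breaks_small, labels = FALSE)

# Count the codes; there is one bin fewer than there are break points.
counts_cut <- tabulate(codes, nbins = length(breaks_small) - 1)
counts_cut
```

The counts agree with table(cut(x_small, breaks_small)) on this toy input, just without the interval labels.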

Using tabulate and .bincode runs slightly faster than hist:

tabulate(.bincode(x, breaks=seq(0,3,length.out=100)), nbins=100)

system.time(tabulate(.bincode(x, breaks=seq(0,3,length.out=100)), nbins=100))
   user  system elapsed 
  3.084   0.024   3.107
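
One detail worth checking before swapping this in for hist: 100 break points define 99 intervals, so nbins=100 returns one extra trailing zero compared with hist(...)$counts. Also, .bincode defaults to include.lowest = FALSE, while hist defaults to include.lowest = TRUE. The sketch below uses made-up toy data (x_small, breaks_small) to show the difference and one way to line the two up:

```r
# Hypothetical toy data chosen so that some values sit exactly on a break.
breaks_small <- c(0, 1, 2, 3)
x_small <- c(0, 0.5, 1, 1.5, 2.5, 3)
n_bins <- length(breaks_small) - 1

# Default .bincode(): intervals are (a, b], so x == 0 maps to NA,
# and tabulate() silently skips NA values.
counts_default <- tabulate(.bincode(x_small, breaks_small), nbins = n_bins)

# include.lowest = TRUE closes the first interval, as hist() does by default.
counts_closed <- tabulate(
  .bincode(x_small, breaks_small, right = TRUE, include.lowest = TRUE),
  nbins = n_bins
)

counts_hist <- hist(x_small, breaks = breaks_small, plot = FALSE)$counts
```

On this input counts_closed matches counts_hist, while counts_default is one short in the first bin. For the data in the question (all values between 2.5 and 2.6) the distinction does not affect the result.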

Using tabulate and findInterval provides a significant performance boost relative to table and cut, and a decent improvement relative to hist:

tabulate(findInterval(x, vec=seq(0,3,length.out=100)), nbins=100)

system.time(tabulate(findInterval(x, vec=seq(0,3,length.out=100)), nbins=100))
   user  system elapsed 
  2.044   0.012   2.055
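
By default findInterval uses intervals closed on the left, whereas hist defaults to intervals closed on the right, so values that fall exactly on a break can land in different bins. A minimal sketch of a wrapper for the sapply use case follows; bin_counts, x_small, breaks_small and samples are hypothetical names introduced here, and left.open assumes R 3.3.0 or later:

```r
# Hypothetical helper: count values per bin using hist()'s default convention
# (intervals closed on the right, lowest break included).
bin_counts <- function(x, breaks) {
  idx <- findInterval(x, breaks, left.open = TRUE, rightmost.closed = TRUE)
  # Values outside the breaks map to 0 or length(breaks);
  # tabulate() ignores both, whereas hist() would raise an error.
  tabulate(idx, nbins = length(breaks) - 1)
}

breaks_small <- c(0, 1, 2, 3)
x_small <- c(0, 0.5, 1, 1.5, 2.5, 3)

counts_fi <- bin_counts(x_small, breaks_small)
counts_hist <- hist(x_small, breaks = breaks_small, plot = FALSE)$counts

# Usage inside sapply: the breaks are built once, outside the loop.
samples <- list(c(0.2, 0.4, 2.9), c(1.1, 1.2, 1.3))
count_matrix <- sapply(samples, bin_counts, breaks = breaks_small)
```

Here count_matrix has one column per sample and one row per bin. Building the breaks once rather than calling seq on every iteration is a small saving per call that may add up over a million iterations.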
