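I'd like to know whether there are fast min and max functions that work on columns, similar to colMeans.
For 'max', I can emulate the behavior with 'apply', as follows: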
colMax <- function (colData) {
apply(colData, MARGIN=c(2), max)
}
but it seems to be much slower than colMeans from the base package.
Joh*_*lby 11
pmax is about 10x faster than apply here, though still not as fast as colMeans.
data = matrix(rnorm(10^6), 100)
data.df = data.frame(t(data))
system.time(apply(data, MARGIN=c(2), max))
system.time(do.call(pmax, data.df))
system.time(colMeans(data))
> system.time(apply(data, MARGIN=c(2), max))
user system elapsed
0.133 0.006 0.139
> system.time(do.call(pmax, data.df))
user system elapsed
0.013 0.000 0.013
> system.time(colMeans(data))
user system elapsed
0.003 0.000 0.002
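Why the do.call(pmax, ...) call above returns column maxima may not be obvious, so here is a minimal sketch on a tiny matrix. The names m and rows_as_list are made up for illustration, and pmin is added only because the question also asks about minima.

```r
set.seed(42)
m <- matrix(rnorm(12), nrow = 3)   # 3 rows, 4 columns

# t(m) is 4 x 3. As a data frame it is a list of 3 vectors of length 4,
# one vector per ORIGINAL row of m. pmax/pmin compare those vectors
# elementwise, so position j of the result is the max/min of column j.
rows_as_list <- as.data.frame(t(m))
col_max_pmax <- do.call(pmax, rows_as_list)
col_min_pmin <- do.call(pmin, rows_as_list)

# Cross-check against the apply() formulation from the question
stopifnot(isTRUE(all.equal(unname(col_max_pmax), apply(m, 2, max))))
stopifnot(isTRUE(all.equal(unname(col_min_pmin), apply(m, 2, min))))
```

Note that pmax receives one argument per row of the original matrix, so the cost of this trick depends on how many rows there are.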
One can always start by profiling, but your hunch appears to be correct:
R> colMax <- function(X) apply(X, 2, max)
R> library(rbenchmark)
R> Z <- matrix(rnorm(100*100), 100, 100)
R> benchmark(colMeans(Z), colMax(Z))
test replications elapsed relative user.self sys.self user.child
2 colMax(Z) 100 0.350 87.5 0.12 0 0
1 colMeans(Z) 100 0.004 1.0 0.00 0 0
R>
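In that case, you may want to consider writing a simple C/C++ function, using either the plain C API for R via inline, or our Rcpp package. That should get you to colMeans speed.
Edit: Here is a more complete example. colMeans still wins, but we are getting closer: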
R> suppressMessages(library(inline))
R> suppressMessages(library(rbenchmark))
R>
R> colMaxR <- function(X) apply(X, 2, max)
R>
R> colMaxRcpp <- cxxfunction(signature(X_="numeric"), plugin="Rcpp",
+ body='
+ Rcpp::NumericMatrix X(X_);
+ int n = X.ncol();
+ Rcpp::NumericVector V(n);
+ for (int i=0; i<n; i++) {
+ Rcpp::NumericVector W = X.column(i);
+ V[i] = *std::max_element(W.begin(), W.end()); // from the STL
+ }
+ return(V);
+ ')
R>
R>
R> Z <- matrix(rnorm(100*100), 100, 100)
R> benchmark(colMeans(Z), colMaxR(Z), colMaxRcpp(Z), replications=1000, order="relative")
test replications elapsed relative user.self sys.self user.child
1 colMeans(Z) 1000 0.036 1.00000 0.04 0 0
3 colMaxRcpp(Z) 1000 0.050 1.38889 0.05 0 0
2 colMaxR(Z) 1000 1.002 27.83333 1.01 0 0
R>
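For readers who prefer not to go through inline, the same idea can be sketched with Rcpp::cppFunction, which compiles a C++ function from a string. This is only an illustrative variant, not the code benchmarked above: the name colMaxCpp is made up, it assumes a working C++ toolchain, a matrix with at least one row, and no NA values.

```r
library(Rcpp)   # assumes Rcpp and a C++ compiler are installed

# Hypothetical helper: plain double loop over each column.
# Assumes X has at least one row and contains no NA/NaN.
cppFunction('
NumericVector colMaxCpp(NumericMatrix X) {
  int nr = X.nrow(), nc = X.ncol();
  NumericVector out(nc);
  for (int j = 0; j < nc; j++) {
    double m = X(0, j);                 // start from the first row
    for (int i = 1; i < nr; i++) {
      if (X(i, j) > m) m = X(i, j);     // keep the running maximum
    }
    out[j] = m;
  }
  return out;
}')

Z <- matrix(rnorm(100 * 100), 100, 100)
stopifnot(isTRUE(all.equal(colMaxCpp(Z), apply(Z, 2, max))))
```

Flipping the comparison to `<` gives the corresponding column minimum.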
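I am posting this as an answer only because I don't have enough reputation to comment or vote.
The top answer's claim that pmax is 10 times faster than apply does not always hold. For example, take the maximum of 10^6 numbers in each column: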
library(matrixStats)  # provides colMaxs
data <- matrix(rnorm(10^8), 10^6)
data.t <- t(data)
data.df <- data.frame(data)
data.t.df = data.frame(data.t)
system.time(a <- apply(data, MARGIN=c(2), max))
system.time(b <- sapply(data.df, max))
system.time(e <- sapply(seq_len(ncol(data)), function(x) max(data[, x])))
system.time(c <- do.call(pmax, data.t.df))
system.time(d <- colMaxs(data))
> system.time(a <- apply(data, MARGIN=c(2), max))
user system elapsed
2 0 2
> system.time(b <- sapply(data.df, max))
user system elapsed
0.25 0.00 0.25
> system.time(e <- sapply(seq_len(ncol(data)), function(x) max(data[, x])))
user system elapsed
0.83 0.00 0.83
> system.time(c <- do.call(pmax, data.t.df))
user system elapsed
15.94 0.00 15.96
> system.time(d <- colMaxs(data))
user system elapsed
0.21 0.00 0.20
Now take the maximum of 100 numbers in each column:
system.time(a <- apply(data.t, MARGIN=c(2), max))
system.time(b <- sapply(data.t.df, max))
system.time(e <- sapply(seq_len(ncol(data.t)), function(x) max(data.t[, x])))
system.time(c <- do.call(pmax, data.df))
system.time(d <- colMaxs(data.t))
> system.time(a <- apply(data.t, MARGIN=c(2), max))
user system elapsed
4.41 0.00 4.42
> system.time(b <- sapply(data.t.df, max))
user system elapsed
3.23 0.00 3.23
> system.time(e <- sapply(seq_len(ncol(data.t)), function(x) max(data.t[, x])))
user system elapsed
3.57 0.00 3.57
> system.time(c <- do.call(pmax, data.df))
user system elapsed
1.56 0.00 1.56
> system.time(d <- colMaxs(data.t))
user system elapsed
0.25 0.00 0.25
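It seems that pmax only matches or beats apply when the number of rows is small (e.g. 100, where it took 1.56 s versus 4.41 s above). When the number of rows is large (e.g. 10^6), pmax is much slower than apply (15.94 s versus 2 s).
In any case, colMaxs from the matrixStats package is the fastest in both tests and seems to be the way to go.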
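Since the original question asks about both min and max, here is a small sketch of the matrixStats route covering both. It assumes matrixStats is installed; the variable names are illustrative only.

```r
library(matrixStats)   # assumes install.packages("matrixStats") was run

set.seed(1)
m <- matrix(rnorm(20), nrow = 5)   # 5 rows, 4 columns

col_max <- colMaxs(m)     # per-column maxima
col_min <- colMins(m)     # per-column minima
col_rng <- colRanges(m)   # one row per column of m: min in [, 1], max in [, 2]

# Cross-check against the slower base-R formulation
stopifnot(isTRUE(all.equal(col_max, apply(m, 2, max))))
stopifnot(isTRUE(all.equal(col_min, apply(m, 2, min))))
```

Both functions accept na.rm = TRUE if the data may contain missing values.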