How can I sum every block of 10 matrix rows in Rcpp?


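I want to compute the following result with Rcpp: for each column, sum every consecutive block of 10 rows. The R loop is slow on large data, so I tried writing it in Rcpp.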

x <- matrix(1:150, ncol = 5)
z <- matrix(nrow = nrow(x) / 10, ncol = 5)
for (i in 1:5) {
    for (j in 1:(nrow(x) / 10)) {
        k <- (j - 1) * 10 + 1
        z[j, i] <- sum(x[k:(k+9), i])
    }
}
x
       [,1] [,2] [,3] [,4] [,5]
 [1,]    1   31   61   91  121
 [2,]    2   32   62   92  122
 [3,]    3   33   63   93  123
 [4,]    4   34   64   94  124
 [5,]    5   35   65   95  125
 [6,]    6   36   66   96  126
 [7,]    7   37   67   97  127
 [8,]    8   38   68   98  128
 [9,]    9   39   69   99  129
[10,]   10   40   70  100  130
[11,]   11   41   71  101  131
[12,]   12   42   72  102  132
[13,]   13   43   73  103  133
[14,]   14   44   74  104  134
[15,]   15   45   75  105  135
[16,]   16   46   76  106  136
[17,]   17   47   77  107  137
[18,]   18   48   78  108  138
[19,]   19   49   79  109  139
[20,]   20   50   80  110  140
[21,]   21   51   81  111  141
[22,]   22   52   82  112  142
[23,]   23   53   83  113  143
[24,]   24   54   84  114  144
[25,]   25   55   85  115  145
[26,]   26   56   86  116  146
[27,]   27   57   87  117  147
[28,]   28   58   88  118  148
[29,]   29   59   89  119  149
[30,]   30   60   90  120  150

z
      [,1] [,2] [,3] [,4] [,5]
 [1,]   55  355  655  955 1255
 [2,]  155  455  755 1055 1355
 [3,]  255  555  855 1155 1455
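In formula form, with 1-based indices as in the R loop, each output entry is the sum of one block of 10 rows. For the example matrix, where column i holds the values r + 30(i-1), this reduces to a closed form that reproduces the printed z (for instance z[1,1] = 55 and z[3,5] = 255 + 1200 = 1455):

```latex
z_{j,i} = \sum_{r=10(j-1)+1}^{10j} x_{r,i},
\qquad
x_{r,i} = r + 30(i-1)
\;\Longrightarrow\;
z_{j,i} = (100j - 45) + 300(i-1)
```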

Here is the Rcpp code I tried:

#include <Rcpp.h> 
using namespace Rcpp;

// [[Rcpp::export]]
NumericVector mySum(NumericMatrix x) {

    int ncol = x.ncol();
    int nrow = x.nrow();
    int outRow = nrow / 10;
    int i;
    int j;
    int k;
    Rcpp::NumericMatrix z(outRow, ncol);

    for (i = 0; i < ncol; i++) {
        for (j = 0; j < outRow; j++) {
            k = j * 10;
            Rcpp::SubMatrix<REALSXP> sm = x(Range(k, k + 9), i);
            Rcpp::NumericMatrix m(sm);
            double s = Rcpp::sum(m);
            z(j, i) = s;
        }
    }
  return z;
}

However, it fails to compile with the following error. How can I fix it?

test.cpp: In function 'Rcpp::NumericVector mySum(Rcpp::NumericMatrix)':
test.cpp:18:59: error: no match for call to '(Rcpp::NumericMatrix {aka Rcpp::Matrix<14>}) (Rcpp::Range, int&)'
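The compiler error says that NumericMatrix has no call operator taking (Range, int), so x(Range(k, k + 9), i) cannot be written that way. One way around it is to skip sub-matrix extraction entirely and sum the 10 elements with an explicit inner loop; in the Rcpp function that would mean accumulating x(k + r, i) for r from 0 to 9. Declaring the return type as NumericMatrix is also clearer (NumericVector still works, since an R matrix is a vector with a dim attribute, but it hides the intent). The sketch below shows that loop in plain C++ without Rcpp so it compiles on its own; it assumes a column-major buffer (R's matrix layout), and the names blockRowSums and block are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical standalone sketch (plain C++, no Rcpp).
// Sums consecutive blocks of `block` rows in each column of an
// nrow x ncol matrix stored column-major, the same layout R uses.
// Returns an (nrow / block) x ncol matrix, also column-major.
// Leftover rows (when nrow is not a multiple of block) are ignored,
// mirroring the integer division nrow / 10 in the question.
std::vector<double> blockRowSums(const std::vector<double>& x,
                                 std::size_t nrow, std::size_t ncol,
                                 std::size_t block) {
    assert(block > 0);
    assert(x.size() == nrow * ncol);
    const std::size_t outRow = nrow / block;
    std::vector<double> z(outRow * ncol, 0.0);
    for (std::size_t i = 0; i < ncol; ++i) {
        for (std::size_t j = 0; j < outRow; ++j) {
            const std::size_t k = j * block;  // first row of block j (0-based)
            double s = 0.0;
            // Explicit inner loop in place of x(Range(k, k + 9), i).
            for (std::size_t r = k; r < k + block; ++r) {
                s += x[i * nrow + r];         // element (r, i)
            }
            z[i * outRow + j] = s;            // element (j, i)
        }
    }
    return z;
}
```

With the data of matrix(1:150, ncol = 5) (30 rows, 5 columns) and block = 10, this yields 55, 155, 255 in the first column and 1255, 1355, 1455 in the last, matching z in the question.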

Answer by Dav*_*urg (score 6):

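There is actually a fully vectorized base R function, rowsum, that computes group-wise sums very efficiently (as a side note, R is not always slow; it mostly depends on how you use it). The grouping vector cumsum(seq_len(nrow(x)) %% 10L == 1L) labels rows 1 to 10 as group 1, rows 11 to 20 as group 2, and so on.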

x <- matrix(1:150, ncol = 5)
rowsum.default(x, cumsum(seq_len(nrow(x)) %% 10L == 1L), reorder = FALSE)
#   [,1] [,2] [,3] [,4] [,5]
# 1   55  355  655  955 1255
# 2  155  455  755 1055 1355
# 3  255  555  855 1155 1455
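rowsum takes a different route from the stride loop: instead of jumping 10 rows at a time, it adds each row into the output row of its group label. The C++ sketch below illustrates that group-wise accumulation on a column-major buffer, with labels built the same way as cumsum(seq_len(n) %% 10L == 1L). It is not R's actual implementation of rowsum, and blockLabels and groupRowSums are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Builds the same labels as cumsum(seq_len(n) %% block == 1) in R:
// a new label starts at rows 1, 1 + block, 1 + 2*block, ... (1-based).
// Like the R expression, this needs block > 1 (for block == 1 the
// condition r % 1 == 1 is never true).
std::vector<std::size_t> blockLabels(std::size_t n, std::size_t block) {
    assert(block > 1);
    std::vector<std::size_t> g(n);
    std::size_t label = 0;
    for (std::size_t r = 1; r <= n; ++r) {  // r is the 1-based row index
        if (r % block == 1) ++label;
        g[r - 1] = label;
    }
    return g;
}

// Sums the rows of a column-major nrow x ncol matrix by group label
// (labels are 1-based). Returns a column-major ngroups x ncol matrix,
// where ngroups is the largest label, i.e. groups in order of label.
std::vector<double> groupRowSums(const std::vector<double>& x,
                                 std::size_t nrow, std::size_t ncol,
                                 const std::vector<std::size_t>& g) {
    assert(g.size() == nrow && x.size() == nrow * ncol);
    std::size_t ngroups = 0;
    for (std::size_t label : g) {
        assert(label >= 1);
        if (label > ngroups) ngroups = label;
    }
    std::vector<double> z(ngroups * ncol, 0.0);
    for (std::size_t i = 0; i < ncol; ++i) {
        for (std::size_t r = 0; r < nrow; ++r) {
            // Add element (r, i) into output element (g[r] - 1, i).
            z[i * ngroups + (g[r] - 1)] += x[i * nrow + r];
        }
    }
    return z;
}
```

Unlike a fixed-stride loop that computes nrow / 10 blocks, this labelling puts any leftover rows (when nrow is not a multiple of 10) into a final, smaller group, just as the R expression does.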

It is certainly slower than the Rcpp version, but on my system it handles a 20-million-row, 5-column matrix in under 3 seconds:

x <- matrix(seq_len(1e8), ncol = 5)
dim(x)
## [1] 20000000        5
system.time(mySum(x))
# user  system elapsed 
# 0.72    0.24    0.96 
system.time(rowsum.default(x, cumsum(seq_len(nrow(x)) %% 10L == 1L), reorder = FALSE))
# user  system elapsed 
# 2.77    0.15    2.93 

Edit: following your comment, I tested with the dimensions of your real dataset, and there rowsum is even faster than the Rcpp version:

x <- matrix(seq_len(62400*4100), ncol = 4100)
dim(x)
## [1] 62400  4100
system.time(mySum(x))
# user  system elapsed 
# 1.53    1.03    2.57 
system.time(rowsum.default(x, cumsum(seq_len(nrow(x)) %% 10L == 1L), reorder = FALSE))
# user  system elapsed 
# 1.48    0.00    1.50 

  • Thanks for your answer. Your code is interesting and very helpful. As you said, whether R is slow depends on how you write the code.