R: Need to fill a matrix with counts of coordinates (from a data frame with a list of coordinates, some of which are repeated)

Question

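I have a list of events with (x,y) coordinates, where x ranges from 1 to 100 and y ranges from 1 to 86. Each coordinate has duplicates (often many of them). I want to fill a matrix (effectively a grid of numbers) with the count for each coordinate. How can I do this?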

My best attempt so far is:

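# s has n rows (indexed by x_column values) and k columns (indexed by y_column values)
s <- matrix(data = NA, nrow = n, ncol = k)
for (i in 1:n) {
  for (j in 1:k) {
    # count the rows of `data` whose coordinates are exactly (i, j)
    s[i, j] <- nrow(subset(data, x_column == i & y_column == j))
  }
}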

This works for small data frames (about 10,000 rows), but I want to run it on a data frame with almost 3 million rows, and my approach is far too slow.
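The loop is slow because every one of the n × k cells runs subset() over the whole data frame, so the data is scanned n × k times. A minimal base R sketch that counts everything in a single pass instead, assuming the coordinates are positive integers with x in 1..nx and y in 1..ny: each (x, y) pair is turned into its column-major linear index, and tabulate() counts how often each index occurs. The function name count_grid is hypothetical.

```r
# Hypothetical helper: count (x, y) pairs into an nx-by-ny matrix in one pass.
# Assumes x takes values in 1..nx and y in 1..ny (positive integers).
count_grid <- function(x, y, nx, ny) {
  # Column-major linear index of cell (x, y) in a matrix with nx rows
  idx <- x + (y - 1L) * nx
  # tabulate() counts how many times each index 1..nx*ny occurs
  counts <- tabulate(idx, nbins = nx * ny)
  # Refill the counts column by column, matching the linear index above
  matrix(counts, nrow = nx, ncol = ny)
}

# Small check on toy data: (2,1) appears 3 times, (3,1) once, (3,2) twice
m <- count_grid(x = c(2, 2, 2, 3, 3, 3), y = c(1, 1, 1, 1, 2, 2), nx = 4, ny = 4)
m[2, 1]  # 3
m[3, 2]  # 2
```

With real data this would be called as count_grid(data$x_column, data$y_column, 100, 86). Because values outside the stated ranges can land in the wrong cell instead of raising an error, it is worth checking range(data$x_column) and range(data$y_column) first.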

Edit (data):

n=86;k=100;
x_column y_column
54          30
51          32
65          34
19          46
51          27
45          60
62          31
64          45
16          69
31          33

Thanks, everyone!

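Edit: OK, it turns out the program is fast enough for my needs; my workspace was bogged down with a lot of data, which slowed down everything I tried to do. So my approach works, but it is still useful to know alternative ways of filling the matrix. I've posted the first 10 rows; could someone run a speed test?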
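For reference, a sketch that rebuilds the ten posted rows as a data frame and times the original double loop on them with system.time(). It assumes x_column indexes the rows (1..100) and y_column the columns (1..86), as described at the top of the question; no timing is shown here because it depends on the machine.

```r
# The ten sample rows posted in the question
data <- data.frame(
  x_column = c(54, 51, 65, 19, 51, 45, 62, 64, 16, 31),
  y_column = c(30, 32, 34, 46, 27, 60, 31, 45, 69, 33)
)
n <- 100  # rows: possible x_column values 1..100 (assumed from the question text)
k <- 86   # columns: possible y_column values 1..86

timing <- system.time({
  s <- matrix(0L, nrow = n, ncol = k)
  for (i in 1:n) {
    for (j in 1:k) {
      # one full scan of `data` per cell: n * k = 8600 scans in total
      s[i, j] <- nrow(subset(data, x_column == i & y_column == j))
    }
  }
})
print(timing)
sum(s)  # every posted row is counted exactly once
```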

Answer 1

Jos*_*ien 5

Here is an approach using the data.table and Matrix packages:

library(data.table)
library(Matrix)

f <- function(df, nx, ny)  {
    ## Tally up the frequencies
    dt <- data.table(df, key=c("x", "y"))
    xyN <- dt[, .N, by=key(dt)]
    ## Place counts in matrix in their respective i/j x/y row/column
    as.matrix(with(xyN, sparseMatrix(i=x,j=y,x=N,dims=c(nx,ny))))
}

## Check that it works:
df <- data.frame(x=c(2,2,2,3,3,3), y=c(1,1,1,1,2,2))
f(df, nx=4, ny=4)
#      [,1] [,2] [,3] [,4]
# [1,]    0    0    0    0
# [2,]    3    0    0    0
# [3,]    1    2    0    0
# [4,]    0    0    0    0

## Speed test with 3 million coordinates
df <- data.frame(x=sample(1:100, 3e6, replace=TRUE), y=sample(1:86, 3e6, replace=TRUE))
system.time(res <- f(df, nx=100, ny=86))
#    user  system elapsed 
#    0.16    0.03    0.19 
sum(res)
# [1] 3e+06
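The same two steps as f() (tally each distinct (x, y) pair, then write each count into its row/column cell) can also be sketched in base R without data.table or Matrix: aggregate() does the tally and a two-column index matrix does the placement. The function name f_base is hypothetical, it assumes columns named x and y with values in 1..nx and 1..ny, and no speed comparison with f() is claimed.

```r
# Hypothetical base-R counterpart of f(); assumes columns named x and y,
# with x in 1..nx and y in 1..ny.
f_base <- function(df, nx, ny) {
  ## Tally up the frequencies: one row per distinct (x, y) pair, N = its count
  xyN <- aggregate(N ~ x + y, data = transform(df, N = 1L), FUN = sum)
  ## Place counts in matrix: row x, column y; all other cells stay 0
  m <- matrix(0L, nrow = nx, ncol = ny)
  m[cbind(xyN$x, xyN$y)] <- xyN$N
  m
}

## Same toy data as above
df <- data.frame(x = c(2, 2, 2, 3, 3, 3), y = c(1, 1, 1, 1, 2, 2))
res_base <- f_base(df, nx = 4, ny = 4)
res_base
```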

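If you can guarantee that every possible row and column contains at least some coordinates, you can simply use base R table() (although it isn't as fast):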

df <- data.frame(x=sample(1:100, 3e6, replace=TRUE), y=sample(1:86, 3e6, replace=TRUE))
system.time(res2 <- as.matrix(table(df)))
#    user  system elapsed 
#    2.67    0.07    2.74 
sum(res2)
# [1] 3000000
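The requirement that every row and column contain at least one coordinate can likely be avoided by converting both columns to factors with explicit levels before calling table(), since table() reports every factor level, including levels with zero counts. A minimal sketch, assuming x in 1..100 and y in 1..86 as in the question; the sample data deliberately never uses x = 100 or y = 86.

```r
set.seed(1)
# Leave x = 100 and y = 86 unused to show that empty rows/columns are kept
df <- data.frame(x = sample(1:99, 1000, replace = TRUE),
                 y = sample(1:85, 1000, replace = TRUE))
# Explicit factor levels fix the table's dimensions at 100 x 86
res3 <- table(factor(df$x, levels = 1:100),
              factor(df$y, levels = 1:86))
dim(res3)         # 100 86
sum(res3[100, ])  # 0: the unused x value still gets its own (empty) row
```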

Archived:	11 years, 2 months ago
Views:	1178
Last activity:	11 years, 2 months ago