Computing the null space of a big.matrix in R

Mah*_*hin 14 r matrix large-data r-bigmemory

I can't find any function or package to compute the null space (or the QR decomposition) of a big.matrix (from library(bigmemory)) in R. For example:

library(bigmemory)

a <- big.matrix(1000000, 1000, type='double', init=0)

I tried the following, but it gives the errors shown. How can I find the null space of a bigmemory object?

a.qr <- Matrix::qr(a)
# Error in as.vector(data) : 
#   no method for coercing this S4 class to a vector
q.null <- MASS::Null(a)
# Error in as.vector(data) : 
#   no method for coercing this S4 class to a vector

F. *_*ivé 9

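If you want to compute the full SVD of a matrix, you can use the package bigstatsr to perform the computations by blocks. An FBM stands for Filebacked Big Matrix and is an object similar to the filebacked big.matrix object of the package bigmemory.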

library(bigstatsr)
options(bigstatsr.block.sizeGB = 0.5)

# Initialize FBM with random numbers
a <- FBM(1e6, 1e3)
big_apply(a, a.FUN = function(X, ind) {
  X[, ind] <- rnorm(nrow(X) * length(ind))
  NULL
}, a.combine = 'c')

# Compute the cross-product t(a) %*% a (a small ncol(a) x ncol(a) matrix)
K <- big_crossprodSelf(a, big_scale(center = FALSE, scale = FALSE))

# Get v and d, where a = u %*% diag(d) %*% t(v) is the SVD of a
eig <- eigen(K[])
v <- eig$vectors
d <- sqrt(eig$values)

# Get u if you need it. It will be of the same size as a,
# so I store it as an FBM.
u <- FBM(nrow(a), ncol(a))
big_apply(u, a.FUN = function(X, ind, a, v, d) {
  X[ind, ] <- sweep(a[ind, ] %*% v, 2, d, "/")
  NULL
}, a.combine = 'c', block.size = 50e3, ind = rows_along(u),
a = a, v = v, d = d)

# Verification
ind <- sample(nrow(a), 1000)
all.equal(a[ind, ], tcrossprod(sweep(u[ind, ], 2, d, "*"), v))

This takes about 10 minutes on my computer.

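  • @Mahin From what I understand, the full SVD you are talking about just adds some (useless) columns to `u`. `v` should be the same; you can verify this by comparing `svd(mat)$v` and `svd(mat, nu = nrow(mat), nv = ncol(mat))$v` for any matrix with more rows than columns. (3 upvotes)
  • @F.Privé You are right, I was wrong. In both cases, `svd(mat)$v` and `svd(mat, nu = nrow(mat), nv = ncol(mat))$v`, the matrix elements are the same, and I can now compute the null space from `v`. Thank you very much. (2 upvotes)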
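
As a rough illustration of that last step, the null space can be read off `v` by keeping the columns whose singular values are numerically zero. The sketch below is not part of the original answer: it uses a small in-memory matrix `A` as a stand-in for the FBM, and the rank-deficient test data, the name `null_basis`, and the tolerance rule are all assumptions chosen for the example. With the answer's code, `eig` would come from `eigen(K[])` instead.

```r
# Small in-memory stand-in for the big matrix (assumption: built to be
# rank-deficient so that a non-trivial null space exists).
set.seed(1)
A <- matrix(rnorm(200 * 3), nrow = 200, ncol = 3)
A <- cbind(A, A[, 1] + A[, 2])  # 4th column = col 1 + col 2, so rank is 3

# Same route as the answer: eigen-decompose the small cross-product matrix.
K   <- crossprod(A)             # t(A) %*% A, only ncol(A) x ncol(A)
eig <- eigen(K, symmetric = TRUE)

# Round-off can make tiny eigenvalues slightly negative; clamp before sqrt.
d <- sqrt(pmax(eig$values, 0))

# Tolerance is an assumption. It is looser than the usual SVD rule because
# forming t(A) %*% A squares the condition number.
tol <- max(dim(A)) * max(d) * sqrt(.Machine$double.eps)

# Columns of v with (numerically) zero singular value span the null space.
null_basis <- eig$vectors[, d <= tol, drop = FALSE]

ncol(null_basis)                # dimension of the null space found
max(abs(A %*% null_basis))      # should be close to zero
```

Because the singular values are obtained as square roots of eigenvalues of `t(A) %*% A`, very small but non-zero singular values cannot be separated from exact zeros as reliably as with a direct SVD or QR, so the threshold may need adjusting for real data.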