R: clarification on memory management

asb*_*asb 5 memory r bigdata

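Suppose I have a matrix bigm. I need to take a random subset of this matrix and feed it to a machine learning algorithm, say svm. The random subset is only known at run time. In addition, there are other parameters that are chosen from a grid.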

So my code looks something like this:

foo = function (bigm, inTrain, moreParamsList) {
  parsList = c(list(data=bigm[inTrain, ]), moreParamsList)
  do.call(svm, parsList)
}

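What I would like to know is whether R uses new memory to store the object bigm[inTrain, ] in parsList. (My guess is that it does.) What commands can I use to test hypotheses like this? Also, is there a way to use a submatrix in R without using new memory?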
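One way to probe this with base R is sketched below. It compares object.size() of the full matrix and of the subset, then uses tracemem() to report whether a later modification duplicates the subset. The matrix dimensions, the `cost = 1` entry and the variable names are illustrative assumptions, not part of the original code, and tracemem() is only available when R was built with memory profiling, so the sketch guards it with capabilities("profmem").

```r
set.seed(1)
bigm <- matrix(rnorm(1e6), nrow = 1e4)    # 10000 x 100 doubles (illustrative size)
inTrain <- sample(nrow(bigm), 2000)       # row subset known only at run time

# Subsetting with `[` builds a new matrix; its size scales with the subset.
sub <- bigm[inTrain, ]
size_full <- object.size(bigm)
size_sub <- object.size(sub)
print(size_full, units = "MB")
print(size_sub, units = "MB")

# tracemem() reports when a traced object gets duplicated.
# It raises an error on builds without memory profiling, hence the guard.
if (capabilities("profmem")) {
  tracemem(sub)
  # Hypothetical extra parameter, standing in for moreParamsList.
  pars <- c(list(data = sub), list(cost = 1))
  # Modifying the list element forces R to stop sharing it with `sub`;
  # any duplication of the traced object is reported on the console.
  pars$data[1, 1] <- 0
  untracemem(sub)
}
```

If nothing is printed when the list is built but a tracemem line appears at the assignment, that suggests the list initially shares the subset rather than copying it a second time. The subset itself is still a fresh allocation.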

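Edit:

Also, suppose I call foo using mclapply (on Linux), with bigm residing in the parent process. Does that mean I am making mc.cores copies of bigm, or do all the cores just use the parent's object?

Are there any functions or heuristics for tracking the memory location of objects, and the memory consumed by objects created in the different cores?

Thanks.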

asb*_*asb 1

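I just want to put down here what I found from my own research on this topic:

I don't think using mclapply makes mc.cores copies of bigm, based on this passage from the multicore manual: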

In a nutshell fork spawns a copy (child) of the current process, that can work in parallel
to the master (parent) process. At the point of forking both processes share exactly the
same state including the workspace, global options, loaded packages etc. Forking is
relatively cheap in modern operating systems and no real copy of the used memory is
created, instead both processes share the same memory and only modified parts are copied.
This makes fork an ideal tool for parallel processing since there is no need to setup the
parallel working environment, data and code is shared automatically from the start.
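As a rough illustration of the fork behaviour quoted above, the sketch below lets each mclapply worker read bigm straight from the parent's workspace without passing it as an argument. It uses the parallel package, which ships with current R; the matrix size, the four-way row split and the variable names are assumptions made for this example. Forking is not available on Windows, so the sketch falls back to a single core there.

```r
library(parallel)

set.seed(1)
bigm <- matrix(rnorm(1e6), nrow = 1e4)    # lives in the parent process

# mc.cores > 1 relies on fork(), which is unavailable on Windows.
n_cores <- if (.Platform$OS.type == "unix") 2L else 1L

# Split the row indices into four blocks (illustrative choice).
blocks <- split(seq_len(nrow(bigm)), rep(1:4, length.out = nrow(bigm)))

# Each worker only reads bigm; the one new allocation it makes itself
# is the submatrix bigm[idx, ] for its own block.
block_sums <- mclapply(blocks, function(idx) sum(bigm[idx, ]),
                       mc.cores = n_cores)

total <- sum(unlist(block_sums))
print(all.equal(total, sum(bigm)))
```

This only shows that the children can see the parent's object without an explicit transfer. Measuring how much memory is actually shared needs operating-system tools, since R's own object.size() cannot tell shared pages from private ones.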