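I am working on an R program that computes the Gabriel graph for up to 1000 data points. I started from a program I found online (GabrielGraph, based on Bhattacharya et al. 1981, lines 781-830).
Unfortunately it takes quite a long time to produce a result, so I tried to reimplement it with Rcpp. For that I wrote several small functions and one large one called edges, which computes the edges of the Gabriel graph. I am also new to Rcpp, so I may have made things more complicated than necessary, but I don't know how to do it better.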
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
double vecnorm(NumericVector x){
  //to calculate the vectornorm sqrt(sum of (vector entries)^2)
  double out;
  out = sqrt(sum(pow(x,2.0)));
  return out;
}
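// [[Rcpp::export]]
NumericVector vektorzugriff(NumericMatrix xy,int i){
  //to return row i (1-based) of the Matrix xy
  int col = xy.ncol();
  NumericVector out(col);
  // valid column indices are 0..col-1; the original bound j<=col
  // read and wrote one element past the end
  for(int j=0; j<col; j++){
    out[j] = xy(i-1,j);
  }
  return out;
}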
// [[Rcpp::export]]
IntegerVector vergl(NumericVector eins, NumericVector zwei){
  //to see if two Vectors have any identical entries
  IntegerVector out = match(eins, zwei);
  return out;
}
// [[Rcpp::export]]
IntegerVector verglInt(int eins, NumericVector zwei){
  NumericVector dummy = NumericVector::create( eins ) ;
  IntegerVector out = match(dummy, zwei);
  return out;
}
// [[Rcpp::export]]
NumericVector toVec(NumericVector excluded, int k){
  //to append int k to a Vector excluded
  NumericVector dummy = NumericVector::create( k ) ;
  int len = excluded.size();
  int len2 = dummy.size();
  int i=0;
  NumericVector out(len+len2);
  while(i<len+len2){
    if(i<len){
      out[i]=excluded[i];
      i++;
    }
    else{
      out[i]=dummy[i-len];
      i++;
    }
  }
  return out;
}
// [[Rcpp::export]]
LogicalVector isNA(IntegerVector x) {
  //to see which Vector Entries are NAs
  int n = x.size();
  LogicalVector out(n);
  for (int i = 0; i < n; ++i) {
    out[i] = IntegerVector::is_na(x[i]);
  }
  return out;
}
// [[Rcpp::export]]
NumericMatrix Gab(NumericMatrix Gabriel, NumericVector edges1, NumericVector edges2, int anz){
  //to fill a Matrix with the Gabrieledges
  for(int i=0; i<anz; i++) {
    Gabriel(edges1[i]-1, edges2[i]-1) = 1 ;
    Gabriel(edges2[i]-1, edges1[i]-1) = 1 ;
  }
  return Gabriel;
}
// [[Rcpp::export]]
NumericVector edges(NumericMatrix xy,NumericVector vertices,NumericVector excluded, int i){
  //actual function to calculate the edges of the GabrielGraph
  int npts = xy.nrow()+1;
  double d1;
  double d2;
  double d3;
  for(int r=i+1; r<npts; r++) {
    // Skip vertices in excluded
    if(!is_true(any(isNA(verglInt(r,excluded))))){
      continue;
    }
    d1 = vecnorm(vektorzugriff(xy,i) - vektorzugriff(xy,r));
    for(int k=1; k<npts; k++) {
      if((k!=r) && (k!=i)){
        d2 = vecnorm(vektorzugriff(xy,i) - vektorzugriff(xy,k));
        d3 = vecnorm(vektorzugriff(xy,r) - vektorzugriff(xy,k));
        // Consider vertices that are not yet excluded
        if(!is_true(any(isNA(verglInt(k,vertices[isNA(vergl(vertices,excluded))]))))){
          // If d(x,z)^2 > d(x,y)^2 + d(y,z)^2, the edge does not belong to the Gabriel graph
          if( pow(d2,2.0) > pow(d1,2.0) + pow(d3,2.0) ) {
            excluded = toVec(excluded,k);
          }
        }
        if( pow(d1,2.0) > pow(d2,2.0) + pow(d3,2.0) ){
          excluded = toVec(excluded,r);
          break;
        }
      }
    }
  }
  return excluded;
}
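The geometric test in the inner loop of edges can also be written without sqrt and pow. The following is a minimal standalone C++ sketch, not Rcpp code; the Point struct and the helper names sq_dist and in_diametral_disk are made up for illustration. It assumes 2-D points and treats a point lying exactly on the circle as not blocking the edge, which matches the strict > comparison used above.

```cpp
// Hypothetical 2-D point type for this sketch.
struct Point {
    double x;
    double y;
};

// Squared Euclidean distance; no sqrt is needed when only comparing.
double sq_dist(const Point& a, const Point& b) {
    const double dx = a.x - b.x;
    const double dy = a.y - b.y;
    return dx * dx + dy * dy;
}

// True if c lies strictly inside the disk whose diameter is the segment ab,
// i.e. |ab|^2 > |ac|^2 + |bc|^2 (the angle at c is obtuse).
bool in_diametral_disk(const Point& a, const Point& b, const Point& c) {
    return sq_dist(a, b) > sq_dist(a, c) + sq_dist(b, c);
}
```

With a = (0, 0) and b = (2, 0), the point (1, 0.5) blocks the edge ab, while (1, 2) does not.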
I use these Rcpp functions in the following R program:
GabrielGraphMatrix <- function(X,Y,PlotIt=FALSE){
  # Heuristic rejection Algorithm for Gabriel Graph Construction (Bhattacharya et al. 1981)
  # Algorithm is ~ O(d n^2)
  #loading Rcpp functions
  library(Rcpp)
  sourceCpp("... .cpp")
  XY <- cbind(X,Y)
  ndim <- ncol(XY)
  npts <- nrow(XY)
  edges1<- c()
  edges2<- c()
  for( i in 1:(npts-1) ) {
    # Candidate set of Gabriel neighbors
    vertices <- (i+1):npts
    # Initialize list of vertices to be excluded from Ni
    excluded <- edges(XY,vertices,vector(),i);
    adj <- vertices[which(!match(vertices,excluded,nomatch=F)>0)]
    if(length(adj) > 0) {
      edges1=c(edges1,rep(i,length(adj)))
      edges2=c(edges2,adj)
    }
  }
  anz <- length(edges1)
  Gabriel <- Gab(matrix(0, npts, npts),edges1,edges2,anz)
  return(list(Gabriel=Gabriel,edges=cbind(edges1,edges2)))
}
For sample data with ten data points it works fine, for example:
z <- 10
X <- runif(z)*100
Y <- runif(z)*100
GabrielGraphMatrix(X,Y)
returns
> GabrielGraphMatrix(X,Y)
$Gabriel
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]    0    1    0    0    0    0    0    0    0     0
 [2,]    1    0    0    1    0    0    1    0    0     0
 [3,]    0    0    0    1    1    0    0    0    0     1
 [4,]    0    1    1    0    0    0    0    0    0     0
 [5,]    0    0    1    0    0    0    0    0    0     0
 [6,]    0    0    0    0    0    0    0    1    0     0
 [7,]    0    1    0    0    0    0    0    0    0     0
 [8,]    0    0    0    0    0    1    0    0    1     1
 [9,]    0    0    0    0    0    0    0    1    0     1
[10,]    0    0    1    0    0    0    0    1    1     0
$edges
      edges1 edges2
 [1,]      1      2
 [2,]      2      4
 [3,]      2      7
 [4,]      3      4
 [5,]      3      5
 [6,]      3     10
 [7,]      6      8
 [8,]      8      9
 [9,]      8     10
[10,]      9     10
However, if I try a larger data set, I get the following error message:
Error: Value of SET_STRING_ELT() must be a 'CHARSXP' not a 'builtin'
I would be very grateful if someone had at least an idea of what I am doing wrong.
I wasn't able to reproduce your exact error, but the code threw a variety of similar errors and often crashed R. Here are a few obvious problems.
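In your C++ function Gab there are at least two things to check:
1. anz is used as the loop bound without being checked against the lengths of edges1 and edges2; taking the count from edges1.size() inside the function would be safer.
2. The indices edges1[i]-1 and edges2[i]-1 are doubles that are never range-checked, so a single bad value writes outside the matrix. Note that
Gabriel(edges1[i]-1, edges2[i]-1)
is the correct way to address a row and column of a NumericMatrix. Do not rewrite it as
Gabriel[edges1[i]-1, edges2[i]-1]
because NumericMatrix has no two-argument []: the comma there is the C++ comma operator, so only the second value is used, as a linear index.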
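Whatever indexing syntax is used, out-of-range indices can corrupt memory and produce errors like the one above. A minimal sketch of a checked fill follows, in standalone C++ rather than Rcpp; fill_adjacency and its std::vector types are assumptions for illustration and merely stand in for Gab and NumericMatrix. The edge count is taken from the vectors themselves, and every 1-based index is validated before the write.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Builds an n x n symmetric 0/1 adjacency matrix from 1-based edge endpoints.
std::vector<std::vector<int>> fill_adjacency(std::size_t n,
                                             const std::vector<int>& from,
                                             const std::vector<int>& to) {
    if (from.size() != to.size()) {
        throw std::invalid_argument("from and to must have the same length");
    }
    std::vector<std::vector<int>> adj(n, std::vector<int>(n, 0));
    for (std::size_t e = 0; e < from.size(); ++e) {
        const int a = from[e];
        const int b = to[e];
        // Reject anything outside 1..n before touching the matrix.
        if (a < 1 || b < 1 ||
            static_cast<std::size_t>(a) > n || static_cast<std::size_t>(b) > n) {
            throw std::out_of_range("edge endpoint outside 1..n");
        }
        adj[a - 1][b - 1] = 1;
        adj[b - 1][a - 1] = 1;
    }
    return adj;
}
```

Failing loudly on a bad index is usually easier to debug than a crash that shows up later in an unrelated place.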
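In your R function GabrielGraphMatrix you are growing edges1 and edges2 inside the loop. This means they have to be reallocated on every iteration of the for loop, which becomes a problem once the loop is of non-trivial length.
Instead, preallocate them as lists and call unlist afterwards to get the vectors you need.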
# before the loop
edges1 <- vector("list", npts - 1)
edges2 <- vector("list", npts - 1)
# in the loop
if(length(adj) > 0) {
  edges1[[i]] <- rep(i,length(adj))
  edges2[[i]] <- adj
}
# after the loop
edges1 <- unlist(edges1)
edges2 <- unlist(edges2)
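The same idea carries over to the C++ side: instead of returning an excluded set per vertex and assembling the edge vectors in R, the whole edge list can be collected in one pass. Below is a minimal sketch in standalone C++ (not Rcpp). It uses the brute-force O(n^3) definition of the Gabriel graph rather than the heuristic rejection algorithm from the question, and the names Pt and gabriel_edges are hypothetical. Indices are 1-based to match R.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

struct Pt { double x; double y; };  // hypothetical point type

// Returns all Gabriel edges (i, j) with i < j, using 1-based indices.
std::vector<std::pair<int, int>> gabriel_edges(const std::vector<Pt>& p) {
    auto sq = [](const Pt& a, const Pt& b) {
        const double dx = a.x - b.x, dy = a.y - b.y;
        return dx * dx + dy * dy;  // squared distance, no sqrt needed
    };
    const std::size_t n = p.size();
    std::vector<std::pair<int, int>> out;
    // Capacity hint only: in general position a 2-D Gabriel graph
    // has at most 3n - 6 edges; the vector still grows if needed.
    out.reserve(3 * n);
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = i + 1; j < n; ++j) {
            const double dij = sq(p[i], p[j]);
            bool blocked = false;
            for (std::size_t k = 0; k < n && !blocked; ++k) {
                if (k == i || k == j) continue;
                // k strictly inside the disk with diameter ij blocks the edge
                blocked = dij > sq(p[i], p[k]) + sq(p[j], p[k]);
            }
            if (!blocked) {
                out.emplace_back(static_cast<int>(i) + 1,
                                 static_cast<int>(j) + 1);
            }
        }
    }
    return out;
}
```

For the three points (0, 0), (2, 0) and (1, 0.5), the third point blocks the edge between the first two, so the result is the edges (1, 3) and (2, 3). For 1000 points the cubic loop is on the order of 10^9 cheap comparisons in the worst case, so whether it beats the heuristic version would need to be measured.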