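I found the mahalanobis.dist function in the StatMatch package (http://cran.r-project.org/web/packages/StatMatch/StatMatch.pdf), but it does not do exactly what I want. It seems to compute the Mahalanobis distance between every individual observation in the data and every observation in data.x.
I would like to compute the Mahalanobis distance from one observation in data.y to all of the observations in data.x. Basically, I want the Mahalanobis distance from one point to a "cloud" of points, if that makes sense. I am roughly after the notion of an observation being part of another group of observations.
This person (http://people.revoledu.com/kardi/tutorial/Similarity/MahalanobisDistance.html) seems to be doing exactly that. I tried to replicate his process in R, but it fails when I get to the last equation at the bottom: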
mahaldist = sqrt((inversepooledcov %*% t(meandiffmatrix)) %*% meandiffmatrix)
All of the code I am using is here:
a = rbind(c(2,2), c(2,5), c(6,5),c(7,3))
colnames(a) = c('x', 'y')
b = rbind(c(6,5),c(3,4))
colnames(b) = c('x', 'y')
acov = cov(a)
bcov = cov(b)
meandiff1 = mean(a[,1]) - mean(b[,1])
meandiff2 = mean(a[,2]) - mean(b[,2])
meandiffmatrix = rbind(c(meandiff1,meandiff2))
totaldata = dim(a)[1] + dim(b)[1]
pooledcov = (dim(a)[1]/totaldata * acov) + (dim(b)[1]/totaldata * bcov)
inversepooledcov = solve(pooledcov)
mahaldist = sqrt((inversepooledcov %*% t(meandiffmatrix)) %*% meandiffmatrix)
How about using the mahalanobis function in the stats package:
mahalanobis(x, center, cov, inverted = FALSE, ...)
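As a minimal sketch of how that signature could be applied to the question's data: the cloud `a` supplies `center` and `cov`, and each row of `b` is scored against it. Using the covariance of the cloud alone (rather than a pooled covariance) is an assumption about what "distance to a cloud" should mean here; the names `S`, `d2` and `d` are illustrative only.

```r
# Point-to-cloud Mahalanobis distance with stats::mahalanobis().
# `a` is the reference cloud, `b` holds the points to score (data from the question).
a <- rbind(c(2, 2), c(2, 5), c(6, 5), c(7, 3))
b <- rbind(c(6, 5), c(3, 4))
colnames(a) <- colnames(b) <- c("x", "y")

center <- colMeans(a)  # centroid of the cloud
S <- cov(a)            # sample covariance of the cloud (assumed choice, not pooled)

# mahalanobis() returns SQUARED distances, one per row of its first argument
d2 <- mahalanobis(b, center, S)
d <- sqrt(d2)          # take the square root to get the distances themselves
print(d)
```

Note that the function returns squared distances, so the `sqrt()` step is needed to compare against formulas that include the square root.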
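I have been trying this out using the same website you looked at, and then stumbled upon this question. I managed to get the script working, but I get a different result.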
#WORKING EXAMPLE
#MAHALANOBIS DIST OF TWO MATRICES
#define matrix
mat1<-matrix(data=c(2,2,6,7,4,6,5,4,2,1,2,5,5,3,7,4,3,6,5,3),nrow=10)
mat2<-matrix(data=c(6,7,8,5,5,5,4,7,6,4),nrow=5)
#center data
mat1.1<-scale(mat1,center=T,scale=F)
mat2.1<-scale(mat2,center=T,scale=F)
#cov matrix
mat1.2<-cov(mat1.1,method="pearson")
mat2.2<-cov(mat2.1,method="pearson")
n1<-nrow(mat1)
n2<-nrow(mat2)
n3<-n1+n2
#pooled matrix
mat3<-((n1/n3)*mat1.2) + ((n2/n3)*mat2.2)
#inverse pooled matrix
mat4<-solve(mat3)
#mean diff
mat5<-as.matrix((colMeans(mat1)-colMeans(mat2)))
#multiply
mat6<-t(mat5) %*% mat4
#multiply
sqrt(mat6 %*% mat5)
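I think the function mahalanobis() is meant for the individuals (rows) of a single matrix: it returns the squared Mahalanobis distance of each row from the given center. The function pairwise.mahalanobis() from package(HDMD) can compare two or more matrices and give the Mahalanobis distances between the matrices.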
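As a cross-check of the pooled-covariance script above, here is a sketch that computes the same quantity twice: once with explicit matrix products and once by passing the two group means and the pooled covariance to stats::mahalanobis(). The variable names (`pooled`, `mean_diff`, `manual`, `builtin`) are illustrative. The explicit centering step is left out because cov() already centers each column internally.

```r
# Same matrices as in the script above
mat1 <- matrix(c(2, 2, 6, 7, 4, 6, 5, 4, 2, 1,
                 2, 5, 5, 3, 7, 4, 3, 6, 5, 3), nrow = 10)
mat2 <- matrix(c(6, 7, 8, 5, 5,
                 5, 4, 7, 6, 4), nrow = 5)

n1 <- nrow(mat1)
n2 <- nrow(mat2)

# Pooled covariance, weighted by group size as in the script above
pooled <- (n1 / (n1 + n2)) * cov(mat1) + (n2 / (n1 + n2)) * cov(mat2)

# Difference between the two group centroids
mean_diff <- colMeans(mat1) - colMeans(mat2)

# Explicit quadratic form: diff' %*% inverse(pooled) %*% diff
manual <- sqrt(drop(t(mean_diff) %*% solve(pooled) %*% mean_diff))

# Same quantity via stats::mahalanobis(), which returns the squared distance
builtin <- sqrt(mahalanobis(colMeans(mat1), colMeans(mat2), pooled))

print(c(manual = manual, builtin = builtin))
```

If the two numbers agree, the manual script and the built-in function are computing the same distance between group centroids.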
The output before taking the square root is:
inversepooledcov %*% t(meandiffmatrix) %*% meandiffmatrix
          [,1]        [,2]
x -0.004349227 -0.01304768
y  0.114529639  0.34358892
I don't think you can take the square root of a negative number, so you get NaN for the negative elements:
sqrt(inversepooledcov %*% t(meandiffmatrix) %*% meandiffmatrix)
       [,1]      [,2]
x       NaN       NaN
y 0.3384223 0.5861646
Warning message:
In sqrt(inversepooledcov %*% t(meandiffmatrix) %*% meandiffmatrix) :
NaNs produced
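The 2 x 2 result suggests the factors are multiplied in the wrong order: with a 1 x 2 row vector of mean differences, putting the inverse covariance first yields an outer-product-like matrix rather than a scalar. A sketch of the reordered product, assuming the intended formula is the usual quadratic form (difference, times inverse covariance, times transposed difference), reusing the question's variable names:

```r
# Data and pooled covariance exactly as in the question
a <- rbind(c(2, 2), c(2, 5), c(6, 5), c(7, 3))
b <- rbind(c(6, 5), c(3, 4))
colnames(a) <- colnames(b) <- c("x", "y")

totaldata <- nrow(a) + nrow(b)
pooledcov <- (nrow(a) / totaldata) * cov(a) + (nrow(b) / totaldata) * cov(b)
inversepooledcov <- solve(pooledcov)

# 1 x 2 row vector of differences between the group means
meandiffmatrix <- rbind(colMeans(a) - colMeans(b))

# (1 x 2) %*% (2 x 2) %*% (2 x 1) gives a 1 x 1 matrix, so sqrt() sees one number
mahaldist <- sqrt(meandiffmatrix %*% inversepooledcov %*% t(meandiffmatrix))
print(mahaldist)  # approximately 0.582
```

Because a valid covariance matrix is positive definite, this quadratic form cannot be negative, so the NaN values do not appear.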