Bap*_*Das | 3 | Tags: r, cluster-analysis, fuzzy-c-means
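I am running fuzzy c-means clustering with the e1071 package. I want to determine the optimal number of clusters based on the fuzzy performance index (FPI), which measures the degree of fuzziness, and the normalized classification entropy (NCE), which measures the degree of disorganization of a specific class. They are given by the following formulae:

$$\mathrm{FPI} = 1 - \frac{c}{c-1}\left[1 - \frac{1}{n}\sum_{k=1}^{n}\sum_{i=1}^{c}\mu_{ik}^{2}\right]$$

$$\mathrm{NCE} = \frac{n}{n-c}\left[-\frac{1}{n}\sum_{k=1}^{n}\sum_{i=1}^{c}\mu_{ik}\log_a \mu_{ik}\right]$$

where c is the number of clusters, n is the number of observations, μ_ik is the fuzzy membership, and log_a is the natural logarithm.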
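As a sanity check on these two formulae, here is a minimal R sketch (the function names fpi_rahul and nce_rahul are illustrative, not from any package) that evaluates them on a maximally fuzzy membership matrix, where every μ_ik = 1/c. In that extreme case FPI evaluates to 0 and NCE to n/(n − c) · log(c).

```r
# Sketch (illustrative names): FPI and NCE in the form given above
# (Rahul et al. 2019), evaluated on a maximally fuzzy membership matrix
# in which every observation belongs equally to every cluster.
fpi_rahul <- function(u) {
  C <- ncol(u)  # number of clusters
  N <- nrow(u)  # number of observations
  1 - (C / (C - 1)) * (1 - sum(u^2) / N)
}

nce_rahul <- function(u) {
  C <- ncol(u)
  N <- nrow(u)
  (N / (N - C)) * (-sum(u * log(u)) / N)  # log() is the natural logarithm
}

n_obs <- 10
n_clus <- 2
u_uniform <- matrix(1 / n_clus, nrow = n_obs, ncol = n_clus)  # rows sum to 1

fpi_rahul(u_uniform)  # 0
nce_rahul(u_uniform)  # (10 / 8) * log(2), about 0.866
```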
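I am using the following code:

library(e1071)
# two well-separated 2-D Gaussian blobs of 50 points each
x <- rbind(matrix(rnorm(100, sd = 0.3), ncol = 2),
           matrix(rnorm(100, mean = 1, sd = 0.3), ncol = 2))
# fuzzy c-means with 2 clusters and at most 20 iterations
cl <- cmeans(x, 2, 20, verbose = TRUE, method = "cmeans")
cl$membership

I have been able to extract μ_ik, i.e. the fuzzy memberships. Now cmeans has to be run for different numbers of clusters (e.g. 2 to 6), and FPI and NCE have to be calculated for each, to obtain a plot like the following:

[Figure: FPI and NCE versus number of clusters (not preserved)]

How can this be done in R?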
Edit

I tried the code provided by @nya on the iris dataset, using the following code:

library(e1071)

df <- scale(iris[-5])

FPI <- function(cmem){
  c <- ncol(cmem)
  n <- nrow(cmem)

  1 - (c / (c - 1)) * (1 - sum(cmem^2) / n)
}

NCE <- function(cmem){
  c <- ncol(cmem)
  n <- nrow(cmem)

  (n / (n - c)) * (- sum(cmem * log(cmem)) / n)
}

# prepare variables
cl <- list()
fpi <- nce <- NULL

# cycle through the desired number of clusters
for(i in 2:6){
  cl[[i]] <- cmeans(df, i, 20, method = "cmeans")
  fpi <- c(fpi, FPI(cl[[i]]$membership))
  nce <- c(nce, NCE(cl[[i]]$membership))
}

# add space for the second axis label
par(mar = c(5,4,1,4) + .1)

# plot FPI
plot(2:6, fpi, lty = 2, pch = 18, type = "b", xlab = "Number of clusters", ylab = "FPI")

# plot NCE, manually adding the second axis
par(new = TRUE)
plot(2:6, nce, lty = 1, pch = 15, type = "b", xlab = "", ylab = "", axes = FALSE)
axis(4, at = pretty(range(nce)))
mtext("NCE", side = 4, line = 3)

# add legend
legend("top", legend = c("FPI", "NCE"), pch = c(18,15), lty = c(2,1), horiz = TRUE)

The optimal number of clusters is to be determined from the minimum values of the fuzzy performance index (FPI) and the normalized classification entropy (NCE). However, NCE keeps increasing, while FPI shows decreasing values. Ideally, it should look like this:

[Figure: expected FPI and NCE curves (not preserved)]
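To see what the FPI expression in this code actually measures, it can be rearranged in terms of the partition coefficient F (the average over observations of the squared memberships summed over clusters). This is plain algebra on the formula above, not a result quoted from the cited papers: the expression equals 0 when every μ_ik = 1/c (F = 1/c) and 1 for a crisp partition (F = 1), so in this form larger values mean a crisper partition.

```latex
F = \frac{1}{n}\sum_{k=1}^{n}\sum_{i=1}^{c}\mu_{ik}^{2},
\qquad \frac{1}{c} \le F \le 1

\mathrm{FPI} = 1 - \frac{c}{c-1}\left(1 - F\right)
             = \frac{(c - 1) - c + cF}{c - 1}
             = \frac{cF - 1}{c - 1}
```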
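Answer

Using the available equations, we can write our own functions. Here, the two functions implement the equations from the paper you suggested and from one of the references cited by its authors.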
FPI <- function(cmem, method = c("FuzME", "McBrathney", "Rahul")){
method = match.arg(method)
C <- ncol(cmem)
N <- nrow(cmem)
# Rahul et al. 2019. https://doi.org/10.1080/03650340.2019.1578345
if(method == "Rahul"){
res <- 1 - (C / (C - 1)) * (1 - sum(cmem^2) / N)
}
# McBratney & Moore 1985 https://doi.org/10.1016/0168-1923(85)90082-6
if(method == "McBrathney"){
# F is the partition coefficient; FPI = 1 - (C * F - 1) / (C - 1)
F <- sum(cmem^2) / N
res <- 1 - (C * F - 1) / (C - 1)
}
# FuzME https://precision-agriculture.sydney.edu.au/resources/software/
# MATLAB code file fvalidity.m, downloaded on 11 Nov, 2021
if(method == "FuzME"){
F <- sum(cmem^2) / N
res <- 1 - (C * F - 1) / (C - 1)
}
return(res)
}
NCE <- function(cmem, method = c("FuzME", "McBrathney", "Rahul")){
method = match.arg(method)
C <- ncol(cmem)
N <- nrow(cmem)
if(method == "Rahul"){
res <- (N / (N - C)) * (- sum(cmem * log(cmem)) / N)
}
if(method %in% c("FuzME", "McBrathney")){
H <- -1 / N * sum(cmem * log(cmem))
res <- H / log(C)
}
return(res)
}
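Because the methods can make FPI move in opposite directions, a quick numerical comparison may help when choosing one. The self-contained sketch below (variable names are illustrative) builds a random membership matrix with rows summing to 1, as in cmeans output. It shows that the "Rahul" FPI is exactly 1 minus the "FuzME" FPI, so the two curves are mirror images. It also shows that the two NCE variants rescale the same entropy H by different factors that depend on C and N.

```r
# Sketch: compare the FPI and NCE variants on a random membership matrix.
set.seed(42)
N <- 50  # observations
C <- 3   # clusters
u <- matrix(runif(N * C), nrow = N)
u <- u / rowSums(u)  # normalise each row to sum to 1

pc <- sum(u^2) / N  # partition coefficient (called F in the functions above)
fpi_rahul <- 1 - (C / (C - 1)) * (1 - pc)
fpi_fuzme <- 1 - (C * pc - 1) / (C - 1)
fpi_rahul + fpi_fuzme  # 1: the two FPI variants are complements

H <- -sum(u * log(u)) / N  # mean classification entropy
nce_rahul <- (N / (N - C)) * H
nce_fuzme <- H / log(C)
nce_rahul / nce_fuzme  # equals log(C) * N / (N - C)
```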
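Then use the FPI() and NCE() functions to calculate the indices from the memberships returned by the cmeans function on the iris dataset (method = "M" partially matches "McBrathney").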
# prepare variables
cl <- list()
fpi <- nce <- NULL
# cycle through the desired number of clusters
for(i in 2:6){
cl[[i]] <- e1071::cmeans(iris[, -5], i, 20, method = "cmeans")
fpi <- c(fpi, FPI(cl[[i]]$membership, method = "M"))
nce <- c(nce, NCE(cl[[i]]$membership, method = "M"))
}
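One caveat for this loop: if any membership is exactly 0, cmem * log(cmem) evaluates 0 * -Inf, which is NaN in R, and the resulting NCE is NaN. Below is a sketch of a zero-safe entropy term using the usual convention 0 · log 0 = 0. The helper names xlogx and nce_safe are hypothetical, and the normalisation shown is the FuzME/McBratney form, H / log(C).

```r
# Sketch: zero-safe entropy term with the convention 0 * log(0) = 0.
xlogx <- function(u) ifelse(u > 0, u * log(u), 0)

nce_safe <- function(cmem) {
  C <- ncol(cmem)
  N <- nrow(cmem)
  H <- -sum(xlogx(cmem)) / N  # mean classification entropy
  H / log(C)                  # FuzME / McBratney-style normalisation
}

crisp <- rbind(c(1, 0), c(0, 1), c(1, 0))
sum(crisp * log(crisp))  # NaN, because 0 * log(0) is 0 * -Inf
nce_safe(crisp)          # 0: a crisp partition has no entropy
```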
Finally, plot both indices in one figure with two different y-axes.
# add space for the second axis label
par(mar = c(5,4,1,4) + .1)
# plot FPI
plot(2:6, fpi, lty = 2, pch = 18, type = "b", xlab = "Number of clusters", ylab = "FPI")
# plot NCE, manually adding the second axis
par(new = TRUE)
plot(2:6, nce, lty = 1, pch = 15, type = "b", xlab = "", ylab = "", axes = FALSE)
axis(4, at = pretty(range(nce)))
mtext("NCE", side = 4, line = 3)
# add legend
legend("top", legend = c("FPI", "NCE"), pch = c(18,15), lty = c(2,1), horiz = TRUE)
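EDIT1: Updated the functions with alternative equations from two different publications and computed an example on the iris dataset.
EDIT2: Added code for the FPI and NCE calculations as specified in the FuzME MATLAB code (fvalidity.m) available from the FuzME website.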