Asked by Stu*_*nce · score 7 · tags: r, matrix, hierarchy
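I would like to be able to "walk" (iterate) through a hierarchical clustering (see the figure and code below). What I want is:
1. A function that takes a matrix and a minimum height, say 10 in this example.
splitme <- function(matrix, minH){
  ##Some code
}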
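2. Starting from the top and going down to minH, cut the tree whenever there is a new split. This is the first problem: how do I detect a new split, so as to get its height h?
3. At this particular h, how many clusters are there? Retrieve the clusters: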
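mycl <- cutree(hclust(d), h = x)  # cutree() needs the hclust object, not the dendrogram; x is the h found in step 2
count <- table(mycl)  # cluster sizes; length(count) is the number of clusters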
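4. Save each new matrix in a variable. This is another tough one: the dynamic creation of x new matrices. So maybe a function that takes the clusters, does what needs to be done (comparisons) and returns a variable?
5. Continue steps 3 and 4 until minH is reached.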
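Steps 2 and 3 can be sketched with base R alone. This is a minimal sketch, assuming an object returned by stats::hclust() with a monotone linkage such as the default complete linkage: its height component lists every merge height in increasing order, so reading it backwards gives the splits from the top, and right after the i-th split from the top there are i + 1 clusters, which cutree(..., k = i + 1) retrieves. The helper walk_splits and the toy data are hypothetical, for illustration only.

```r
# Hypothetical helper (not from any package): walk the splits of an hclust tree
# from the top down to minH. For each split it returns the split height and the
# cluster sizes right after that split is applied.
# Assumes a monotone linkage (e.g. the default "complete").
walk_splits <- function(cl, minH) {
  h <- rev(cl$height)        # merge heights, highest (root) split first
  keep <- which(h > minH)    # the splits that happen above minH
  lapply(keep, function(i) {
    mycl <- cutree(cl, k = i + 1)             # after the i-th split there are i + 1 clusters
    list(height = h[i], sizes = table(mycl))  # table() counts members per cluster
  })
}

# Toy usage: 10 random 2-D points (illustrative data, not the question's)
set.seed(12345)
toy <- matrix(rnorm(20), ncol = 2)
cl <- hclust(dist(toy))
res <- walk_splits(cl, minH = 0)
sapply(res, function(s) length(s$sizes))  # 2 3 4 5 6 7 8 9 10
```

Using k rather than h sidesteps the question of exactly where between two merge heights to cut; the height of each split is still reported alongside its clusters.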
# Generate data
set.seed(12345)
desc.1 <- c(rnorm(10, 0, 1), rnorm(20, 10, 4))
desc.2 <- c(rnorm(5, 20, .5), rnorm(5, 5, 1.5), rnorm(20, 10, 2))
desc.3 <- c(rnorm(10, 3, .1), rnorm(15, 6, .2), rnorm(5, 5, .3))
data <- cbind(desc.1, desc.2, desc.3)
# Create dendrogram
d <- dist(data)
hc <- as.dendrogram(hclust(d))
# Function to color branches
colbranches <- function(n, col)
{
  a <- attributes(n) # Find the attributes of current node
  # Color edges with requested color
  attr(n, "edgePar") <- c(a$edgePar, list(col=col, lwd=2))
  n # Don't forget to return the node!
}
# Color the first sub-branch of the first branch in red,
# the second sub-branch in orange and the second branch in blue
hc[[1]][[1]] = dendrapply(hc[[1]][[1]], colbranches, "red")
hc[[1]][[2]] = dendrapply(hc[[1]][[2]], colbranches, "orange")
hc[[2]] = dendrapply(hc[[2]], colbranches, "blue")
# Plot
plot(hc)
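For step 4, a common R alternative to creating x new variables dynamically is to keep the matrices in a single list indexed by cluster id. A minimal sketch, where cluster_matrices and the toy matrix m are hypothetical names used only for illustration:

```r
# Hypothetical helper (not from any package): one list element per cluster,
# each holding that cluster's rows of `data`, instead of variables m1, m2, ...
cluster_matrices <- function(data, membership) {
  rows_by_cluster <- split(seq_len(nrow(data)), membership)  # row indices per cluster id
  lapply(rows_by_cluster, function(rows) data[rows, , drop = FALSE])  # keep matrix shape
}

# Toy data for illustration: two tight pairs of points and one outlier
m <- cbind(a = c(1, 2, 10, 12, 20), b = c(1, 1, 10, 10, 20))
membership <- cutree(hclust(dist(m)), k = 3)
mats <- cluster_matrices(m, membership)
sapply(mats, nrow)  # number of rows in each cluster's matrix
```

Each element can then be handed to whatever comparison function is needed, for example with lapply(mats, my_compare) where my_compare is your own function, so the number of clusters never has to be known in advance.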
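I think what you essentially need is the cophenetic distance matrix of the dendrogram (what cophenetic() returns; the cophenetic correlation coefficient is the single number that compares those distances with the original ones). Its distinct values are the heights of all the splitting points, and from there you can easily walk the tree. I've had a go at it below and stored all the submatrices (each as a dist object) in the list "submatrices". It is a nested list: the first level corresponds to the splitting points, and the second level holds the submatrices produced at that splitting point. For example, all the submatrices from the first splitting point (the grey and blue clusters) are submatrices[[1]], and the first submatrix within that (the red cluster) is submatrices[[1]][[1]].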
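The difference between the two indexing forms matters for this nested list: [ on a list returns a sub-list, while [[ returns the element itself. A toy list with the same two-level shape as submatrices (placeholder strings standing in for the dist objects, purely for illustration) shows it:

```r
# Toy list with the same two-level shape as `submatrices`:
# level 1 = splitting point, level 2 = one entry per cluster at that split.
# Placeholder strings stand in for the dist objects.
submatrices <- list(
  list("split1-cluster1", "split1-cluster2"),
  list("split2-cluster1", "split2-cluster2", "split2-cluster3")
)
first_split <- submatrices[[1]]   # a list holding both clusters of split 1
a <- submatrices[[1]][1]          # `[` keeps the list wrapper: a list of length 1
b <- submatrices[[1]][[1]]        # `[[` extracts the element itself
class(a)  # "list"
class(b)  # "character"
```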
splitme <- function(data, minH){
  ##Compute the distance matrix and the clustering dendrogram
  d <- dist(data)
  cl <- hclust(d)
  hc <- as.dendrogram(cl)
  ##Get the cophenetic distance matrix; its distinct values are the merge heights.
  ##Not rounded, so each value matches a height of cl exactly when passed to cutree()
  cccm <- cophenetic(hc)
  #Get the heights of the splitting points (sps), highest first
  sps <- sort(unique(as.vector(cccm)), decreasing = TRUE)
  #This list stores all the submatrices, grouped by splitting point
  #(the top splitting point is the 1st, the bottom splitting point is the last)
  submatrices <- list()
  #Iterate/walk the dendrogram
  i <- 2 #Start from 2: cutting at the 1st (root) height gives the entire dendrogram as a whole
  while(i <= length(sps) && sps[i] > minH){ #Stop at minH, or when the splitting points run out
    membership <- cutree(cl, h = sps[i]) #Cut the tree at this splitting point
    lst <- list() #Stores the submatrices extracted at this splitting point
    for(j in seq_len(max(membership))){
      member <- which(membership == j) #Rows of data that belong to cluster j
      dm <- dist(data[member, , drop = FALSE]) #Distance submatrix of cluster j
      lst <- append(lst, list(dm)) #Append each submatrix from this splitting point to lst
    }
    submatrices <- append(submatrices, list(lst)) #Append lst to the submatrices list
    i <- i + 1
  }
  return(submatrices)
}
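The loop in splitme relies on one property of cutree(): cutting exactly at a merge height keeps that merge joined, so cutting at the i-th highest distinct height yields i clusters when no two merges share a height. If the heights are rounded first, a rounded value can fall on either side of the true merge height, and that one-to-one correspondence is no longer guaranteed. A quick check of the property on the question's data, as a sketch assuming the default complete linkage:

```r
# Regenerate the question's data (same seed and calls as in the question)
set.seed(12345)
desc.1 <- c(rnorm(10, 0, 1), rnorm(20, 10, 4))
desc.2 <- c(rnorm(5, 20, .5), rnorm(5, 5, 1.5), rnorm(20, 10, 2))
desc.3 <- c(rnorm(10, 3, .1), rnorm(15, 6, .2), rnorm(5, 5, .3))
data <- cbind(desc.1, desc.2, desc.3)
cl <- hclust(dist(data))

# Distinct cophenetic distances = merge heights, highest first (no rounding)
sps <- sort(unique(as.vector(cophenetic(cl))), decreasing = TRUE)
# Number of clusters obtained by cutting exactly at each of those heights
n_clusters <- sapply(sps, function(h) max(cutree(cl, h = h)))
n_clusters  # 1, 2, 3, ... when no two merges share a height
```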