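I am using the R package pvclust for hierarchical clustering. It builds on hclust and uses bootstrapping to compute significance levels for the resulting clusters.
Consider the following dataset with 3 dimensions and 10 observations: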
mat <- as.matrix(data.frame("A"=c(9000,2,238),"B"=c(10000,6,224),"C"=c(1001,3,259),
"D"=c(9580,94,51),"E"=c(9328,5,248),"F"=c(10000,100,50),
"G"=c(1020,2,240),"H"=c(1012,3,260),"I"=c(1012,3,260),
"J"=c(984,98,49)))
When I use hclust on its own, the clustering runs fine for both the Euclidean and the correlation-based distance:
# euclidean-based distance
dist1 <- dist(t(mat),method="euclidean")
mat.cl1 <- hclust(dist1,method="average")
# correlation-based distance
dist2 <- as.dist(1 - cor(mat))
mat.cl2 <- hclust(dist2, method="average")
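As a quick sanity check on the two calls above, the following sketch confirms that both distance objects are computed between the 10 columns (the observations), not the 3 rows. dist() works on rows, hence the t(mat), while cor() already works on columns. The variable names follow the question; the checks themselves are only illustrative.

```r
# Rebuild the matrix from the question: 3 rows (dimensions), 10 columns (observations)
mat <- as.matrix(data.frame("A"=c(9000,2,238),"B"=c(10000,6,224),"C"=c(1001,3,259),
                            "D"=c(9580,94,51),"E"=c(9328,5,248),"F"=c(10000,100,50),
                            "G"=c(1020,2,240),"H"=c(1012,3,260),"I"=c(1012,3,260),
                            "J"=c(984,98,49)))

dim(mat)                                  # 3 rows, 10 columns

# dist() measures distances between rows, so transpose to compare columns
dist1 <- dist(t(mat), method="euclidean")

# cor() correlates columns directly, so no transpose is needed
dist2 <- as.dist(1 - cor(mat))

# Both distance objects should cover the same 10 objects
attr(dist1, "Size")
attr(dist2, "Size")
```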
However, when I use the same settings with pvclust, as follows:
library(pvclust)
# euclidean-based distance
mat.pcl1 <- pvclust(mat, method.hclust="average", method.dist="euclidean", nboot=1000)
# correlation-based distance
mat.pcl2 <- pvclust(mat, method.hclust="average", method.dist="correlation", nboot=1000)
...I get the following errors:
Error in hclust(distance, method = method.hclust) :
  must have n >= 2 objects to cluster
Error in cor(x, method …
I have a data frame as follows:
> df <- data.frame("A"=rnorm(26), "B"=rnorm(26),row.names=sample(letters,26))
I then want to extract column B as a vector, using a different row order:
> newOrder <- sample(letters,26)
> vec <- df[newOrder,"B"] #1
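How can I keep the correct row names of df as the names of the vector vec, in a single statement at #1? That is, without having to do the following: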
> names(vec) <- newOrder
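For illustration, here are two possible single-statement forms, offered as a sketch rather than the definitive answer. Both use only base R: setNames() attaches the names in the same expression, and indexing a matrix (rather than a data frame) keeps the row names when a single column is dropped to a vector. The set.seed() call and the names vec1 and vec2 are assumptions added here for reproducibility and are not part of the question.

```r
set.seed(1)  # assumption: only here so the sketch is reproducible

df <- data.frame("A"=rnorm(26), "B"=rnorm(26), row.names=sample(letters,26))
newOrder <- sample(letters,26)

# Option 1: wrap the extraction in setNames() so the names are set in one statement
vec1 <- setNames(df[newOrder,"B"], newOrder)

# Option 2: convert to a matrix first; selecting one column of a matrix
# returns a vector that keeps the row names as its names
vec2 <- as.matrix(df)[newOrder,"B"]

# Both vectors carry the row names in the requested order
head(names(vec1))
head(names(vec2))
```

Option 2 relies on all columns of df being numeric. With mixed column types, as.matrix() would coerce everything to character, so setNames() is the safer choice in that case.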