Is there a better way to do hierarchical clustering in R?

Question


Ale*_*lds 2 r cluster-analysis hierarchical rscript

I want to do hierarchical clustering first by rows and then by columns. I came up with this complete solution:

#! /path/to/my/Rscript --vanilla
# Arguments: input matrix file, hclust method, output matrix file
args <- commandArgs(TRUE)
mtxf.in <- args[1]
clusterMethod <- args[2]
mtxf.out <- args[3]

mtx <- read.table(mtxf.in, as.is=TRUE, header=TRUE, stringsAsFactors=TRUE)

# Cluster the rows and reorder the matrix accordingly
mtx.hc <- hclust(dist(mtx), method=clusterMethod)
mtx.clustered <- as.data.frame(mtx[mtx.hc$order,])
mtx.c.colnames <- colnames(mtx.clustered)
# Move the first column into the row names, then drop it
rownames(mtx.clustered) <- mtx.clustered$topLeftColumnHeaderName
mtx.clustered$topLeftColumnHeaderName <- NULL
# Transpose, cluster the columns (now rows), and reorder them
mtx.c.t <- as.data.frame(t(mtx.clustered), row.names=names(mtx))
mtx.c.t.hc <- hclust(dist(mtx.c.t), method=clusterMethod)
mtx.c.t.c <- as.data.frame(mtx.c.t[mtx.c.t.hc$order,])
# Transpose back and restore the column names; +1 skips the first header
mtx.c.t.c.t <- as.data.frame(t(mtx.c.t.c))
mtx.c.t.c.t.colnames <- as.vector(names(mtx.c.t.c.t))
names(mtx.c.t.c.t) <- mtx.c.colnames[as.numeric(mtx.c.t.c.t.colnames) + 1]

write.table(mtx.c.t.c.t, file=mtxf.out, sep='\t', quote=FALSE, row.names=TRUE)


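The variables mtxf.in and mtxf.out are the input matrix file and the clustered output matrix file, respectively. The variable clusterMethod is one of the hclust methods, such as single, average, etc.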
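
Since clusterMethod comes straight from the command line, it may help to check it before clustering. The following is a minimal sketch, not part of the original script: validate_method is a hypothetical helper name, and the list of names is the set of methods documented for stats::hclust.

```r
# Hypothetical helper (an assumption, not in the original script):
# check the requested linkage against the method names documented
# for stats::hclust, and stop with an error on anything else.
validate_method <- function(method) {
  allowed <- c("ward.D", "ward.D2", "single", "complete",
               "average", "mcquitty", "median", "centroid")
  match.arg(method, allowed)
}

clusterMethod <- validate_method("average")
```

An unknown name makes match.arg stop with a message listing the allowed values, so the script fails before any clustering work is done.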

As sample input, here is a data matrix:

topLeftColumnHeaderName col1    col2    col3    col4    col5    col6
row1    0       3       0       0       0       3
row2    6       6       6       6       6       6
row3    0       3       0       0       0       3
row4    6       6       6       6       6       6
row5    0       3       0       0       0       3
row6    0       3       0       0       0       3


Running this script, I lose the top-left header item from mtxf.in. Here is the output of this script:

col5    col4    col1    col3    col2    col6
row6    0       0       0       0       3       3
row5    0       0       0       0       3       3
row1    0       0       0       0       3       3
row3    0       0       0       0       3       3
row2    6       6       6       6       6       6
row4    6       6       6       6       6       6


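My question: apart from looking for a way to preserve the original structure of the input matrix file, I also don't know how much memory this consumes, or whether there is a faster, cleaner, more "R"-like way to do this.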

Is it really this hard to cluster by rows and columns in R? Are there constructive ways to improve this script? Thanks for your advice.
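
On the memory question, one rough way to get a feel for the cost is to measure the distance object itself: dist() stores one double per pair of rows, that is n * (n - 1) / 2 values for n rows. The sketch below uses random data of an assumed size (1000 rows, 10 columns) purely for illustration; it measures only the dist object, not whatever hclust allocates internally.

```r
# Illustrative data only: the sizes here are assumptions, not from the question.
set.seed(1)
n <- 1000
x <- matrix(rnorm(n * 10), nrow = n)

d <- dist(x)

length(d)                           # number of pairwise distances, n * (n - 1) / 2
print(object.size(d), units = "Mb") # size of the distance object in memory
```

Because the number of distances grows quadratically with the number of rows, doubling the rows roughly quadruples this object.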

Answer 1

And*_*rie 5

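Once you have cleaned up your data (i.e. removed the first column), this really takes only three lines of code:

Clean the data (assign row names from the first column, then remove the first column):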

# mtxf.in is the input file path from the question
dat <- read.table(mtxf.in, header = TRUE, as.is = TRUE)
rownames(dat) <- dat[, 1]
dat <- dat[, -1]

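As a possible shortcut, read.table can do this cleaning at read time through its row.names argument, which takes the row names from the given column. A minimal sketch, with the sample matrix from the question inlined through the text argument so that it runs without a file (the variable name lines is just for illustration):

```r
# Sample input from the question, inlined so the sketch is self-contained.
lines <- c(
  "topLeftColumnHeaderName col1 col2 col3 col4 col5 col6",
  "row1 0 3 0 0 0 3",
  "row2 6 6 6 6 6 6",
  "row3 0 3 0 0 0 3",
  "row4 6 6 6 6 6 6",
  "row5 0 3 0 0 0 3",
  "row6 0 3 0 0 0 3"
)

# row.names = 1 uses the first column as row names and drops it from the data.
dat <- read.table(text = lines, header = TRUE, row.names = 1)
str(dat)
```

With a real file, the same call would take the file path in place of the text argument.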

Cluster and reorder:

row.order <- hclust(dist(dat))$order
col.order <- hclust(dist(t(dat)))$order

dat[row.order, col.order]


Result:

     col5 col4 col1 col3 col2 col6
row6    0    0    0    0    3    3
row5    0    0    0    0    3    3
row1    0    0    0    0    3    3
row3    0    0    0    0    3    3
row2    6    6    6    6    6    6
row4    6    6    6    6    6    6

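The question also asks for a configurable clustering method and for an output file that keeps the top-left header. One way to combine the steps above is sketched here. cluster_matrix is a hypothetical helper name, the data frame reproduces the sample matrix from the question, and a temporary file stands in for mtxf.out; none of these names come from the original answer.

```r
# Hypothetical helper: reorder rows and columns by hierarchical clustering,
# passing the linkage method through to hclust.
cluster_matrix <- function(dat, method = "complete") {
  row.order <- hclust(dist(dat), method = method)$order
  col.order <- hclust(dist(t(dat)), method = method)$order
  dat[row.order, col.order]
}

# Sample matrix from the question, with row names already in place.
dat <- data.frame(
  col1 = c(0, 6, 0, 6, 0, 0),
  col2 = c(3, 6, 3, 6, 3, 3),
  col3 = c(0, 6, 0, 6, 0, 0),
  col4 = c(0, 6, 0, 6, 0, 0),
  col5 = c(0, 6, 0, 6, 0, 0),
  col6 = c(3, 6, 3, 6, 3, 3),
  row.names = paste0("row", 1:6)
)

res <- cluster_matrix(dat, method = "average")

# Put the row names back as a first column so the top-left header survives,
# then write without R's own row names.
out <- cbind(topLeftColumnHeaderName = rownames(res), res)
out.file <- tempfile(fileext = ".tsv")  # stand-in for mtxf.out
write.table(out, file = out.file, sep = "\t", quote = FALSE, row.names = FALSE)

readLines(out.file, n = 1)  # header line, starting with topLeftColumnHeaderName
```

Because the row names are written as an ordinary first column, the output has the same layout as the input and can be read back with read.table(out.file, header = TRUE, row.names = 1).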
