How to expand the dendrogram in a heatmap

pdu*_*ois 4 plot r cluster-analysis heatmap

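I have a heatmap with a dendrogram like this.

The complete data is here.

The problem is that the dendrogram on the left is squished. How can I un-squish (expand) it without changing the column size of the heatmap?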

[image]

It was generated with the following code:

#!/usr/bin/Rscript
library(gplots);
library(RColorBrewer);


plot_hclust  <- function(inputfile,clust.height,type.order=c(),row.margins=70) {

    # Read data
    dat.bcd <- read.table(inputfile,na.strings=NA, sep="\t",header=TRUE);


    # Use "Probes Gene.symbol" as row names, then drop those two columns
    rownames(dat.bcd) <- do.call(paste,c(dat.bcd[c("Probes","Gene.symbol")],sep=" "))
    dat.bcd <- dat.bcd[,!names(dat.bcd) %in% c("Probes","Gene.symbol")]

    # Clustering and distance function
    hclustfunc <- function(x) hclust(x, method="complete")
    distfunc <- function(x) dist(x,method="maximum")


    # Select based on FC, as long as any of them >= anylim

    anylim <- 2.0
    dat.bcd <- dat.bcd[ apply(dat.bcd, 1,function(x) any (x >= anylim)), ]


    # Height at which the tree is cut into clusters
    height <- clust.height;

    # Define output file name
    heatout <- paste("tmp.pafc.heat.",anylim,".h",height,".pdf",sep="");


    # Compute the distance matrix and run the clustering
    d.bcd <- distfunc(dat.bcd)
    fit.bcd <- hclustfunc(d.bcd)


    # Cluster by height
    # cutree and rect.hclust have to be used in tandem
    clusters <- cutree(fit.bcd, h=height) 
    nofclust.height <-  length(unique(as.vector(clusters)));

    myorder <- colnames(dat.bcd); 
    if (length(type.order)>0) {
     myorder <- type.order
    }

    # Define colors
    #hmcols <- rev(brewer.pal(11,"Spectral"));
    hmcols <- rev(redgreen(2750));
    selcol <- colorRampPalette(brewer.pal(12,"Set3"))
    selcol2 <- colorRampPalette(brewer.pal(9,"Set1"))
    sdcol= selcol(5);
    clustcol.height = selcol2(nofclust.height);

    # Plot heatmap
    pdf(file=heatout,width=20,height=50); # for FC.lim >=2
    # Full argument names (hclustfun, margins) instead of relying on partial matching
    heatmap.2(as.matrix(dat.bcd[, myorder]), Colv=FALSE, density.info="none",
              lhei=c(0.1, 4), lwid=c(1.5, 2.0), dendrogram="row", scale="row",
              RowSideColors=clustcol.height[clusters], col=hmcols, trace="none",
              margins=c(30, row.margins), hclustfun=hclustfunc, distfun=distfunc,
              keysize=0.3);
    dev.off();


}
#--------------------------------------------------
# End of functions
#-------------------------------------------------- 

plot_hclust("http://pastebin.com/raw.php?i=ZaGkPTGm",clust.height=3,row.margins=70);

TWL*_*TWL 8

In your case the data has a long tail, which is expected for gene expression data (log-normal).

data <- read.table(file='http://pastebin.com/raw.php?i=ZaGkPTGm', 
                   header=TRUE, row.names=1)

mat <- as.matrix(data[,-1]) # -1 removes the first column containing gene symbols

As the quantile distribution shows, the genes with the highest expression stretch the range from 1.5 to above 300.

quantile(mat)

#     0%     25%     50%     75%    100% 
#  0.000   0.769   1.079   1.544 346.230 

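When hierarchical clustering is performed on unscaled data, the resulting dendrogram may be biased towards the values with the highest expression, as seen in your example. In many cases (reference) this calls for either a log or a z-score transformation. Your dataset contains values == 0, which is a problem for the log transformation because log(0) is undefined (in R, `log(0)` evaluates to `-Inf`).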
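To see why zeros get in the way of a log transform, here is a minimal sketch on a toy vector that borrows the five quantile values printed above. The pseudocount of 1 is an assumption for illustration: it is one common workaround, not something the answer itself uses.

```r
# Toy vector: the five quantile values from the output above.
x <- c(0, 0.769, 1.079, 1.544, 346.23)

# A plain log transform turns the zero into -Inf, which would then
# propagate into infinite or undefined distances.
raw_log <- log2(x)

# One common workaround (an assumption here, not the answer's approach):
# add a pseudocount before taking the log.
pseudo_log <- log2(x + 1)

print(raw_log)
print(pseudo_log)
```

The answer takes the other route and uses row z-scores, which avoids choosing a pseudocount.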

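The z-score transformation (reference) is implemented within heatmap.2, but it is important to note that the function computes the distance matrix and runs the clustering algorithm before scaling the data. Hence the option scale='row' does not influence the clustering results; see my earlier post (differences in heatmap/clustering defaults in R) for more details.

I would suggest that you scale your data before running heatmap.2: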
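The effect of that ordering can be checked without heatmap.2 at all. The sketch below uses a small simulated matrix (made-up data, base R only) and the same distance and linkage settings as in the question. It compares the tree built on raw values, which is what heatmap.2 clusters when handed unscaled data with scale='row', against the tree built on row z-scores computed beforehand.

```r
set.seed(1)
# Simulated matrix: 6 "genes" x 4 "samples"; gene6 lies on a far larger scale.
mat <- matrix(rlnorm(24), nrow = 6,
              dimnames = list(paste0("gene", 1:6), paste0("s", 1:4)))
mat["gene6", ] <- mat["gene6", ] * 200

distfunc   <- function(x) dist(x, method = "maximum")
hclustfunc <- function(x) hclust(x, method = "complete")

# Tree built on the raw values.
fit_raw <- hclustfunc(distfunc(mat))

# Tree built on row z-scores computed before clustering.
z <- t(scale(t(mat)))
fit_z <- hclustfunc(distfunc(z))

# Ratio of the tallest merge to the median merge height: a large ratio
# means most of the tree is squeezed against the leaves.
ratio_raw <- max(fit_raw$height) / median(fit_raw$height)
ratio_z   <- max(fit_z$height) / median(fit_z$height)
print(ratio_raw)
print(ratio_z)
```

In the raw tree a single high-expression row sets the height scale, so the remaining merges are compressed; after row scaling the merge heights are of comparable size.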

# scale function transforms columns by default hence the need for transposition.
z <- t(scale(t(mat))) 

quantile(z)

#         0%        25%        50%        75%       100% 
# -2.1843994 -0.6646909 -0.2239677  0.3440102  2.2640027 

# set custom distance and clustering functions
hclustfunc <- function(x) hclust(x, method="complete")
distfunc <- function(x) dist(x,method="maximum")

# obtain the clusters
fit <- hclustfunc(distfunc(z))
clusters <- cutree(fit, 5) 

library(gplots)
pdf(file='heatmap.pdf', height=50, width=10)
# Full argument names (hclustfun, symbreaks) instead of relying on partial matching
heatmap.2(z, trace='none', dendrogram='row', Colv=FALSE, scale='none',
             hclustfun=hclustfunc, distfun=distfunc, col=greenred(256), symbreaks=TRUE,
             margins=c(10,20), keysize=0.5, labRow=data$Gene.symbol,
             lwid=c(1,0.05,1), lhei=c(0.03,1), lmat=rbind(c(5,0,4),c(3,1,2)),
             RowSideColors=as.character(clusters))
dev.off()

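Also, check out the other posts here and here, which explain how to set the layout of the heatmap via the lmat, lwid and lhei parameters.

The resulting heatmap looks like this (row and column labels are omitted):

[image]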
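As a rough guide to what those three parameters do, the sketch below decodes the layout used in the heatmap.2 call above. It also shows one way to give the row dendrogram more room while keeping the heatmap panel at the same absolute width. The panel numbering follows the order in which heatmap.2 draws its components when RowSideColors is supplied. The helper `page_width_for()` and the doubled dendrogram width are hypothetical, chosen only for illustration.

```r
# Layout matrix from the heatmap.2() call: each number is a panel, 0 is empty.
lmat <- rbind(c(5, 0, 4),
              c(3, 1, 2))
lwid <- c(1, 0.05, 1)   # relative widths of the three layout columns
page_width <- 10        # inches, as passed to pdf()

# Drawing order with RowSideColors: 1 side colours, 2 heatmap,
# 3 row dendrogram, 4 column dendrogram, 5 colour key.
panel <- c("RowSideColors", "heatmap", "row dendrogram",
           "column dendrogram", "colour key")
print(panel[lmat[2, ]])  # panels in the bottom row, left to right

# Absolute width (inches) of each layout column on the page.
col_inches <- page_width * lwid / sum(lwid)

# Hypothetical helper: page width that keeps the heatmap column (column 3)
# at `heat_inches` for a new set of relative widths.
page_width_for <- function(new_lwid, heat_inches) {
  heat_inches * sum(new_lwid) / new_lwid[3]
}

new_lwid <- c(2, 0.05, 1)   # double the dendrogram column
new_page <- page_width_for(new_lwid, col_inches[3])

print(round(col_inches, 2))
print(round(new_page, 2))
```

Because lwid is relative, widening the dendrogram column alone would shrink the heatmap. Passing the new lwid together with a correspondingly wider `pdf(width = ...)` is what keeps the heatmap column unchanged. Margins are set separately and live inside the heatmap panel, so treat the numbers as approximate.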

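  • @pdubois, you're welcome. Yes, `scale` is suitable for the z-score transformation. See this post on [creating z-scores in R](http://stackoverflow.com/questions/6148050/creating-z-scores). Note its usage on the input matrix: **row-based** transformation `t(scale(t(mat)))` or **column-based** transformation `scale(mat)`. See also this post on [using z-scores](http://stats.stackexchange.com/questions/36076/is-a-heat-map-of-gene-expression-more-informative-if-z-scores-are-used-instead-o) to visualize changes in gene expression. (2 upvotes)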
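
A quick check of the two usages mentioned in this comment, on a made-up 3 x 4 matrix (the values are arbitrary and chosen so that gene2 is exactly ten times gene1):

```r
# Made-up matrix: rows are genes, columns are samples.
mat <- matrix(c(1, 2, 3, 4,
                10, 20, 30, 40,
                5, 1, 8, 2),
              nrow = 3, byrow = TRUE,
              dimnames = list(paste0("gene", 1:3), paste0("s", 1:4)))

# Row-based z-scores: scale() works on columns, hence the double transpose.
z_row <- t(scale(t(mat)))

# Column-based z-scores: scale() applied directly.
z_col <- scale(mat)

print(round(rowMeans(z_row), 10))       # each row is centred on 0
print(round(apply(z_row, 1, sd), 10))   # each row has unit standard deviation
print(round(colMeans(z_col), 10))       # each column is centred on 0
```

After the row-based transformation gene1 and gene2 have identical profiles, so they are compared by the shape of their expression pattern rather than by its magnitude.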