Drawing a "sequence logo" with ggplot2?

Tal*_*ili 38 graphics r ggplot2

Is it (reasonably) possible to draw a sequence logo plot with ggplot2?

There is a package based on "grid", called "seqLogo", that does this, but I was wondering whether a ggplot2 version might be possible.

Thanks.


Cur*_* F. 12

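Here is my ggplot2 attempt. It is somewhat similar to Jeremy Leipzig's implementation of the Berry solution, and its format is a little closer to the standard sequence-logo representation.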

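However, my solution, and I suspect any ggplot2 solution, still falls short because ggplot2 gives no control over the aspect ratio of plotted glyphs. A sequence logo needs each letter stretched vertically to its own height while its width stays constant. That is (I believe) the core feature required to generate a sequence logo, and ggplot2 lacks it.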

Also note: I used the data from Jeremy Leipzig's answer, but I did not apply any correction for small sample size or for GC content that differs from 50%.
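The corrections mentioned in this note can be illustrated with a small helper. The sketch below assumes a probability matrix with rows named A, C, G, T. `information_content()` is a hypothetical function and is not part of ggplot2, seqLogo or berrylogo.

It works in two steps:

- It scores each column as relative entropy against a background with GC fraction `gc`. With `gc = 0.5` this reduces to the usual 2 - H bits.
- If the number of aligned sequences `n` is given, it optionally subtracts the approximate small-sample correction e(n) = (s - 1) / (2 ln 2 · n) of Schneider et al. (1986), where s = 4 for DNA.

This is not necessarily the correction that berrylogo applies through its `gc_content` argument.

```r
# Hypothetical helper, not from any package: per-column information content
# (bits) of a probability matrix whose rows are named A, C, G, T.
information_content <- function(p, gc = 0.5, n = NULL) {
  # Background base frequencies implied by the GC fraction
  q <- c(A = (1 - gc) / 2, C = gc / 2, G = gc / 2, T = (1 - gc) / 2)
  q <- q[rownames(p)]  # align background with the matrix row order
  # Relative entropy sum_i p_i * log2(p_i / q_i); bases with p_i = 0 contribute 0
  terms <- ifelse(p > 0, p * log2(p / q), 0)
  ic <- colSums(terms)
  # Optional approximate small-sample correction for n aligned sequences
  # (can push weakly conserved columns below 0)
  if (!is.null(n)) {
    ic <- ic - (nrow(p) - 1) / (2 * log(2) * n)
  }
  ic
}

# Toy matrix: column 1 is uniform, column 2 is always A
toy <- matrix(c(0.25, 1,
                0.25, 0,
                0.25, 0,
                0.25, 0),
              nrow = 4, byrow = TRUE,
              dimnames = list(c("A", "C", "G", "T")))

information_content(toy)             # 0 and 2 bits
information_content(toy, gc = 0.41)  # uniform column scores slightly above 0, all-A column below 2
information_content(toy, n = 10)     # both columns lowered by the same small-sample term
```

Letter heights then follow as `sweep(p, 2, information_content(p, gc = 0.41), "*")`. Each probability is multiplied by its column's information content.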

library(ggplot2)
library(reshape2)

# 4 x 18 position probability matrix: one row per base, one column per position
freqs <- matrix(data = c(
  0.25,0.65,0.87,0.92,0.16,0.16,0.04,0.98,0.98,1.00,0.02,0.10,0.10,0.80,0.98,0.91,0.07,0.07,  # A
  0.11,0.05,0.04,0.00,0.26,0.17,0.00,0.01,0.00,0.00,0.29,0.17,0.01,0.03,0.00,0.00,0.32,0.32,  # C
  0.53,0.26,0.07,0.02,0.53,0.18,0.96,0.01,0.00,0.00,0.65,0.01,0.89,0.17,0.01,0.09,0.59,0.12,  # G
  0.11,0.04,0.02,0.06,0.05,0.49,0.00,0.00,0.02,0.00,0.04,0.72,0.00,0.00,0.01,0.00,0.02,0.49), # T
  byrow = TRUE, nrow = 4, dimnames = list(c('A', 'C', 'G', 'T')))

freqdf <- as.data.frame(t(freqs))
freqdf$pos <- seq_len(nrow(freqdf))

# Information content per position: 2 - H, where H = -sum(p * log2(p)).
# x^x evaluates to 1 for x = 0, so absent bases contribute 0 instead of NaN.
freqdf$height <- apply(freqdf[, c('A', 'C', 'G', 'T')], MARGIN = 1,
                       FUN = function(x) { 2 + sum(log(x^x, base = 2)) })

# Letter heights: base probability times the position's information content
logodf <- data.frame(A = freqdf$A * freqdf$height, C = freqdf$C * freqdf$height,
                     G = freqdf$G * freqdf$height, T = freqdf$T * freqdf$height,
                     pos = freqdf$pos)

lmf <- melt(logodf, id.vars = 'pos')

# The `order` aesthetic was removed in ggplot2 2.0, so letters now stack
# in the factor order of `variable` rather than by height.
p <- ggplot(data = lmf, aes(x = pos, y = value)) +
    geom_col(aes(fill = variable), alpha = 0.5) +
    geom_text(aes(label = variable, size = value),
              position = position_stack(vjust = 0.5)) +
    theme_bw()
print(p)

# quartz() is macOS-only; ggsave() writes the same 8 x 3 inch PNG on any platform
ggsave('StackOverflow_5438474.png', plot = p, width = 8, height = 3)

This produces the following plot:

[Image: "Not bad, but not a sequence logo plot"]


Jer*_*zig 11

I have implemented an alternative designed by Charles Berry that addresses some of the weaknesses of seqLogo discussed in the comments. It uses ggplot2:

library("devtools")
install_github("leipzig/berrylogo")  # install berrylogo from GitHub
library("berrylogo")
freqs <- matrix(data = c(
  0.25,0.65,0.87,0.92,0.16,0.16,0.04,0.98,0.98,1.00,0.02,0.10,0.10,0.80,0.98,0.91,0.07,0.07,  # A
  0.11,0.05,0.04,0.00,0.26,0.17,0.00,0.01,0.00,0.00,0.29,0.17,0.01,0.03,0.00,0.00,0.32,0.32,  # C
  0.53,0.26,0.07,0.02,0.53,0.18,0.96,0.01,0.00,0.00,0.65,0.01,0.89,0.17,0.01,0.09,0.59,0.12,  # G
  0.11,0.04,0.02,0.06,0.05,0.49,0.00,0.00,0.02,0.00,0.04,0.72,0.00,0.00,0.01,0.00,0.02,0.49), # T
  byrow = TRUE, nrow = 4, dimnames = list(c('A', 'C', 'G', 'T')))
p <- berrylogo(freqs, gc_content = .41)  # gc_content: background GC fraction
print(p)



by0*_*by0 9

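ggseqlogo should be what you're looking for. I hope it relieves some of the frustration that I'm sure many of you have run into when plotting sequence logos in R.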
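For reference, here is a minimal sketch of how ggseqlogo might be applied to the same 4 x 18 matrix used in the other answers. It assumes the CRAN package ggseqlogo is installed. It also assumes that `ggseqlogo()` accepts a position probability matrix whose row names are the letters, which is how its tutorial describes matrix input.

`method = "bits"` scales each column by its information content. As I understand it, ggseqlogo draws each letter as a filled polygon rather than as text. That is what lets letters stretch vertically, sidestepping the glyph aspect-ratio limitation raised in Curt F.'s answer.

```r
# install.packages("ggseqlogo")  # from CRAN, if not already installed
library(ggplot2)
library(ggseqlogo)

# Same 4 x 18 position probability matrix; row names must be the letters
freqs <- matrix(data = c(
  0.25,0.65,0.87,0.92,0.16,0.16,0.04,0.98,0.98,1.00,0.02,0.10,0.10,0.80,0.98,0.91,0.07,0.07,  # A
  0.11,0.05,0.04,0.00,0.26,0.17,0.00,0.01,0.00,0.00,0.29,0.17,0.01,0.03,0.00,0.00,0.32,0.32,  # C
  0.53,0.26,0.07,0.02,0.53,0.18,0.96,0.01,0.00,0.00,0.65,0.01,0.89,0.17,0.01,0.09,0.59,0.12,  # G
  0.11,0.04,0.02,0.06,0.05,0.49,0.00,0.00,0.02,0.00,0.04,0.72,0.00,0.00,0.01,0.00,0.02,0.49), # T
  byrow = TRUE, nrow = 4, dimnames = list(c('A', 'C', 'G', 'T')))

# "bits" scales letter stacks by information content; "prob" would show raw probabilities
p <- ggseqlogo(freqs, method = "bits")
print(p)
```

Because the result is an ordinary ggplot object, it can be themed or saved with `ggsave()` like any other plot.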