寻找基因间区域

c02*_*023 2 r bioinformatics bioconductor

我想提取染色体的基因间坐标。我编写了一大块代码,但由于我对这些包不熟悉,我不确定我是否遵循了正确的逻辑:

library(IRanges)
library(GenomicFeatures)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
txdb = transcriptsBy(TxDb.Hsapiens.UCSC.hg19.knownGene, by = "gene",use.names=TRUE)

#For example, only I am interested in intergenic coordinates in chromosome 1
seqlevels(txdb, force=TRUE) = c("chr1")

#Creates IRanges 

ir = IRanges(start=unlist(start(txdb)), end=unlist(end(txdb)), names=names(txdb)) 

# Reduce overlapping gene positions to continuous range
ir.red = reduce(ir) # Collapses ranges of overlapping genes to continuous range.

#Identify overlaps among the initial and reduced range data sets
overlap = findOverlaps(ir, ir.red) 
Run Code Online (Sandbox Code Playgroud)

我需要处理“+”和“-”链吗?

Dev*_*yan 5

最简单的方法是从genes() 访问器开始,reduce() 并取补集:

library(TxDb.Hsapiens.UCSC.hg19.knownGene)
genic <- genes(TxDb.Hsapiens.UCSC.hg19.knownGene)
genic <- reduce(genic, ignore.strand=T)
intergenic <- gaps(genic)
intergenic <- intergenic[strand(intergenic) == "*"] #This is important!!!
Run Code Online (Sandbox Code Playgroud)

最后一步非常重要,否则您将在每个染色体上获得额外的 2 个条目(+ 和 - 各一个)。