How to read a VCF file in R

Question

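I have a file in VCF format that I want to read into R. However, the file contains some extra lines that I want to skip. I would like the result to start at the line that begins with #CHROM.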

This is what I have tried:

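# Read up to the first 5000 lines to locate the start of the VCF table
chromo1 <- try(scan("myfile.vcf", what = character(), n = 5000, sep = "\n", skip = 0, fill = TRUE, na.strings = "", quote = "\""))
skip.lines <- grep("^#CHROM", chromo1)

# Read only the header line (the one starting with #CHROM) to get the column labels
column.labels <- read.delim("myfile.vcf", header = FALSE, nrows = 1, skip = (skip.lines - 1), sep = "\t", fill = TRUE, stringsAsFactors = FALSE, na.strings = "", quote = "\"")
num.vars <- dim(column.labels)[2]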

myfile.vcf

    #not wanted line
    #unnecessary line
    #junk line
    #CHROM  POS     ID      REF     ALT
    11      33443   3        A       T
    12      33445   5        A       G

Desired result

    #CHROM  POS     ID      REF     ALT
    11      33443   3        A       T
    12      33445   5        A       G

Answer 1

小智 6

Maybe this will work for you:

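# Read the VCF file twice: once as raw lines for the column names, once as a table for the data.
# read.table() skips lines starting with "#" by default (comment.char = "#").
tmp_vcf <- readLines("test.vcf")
tmp_vcf_data <- read.table("test.vcf", stringsAsFactors = FALSE)

# Keep only the lines up to and including the #CHROM header line
tmp_vcf <- tmp_vcf[seq_len(grep("^#CHROM", tmp_vcf)[1])]
# The last remaining line is the header; split it on tabs to get the column names
vcf_names <- unlist(strsplit(tmp_vcf[length(tmp_vcf)], "\t"))
names(tmp_vcf_data) <- vcf_names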

PS: if you have multiple VCF files, you should use the lapply function.
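
A minimal sketch of the lapply approach, assuming every file is tab-delimited and contains a #CHROM header line. The helper name read_vcf_table and the temporary demo files are hypothetical, introduced only for illustration:

```r
# Hypothetical helper: read one VCF into a data frame with proper column names.
read_vcf_table <- function(path) {
  lines <- readLines(path)
  header_idx <- grep("^#CHROM", lines)[1]   # position of the header line
  vcf_names <- strsplit(lines[header_idx], "\t")[[1]]
  # read.table() skips "#" lines by default, so only the data rows are parsed
  data <- read.table(path, sep = "\t", stringsAsFactors = FALSE)
  names(data) <- vcf_names
  data
}

# Demo input: two small tab-delimited files written to a temporary directory
demo_lines <- c("#not wanted line", "#junk line",
                "#CHROM\tPOS\tID\tREF\tALT",
                "11\t33443\t3\tA\tT",
                "12\t33445\t5\tA\tG")
vcf_files <- file.path(tempdir(), c("a.vcf", "b.vcf"))
for (f in vcf_files) writeLines(demo_lines, f)

# Apply the helper to every file; the result is a list of data frames
vcf_list <- lapply(vcf_files, read_vcf_table)
names(vcf_list) <- basename(vcf_files)
```

For real data, vcf_files could come from something like list.files(pattern = "\\.vcf$", full.names = TRUE).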

Best, Robert

Answer 2

zx8*_*754 6

data.table::fread reads it as expected; see the example:

library(data.table)

#try this example vcf from GitHub
vcf <- fread("https://raw.githubusercontent.com/vcflib/vcflib/master/samples/sample.vcf")

#or if the file is local:
vcf <- fread("path/to/my/vcf/sample.vcf")
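
If fread's automatic detection does not start at the right line for a particular file, its skip argument also accepts a string, in which case reading starts at the first line containing that string. A minimal sketch, assuming a tab-delimited file; the temporary file below is a stand-in for illustration:

```r
library(data.table)

# Stand-in file with the layout from the question (tab-delimited)
tmp <- tempfile(fileext = ".vcf")
writeLines(c("#not wanted line", "#unnecessary line", "#junk line",
             "#CHROM\tPOS\tID\tREF\tALT",
             "11\t33443\t3\tA\tT",
             "12\t33445\t5\tA\tG"), tmp)

# Start reading at the line containing "#CHROM" and treat it as the header
vcf <- fread(tmp, skip = "#CHROM", sep = "\t", header = TRUE)
```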

We can also use the vcfR package; see the manual at the link.
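
A minimal sketch of the vcfR route, assuming the vcfR package is installed and the input is a well-formed VCF (a ##fileformat line, the eight fixed columns, and here one sample column). The small temporary file is a stand-in for illustration:

```r
library(vcfR)

# Stand-in VCF with one meta line, the header, and two variant rows
tmp <- tempfile(fileext = ".vcf")
writeLines(c("##fileformat=VCFv4.2",
             "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample1",
             "11\t33443\t3\tA\tT\t.\t.\t.\tGT\t0/1",
             "12\t33445\t5\tA\tG\t.\t.\t.\tGT\t1/1"), tmp)

# read.vcfR() parses the meta lines, fixed columns, and genotypes separately
vcf <- read.vcfR(tmp, verbose = FALSE)

# getFIX() returns the fixed columns as a character matrix;
# convert it to a data frame for ordinary R workflows
fix_df <- as.data.frame(getFIX(vcf), stringsAsFactors = FALSE)
```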
