是否可以将字符串变量而不是文件传递给 BLAST 搜索？

Question

是否可以将字符串变量而不是文件传递给 BLAST 搜索？

You*_*ark 5 bioinformatics fasta biopython blast

我正在编写一个 python 脚本，并且希望将查询序列信息作为字符串变量而不是 FASTA 格式文件传递到blastn（如果可能）。

我使用 Biopython 的 SeqIO 将多个转录本名称存储为键，并将其序列存储为关联值。

所以它看起来像这样

transcripts = dict()
for record in SeqIO.parse("transcript_sequences.fasta", "fasta"):
transcripts[record.name] = record.seq

print transcripts

Run Code Online (Sandbox Code Playgroud)

所以字典看起来像这样

{'var_F': Seq('CTTCATTCTCGTTTAGCGGCTGCTCGTGGAAATTTCGAAAAAATCTGAAACTAG...TGC', SingleLetterAlphabet())}

Run Code Online (Sandbox Code Playgroud)

现在我想将字典中的序列信息解析为爆炸查询和主题。

subprocess.call("blastn -query " + transcript['var_F'] + ".txt" + " -subject " + transcript['var_B'] + " -outfmt 6 > tmp_blast.txt", shell=True)

Run Code Online (Sandbox Code Playgroud)

我知道blast只接受文件名或字符串作为文件位置，但我只是想知道是否有解决方法。

预先感谢您阅读我的第一个问题：P

Answer 1

Vin*_*nce 4

来自 BLAST 的 BioPython 模块：

\n\n

http://biopython.org/DIST/docs/tutorial/Tutorial.html#htoc90

\n\n

看来您是正确的，biopython BLAST 不支持 SeqIO 对象或生物序列作为 BLAST 函数调用的参数，或者您使用subprocess.call()BLAST 二进制文件执行。唯一接受的输入序列参数是文件名。从教程中：

\n\n

>>> from Bio.Blast.Applications import NcbiblastxCommandline\n>>> help(NcbiblastxCommandline)\n...\n>>> blastx_cline = NcbiblastxCommandline(query="opuntia.fasta", db="nr", evalue=0.001,\n...                                      outfmt=5, out="opuntia.xml")\n>>> blastx_cline\nNcbiblastxCommandline(cmd=\'blastx\', out=\'opuntia.xml\', outfmt=5, query=\'opuntia.fasta\',\ndb=\'nr\', evalue=0.001)\n>>> print(blastx_cline)\nblastx -out opuntia.xml -outfmt 5 -query opuntia.fasta -db nr -evalue 0.001\n>>> stdout, stderr = blastx_cline()\n

Run Code Online (Sandbox Code Playgroud)\n\n

因此，您唯一的选择是使用实际的 FASTA 文件作为输入。如果您想一次查询一个序列，则需要将每个序列保存到一个文件中。但是，我建议不要这样做，除非您有理由这样做。我认为如果所有查询序列都在同一个文件中，BLAST可能会执行得更快。您还可以使用 BioPython 读取生成的 BLAST 输出，以迭代每个查询的结果，请参阅：

\n\n

http://biopython.org/DIST/docs/tutorial/Tutorial.html#htoc92

\n\n

示例取自上面的链接：

\n\n

如果您以其他方式运行 BLAST，并将 BLAST 输出（XML 格式）保存在文件 my_blast.xml 中，您所需要做的就是打开该文件进行读取：

\n\n

>>> result_handle = open("my_blast.xml")\n>>> from Bio.Blast import NCBIXML\n>>> blast_record = NCBIXML.read(result_handle)\n

Run Code Online (Sandbox Code Playgroud)\n\n

或者，如果您有很多结果（即多个查询序列）：

\n\n

>>> from Bio.Blast import NCBIXML\n>>> blast_records = NCBIXML.parse(result_handle)\n

Run Code Online (Sandbox Code Playgroud)\n\n

就像 Bio.SeqIO 和 Bio.AlignIO （参见第 5 章和第 6 章）一样，我们有一对输入函数，read 和 parse，其中 read 是在你只有一个对象时使用的，而 parse 是一个迭代器，在你可以有很多对象 \xe2\x80\x93 但我们得到的是 BLAST 记录对象，而不是 SeqRecord 或 MultipleSeqAlignment 对象。

\n\n

为了能够处理 BLAST 文件可能很大、包含数千个结果的情况，NCBIXML.parse() 返回一个迭代器。用简单的英语来说，迭代器允许您单步执行 BLAST 输出，为每个 BLAST 搜索结果逐一检索 BLAST 记录：

\n\n

>>> from Bio.Blast import NCBIXML\n>>> blast_records = NCBIXML.parse(result_handle)\n>>> blast_record = next(blast_records)\n# ... do something with blast_record\n>>> blast_record = next(blast_records)\n# ... do something with blast_record\n>>> blast_record = next(blast_records)\n# ... do something with blast_record\n>>> blast_record = next(blast_records)\nTraceback (most recent call last):\n  File "<stdin>", line 1, in <module>\nStopIteration\n# No further records\n

Run Code Online (Sandbox Code Playgroud)\n

归档时间：	9 年，4 月前
查看次数：	1682 次
最近记录：	8 年，1 月前