Tag: biopython

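Biopython local BLAST database error

I'm trying to run blastx locally against the "nr" database using Biopython's NcbiblastxCommandline tool, but I always get the following error about the protein database search path: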

>>> from Bio.Blast.Applications import NcbiblastxCommandline
>>> nr = "/Users/Priya/Documents/Python/ncbi-blast-2.2.26+/bin/nr.pal"
>>> infile = "/Users/Priya/Documents/Python/Tutorials/opuntia.txt"
>>> blastx = "/Users/Priya/Documents/Python/ncbi-blast-2.2.26+/bin/blastx"
>>> outfile = "/Users/Priya/Documents/Python/Tutorials/opuntia_python_local.xml"
>>> blastx_cline = NcbiblastxCommandline(blastx, query = infile, db = nr, evalue = 0.001, out = outfile)
>>> stdout, stderr = blastx_cline()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/Bio/Application/__init__.py", line 443, in __call__
    stdout_str, stderr_str)
Bio.Application.ApplicationError: Command '/Users/Priya/Documents/Python/ncbi-blast-2.2.26+/bin/blastx -out /Users/Priya/Documents/Python/Tutorials/opuntia_python_local.xml -query /Users/Priya/Documents/Python/Tutorials/opuntia.txt -db /Users/Priya/Documents/Python/ncbi-blast-2.2.26+/bin/nr.pal -evalue 0.001' returned non-zero exit status 2, 'BLAST Database …
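One thing worth checking, offered as a hedged guess because the error message above is cut off: BLAST+ normally expects the `-db` argument to be the database base name (for example `nr`), not the name of an alias or index file such as `nr.pal`. The sketch below strips such a suffix before the path is handed to the command line. The helper name `blast_db_name` and the suffix list are illustrative assumptions, not part of Biopython.

```python
import os

# Suffixes BLAST+ uses for protein/nucleotide alias and index files.
BLAST_DB_SUFFIXES = {".pal", ".nal", ".phr", ".pin", ".psq", ".nhr", ".nin", ".nsq"}


def blast_db_name(path):
    """Return the path without a BLAST alias/index suffix (hypothetical helper)."""
    base, ext = os.path.splitext(path)
    return base if ext.lower() in BLAST_DB_SUFFIXES else path


# The path from the question, reduced to the base name that -db expects.
nr = blast_db_name("/Users/Priya/Documents/Python/ncbi-blast-2.2.26+/bin/nr.pal")
print(nr)
```

This only helps if the `nr` database files really do live in that directory; if they are elsewhere, the path itself has to change.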


python database path biopython blast

pri*_*hah

2012-05-21

Votes: 3 | Answers: 1 | Views: 2460

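Fetching gene sequences from Entrez using Biopython

Here is what I want to do. I have a list of gene names, for example: [ITGB1, RELA, NFKBIA]

Looking through the Biopython help and the Entrez API tutorial, I came up with this: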

from Bio import Entrez

Entrez.email = "you@example.com"  # placeholder address; NCBI asks for a contact email

x = ['ITGB1', 'RELA', 'NFKBIA']
for item in x:
    handle = Entrez.efetch(db="nucleotide", id=item, rettype="gb")
    record = handle.read()
    handle.close()
    # create a file named after the gene
    with open('genes/' + item + '.xml', 'w') as out_handle:
        out_handle.write(record)


But this keeps failing with an error. I found that if the id is a numeric id (although you have to pass it as a string, "186972394"), then:

handle = Entrez.efetch(db="nucleotide", id='186972394' ,rettype="gb")


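this gets me the information I want, including the sequence.

Now the question: how can I search by gene name (since I don't have the ID numbers), or easily convert my gene names into ids, so that I can fetch the sequences for my list of genes?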
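One possible approach, sketched under assumptions: `Entrez.esearch` accepts a text query and returns matching record ids, which can then be passed to `Entrez.efetch`. The helper names, the default organism and the `retmax` value below are illustrative choices, not something the question specifies.

```python
def build_gene_query(symbol, organism="Homo sapiens"):
    """Build an Entrez search term for a gene symbol (organism is an assumption)."""
    return f"{symbol}[Gene Name] AND {organism}[Organism]"


def gene_symbol_to_ids(symbol, email, organism="Homo sapiens", retmax=5):
    """Look up nucleotide record ids for a gene symbol via Entrez.esearch."""
    from Bio import Entrez  # imported here so the query helper works without Biopython

    Entrez.email = email  # NCBI asks for a contact address
    handle = Entrez.esearch(db="nucleotide",
                            term=build_gene_query(symbol, organism),
                            retmax=retmax)
    result = Entrez.read(handle)
    handle.close()
    return result["IdList"]


# Example (needs network access, so it is left commented out):
# for gene in ['ITGB1', 'RELA', 'NFKBIA']:
#     print(gene, gene_symbol_to_ids(gene, email="you@example.com"))
```

A search like this can return several records per gene (transcript variants, genomic clones), so the ids usually need filtering before fetching.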

Thank you,

python biopython ncbi

Stu*_*nce

Votes: 3 | Answers: 1 | Views: 6750

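Using a custom BLAST db with NcbiblastxCommandline

This is my first time using BLAST in Biopython, and I've run into a problem.

I created a custom BLAST database from a FASTA file containing 20 sequences, using: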

os.system('makeblastdb -in newtest.fasta -dbtype nucl -out newtest.db')

This did generate several files (newtest.db.nhr, newtest.db.nin, newtest.db.nsq) in my current working directory (/home/User/Documents/python/fasta-files).

Now I'm trying to query this database from Biopython with:

blastx_cline = NcbiblastxCommandline(query="queryfile.fas", db="newtest.db", evalue=0.00000001, outfmt=5, out="opuntia.xml")


But I get this error:

> Bio.Application.ApplicationError: Command 'blastx -out opuntia.xml
> -outfmt 5 -query queryfile.fas -db newtest.db -evalue 1e-08' returned non-zero exit status 2, 'BLAST Database error: No alias or
> index file found for protein database [newtest.db] in search path
> [/home/User/Documents/python/fasta-files:/usr/share/ncbi/blastdb:]'


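So I tried copying the generated files from /home/User/Documents/python/fasta-files to /usr/share/ncbi/blastdb, but it says I don't have permission.

*Edit*

When I use os.system("blastn -db newtest.db -query " + "fastafile.fas" + " -out test.txt") it generates an output file as expected, but the blastx call above does not.

So I'm stuck here and don't know how to solve this.

Any help would be appreciated.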
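A likely reading of the error, offered as a sketch rather than a confirmed diagnosis: the database was built with `-dbtype nucl`, while blastx searches a protein database, which is why the message complains about a "protein database". Running blastn (as the edit already does) or tblastx against the nucleotide database avoids that mismatch. The helper names below are hypothetical; the flags are the same ones shown in the question.

```python
import subprocess


def build_blast_command(program, query, db, out, evalue=1e-8, outfmt=5):
    """Assemble a BLAST+ argument list (hypothetical helper)."""
    return [program,
            "-query", query,
            "-db", db,
            "-evalue", str(evalue),
            "-outfmt", str(outfmt),
            "-out", out]


def run_blast(command):
    """Run the command and raise CalledProcessError on a non-zero exit status."""
    return subprocess.run(command, check=True, capture_output=True, text=True)


# blastn matches a nucleotide query against the nucleotide database built above.
command = build_blast_command("blastn", "queryfile.fas", "newtest.db", "opuntia.xml")
print(" ".join(command))
# run_blast(command)  # left commented out: requires BLAST+ on the PATH
```

Passing the arguments as a list avoids the quoting problems that string concatenation with `os.system` tends to cause.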

python biopython blast

ifr*_*eak

2012 11-27

3
推荐指数

1
解决办法

1459
查看次数

The file is a FASTA file, with sequences in the single line format. That is, sequences are not broken up into multiple lines of a particular length, but instead the entire sequence occupies a single line.

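Bio.SeqIO.write of course follows the format recommendation and wraps the sequence every 80 bps. I could write my own writer for these "single-line" FASTA files, but my question is whether there is a way I've missed to get SeqIO to do this.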
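As far as I know, Biopython releases from 1.71 onward accept the format name "fasta-2line" in `SeqIO.write`, which writes each sequence on one line. For older installations, a minimal hand-written writer is sketched below; the function name and the (identifier, sequence) tuple input are assumptions made for illustration.

```python
import io


def write_single_line_fasta(records, handle):
    """Write (identifier, sequence) pairs as two-line FASTA entries."""
    count = 0
    for identifier, sequence in records:
        handle.write(f">{identifier}\n{sequence}\n")
        count += 1
    return count


# With SeqRecord objects, the pairs could be built as (rec.id, str(rec.seq)).
buffer = io.StringIO()
write_single_line_fasta([("seq1", "ACGT" * 30)], buffer)
print(buffer.getvalue())
```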

python bioinformatics fasta biopython python-2.7

Kor*_*rem

Votes: 3 | Answers: 1 | Views: 1819

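Biopython: how can I exclude specific amino acids of a protein when drawing a Ramachandran plot?

I wrote a Python script to draw a Ramachandran plot of the ubiquitin protein. I'm using Biopython and a PDB file. My script is as follows: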

import Bio.PDB
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt

phi_psi = ([0,0])
phi_psi = np.array(phi_psi)
pdb1 ='/home/devanandt/Documents/VMD/1UBQ.pdb'

for model in Bio.PDB.PDBParser().get_structure('1UBQ',pdb1) :
    for chain in model :
        polypeptides = Bio.PDB.PPBuilder().build_peptides(chain)
        for poly_index, poly in enumerate(polypeptides) :
            print "Model %s Chain %s" % (str(model.id), str(chain.id)),
            print "(part %i of %i)" % (poly_index+1, len(polypeptides)),
            print "length %i" % (len(poly)),
            print "from %s%i" % (poly[0].resname, poly[0].id[1]),
            print "to %s%i" % (poly[-1].resname, poly[-1].id[1])
            phi_psi = poly.get_phi_psi_list()
            for …
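If the goal is to leave certain residues (for example glycine) out of the plot, one option is to filter the phi/psi list by residue name before plotting. This is a minimal sketch, assuming `residue_names` comes from `[res.resname for res in poly]` and `phi_psi_list` from `poly.get_phi_psi_list()`; the function name and the default `skip` tuple are hypothetical.

```python
import math


def phi_psi_degrees(residue_names, phi_psi_list, skip=("GLY",)):
    """Return (phi, psi) pairs in degrees, dropping skipped or undefined residues."""
    points = []
    for name, (phi, psi) in zip(residue_names, phi_psi_list):
        # Terminal residues have None for one of the angles.
        if name in skip or phi is None or psi is None:
            continue
        points.append((math.degrees(phi), math.degrees(psi)))
    return points


# Toy input in radians, standing in for a real polypeptide.
demo = phi_psi_degrees(["MET", "GLY", "ALA"],
                       [(None, 1.0), (1.0, 1.0), (math.pi, -math.pi / 2)])
print(demo)
```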


python bioinformatics protein-database biopython

dex*_*dev

2016-08-17

Votes: 3 | Answers: 1 | Views: 676

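Converting FASTA to GenBank

Is there a way to convert a FASTA file to GenBank format using BioPython? There are plenty of answers on how to convert from GenBank to FASTA, but not the other way around.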
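A minimal sketch of one way to do this, assuming Biopython 1.78 or later, where the GenBank writer reads the molecule type from `record.annotations["molecule_type"]` (FASTA files carry no such information, so it has to be supplied). The function name and the "DNA" default are illustrative assumptions.

```python
def fasta_to_genbank(fasta_handle, genbank_handle, molecule_type="DNA"):
    """Copy FASTA records to GenBank format; returns the number of records written."""
    from Bio import SeqIO  # imported here so the module loads without Biopython

    def annotated(records):
        for record in records:
            # The GenBank writer needs to know what kind of molecule this is.
            record.annotations["molecule_type"] = molecule_type
            yield record

    return SeqIO.write(annotated(SeqIO.parse(fasta_handle, "fasta")),
                       genbank_handle, "genbank")


# Example with hypothetical file names:
# with open("input.fasta") as src, open("output.gb", "w") as dst:
#     fasta_to_genbank(src, dst)
```

The resulting file has no features or annotations beyond what the FASTA header provides.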

fasta biopython genbank

Ric*_* Su

Votes: 3 | Answers: 1 | Views: 1336

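Estimating the alphabet from a FASTA file in Biopython

I'm looking for a way to read a .fasta file in Biopython and have the package estimate whether we are dealing with DNA, RNA or protein. So far I read the data like this: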

from Bio import SeqIO as sio

with open('file.fasta', 'r') as f:
    for seq in sio.parse(f, 'fasta'):
        pass  # do stuff, depending on alphabet


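My problem is that I don't know what kind of sequences I will find in the .fasta file. They could be protein, DNA or RNA, but I need to know the number of letters in the alphabet.

Is there a way to "estimate" the alphabet from a sequence with Biopython? I know there could be a protein that only contains the letters ACGT, which is why I want to estimate the alphabet.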
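A simple heuristic, sketched here in plain Python as an assumption rather than a Biopython feature, is to compare the set of letters in each sequence with the nucleotide alphabets. The function name and the treatment of `N`, gaps and stop symbols are illustrative choices.

```python
def guess_alphabet(sequence):
    """Guess 'DNA', 'RNA' or 'protein' from the letters present (heuristic only)."""
    letters = set(str(sequence).upper()) - set("-*")  # ignore gap and stop symbols
    if not letters:
        return "unknown"
    if letters <= set("ACGTN"):
        return "DNA"
    if letters <= set("ACGUN"):
        return "RNA"
    return "protein"


for example in ("ACGTTGCA", "ACGUUGCA", "MKVLATWQ"):
    print(example, guess_alphabet(example))
```

As the question already notes, a protein made up only of A, C, G and T would be labelled DNA by this check, so it is an estimate and not a guarantee.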

python bioinformatics fasta biopython

rom*_*asy

2017-01-12

Votes: 3 | Answers: 1 | Views: 206

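Fastest way to count instances of substrings in a string in Python 3.6

I've been working on a program that needs to count substrings (up to 4000 substrings of 2-6 characters each, held in a list) within a main string (~400,000 characters). I understand this is similar to the question asked in "Counting substrings in a string"; however, that solution does not work for me. Because my substrings are DNA sequences, many of them are repeats of a single character (e.g. 'AA'); so if I split the string on 'AA', 'AAA' is interpreted as a single instance of 'AA' rather than two. My current solution uses nested loops, but I'm hoping there is a faster way, since this code takes more than 5 minutes for a single main string. Thanks in advance!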

from itertools import product

def getKmers(self, kmer):
    self.kmer_dict = {}
    kmer_tuples = list(product(['A', 'C', 'G', 'T'], repeat = kmer))
    kmer_list = []
    for x in range(len(kmer_tuples)):
        new_kmer = ''
        for y in range(kmer):
            new_kmer += kmer_tuples[x][y]
        kmer_list.append(new_kmer)
    for x in range(len(kmer_list)):
        self.kmer_dict[kmer_list[x]] = 0
    # + 1 so the final window of the sequence is also counted
    for x in range(len(self.sequence) - kmer + 1):
        for substr in kmer_list:
            if self.sequence[x:x+kmer] == substr:
                self.kmer_dict[substr] += 1
                break
    return self.kmer_dict
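Most of the time in the method above goes into comparing every window against every k-mer in the list. A sketch of a faster variant, assuming the same A/C/G/T alphabet: slide the window once and look each slice up directly in the dictionary, which also counts overlapping matches such as the two 'AA' in 'AAA'. The standalone function name `count_kmers` is an illustrative choice.

```python
from itertools import product


def count_kmers(sequence, k, alphabet="ACGT"):
    """Count overlapping k-mers with one pass and a dictionary lookup per window."""
    counts = {"".join(letters): 0 for letters in product(alphabet, repeat=k)}
    for start in range(len(sequence) - k + 1):
        window = sequence[start:start + k]
        if window in counts:  # skips windows containing other letters, e.g. N
            counts[window] += 1
    return counts


counts = count_kmers("AAAC", 2)
print(counts["AA"], counts["AC"])
```

This does one dictionary lookup per position instead of up to 4**k string comparisons.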


python string performance bioinformatics biopython

Dan*_*ann

2019-01-26

Votes: 3 | Answers: 1 | Views: 258

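Calling SeqIO.parse() directly in a for loop works, but using it separately beforehand does not? Why?

In Python, my code runs fine when I call the function SeqIO.parse() directly in the loop: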

from Bio import SeqIO
a = SeqIO.parse("a.fasta", "fasta")
records = list(a)

for asq in SeqIO.parse("a.fasta", "fasta"):
    print("Q")


However, when I first store the output of SeqIO.parse() in a variable(?) called a and then try to use it in my loop, the loop does not run:

from Bio import SeqIO
a = SeqIO.parse("a.fasta", "fasta")
records = list(a)

for asq in a:
    print("Q")


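Is this because the output of the function SeqIO.parse("a.fasta", "fasta"), once stored in "a", is different from what I get when I call it directly? What exactly is "a" here? Is it a variable? Is it an object? What does the function actually return?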
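SeqIO.parse() returns an iterator, and an iterator can only be walked through once: `list(a)` consumes every record, so the later `for asq in a` has nothing left to yield. The sketch below reproduces the behaviour with a plain generator; `fake_parse` is a hypothetical stand-in for SeqIO.parse, used so the example runs without a FASTA file.

```python
def fake_parse(names):
    """Hypothetical stand-in for SeqIO.parse: yields items one at a time."""
    for name in names:
        yield name


a = fake_parse(["seq1", "seq2"])
records = list(a)   # walks through the iterator and stores every item
leftover = list(a)  # the iterator is already exhausted, so this is empty

print(len(records), len(leftover))
```

Looping over `records` (the list) instead of `a`, or calling SeqIO.parse() again, gives the loop something to iterate over.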

python bioinformatics fasta biopython

Abr*_*mad

2019-02-22

Votes: 3 | Answers: 1 | Views: 873

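Biopython: is there a way to extract the amino acid sequence of a specific chain from a PDB file?

I want to extract the one-letter amino acid sequence of a specific chain from a bunch of PDB files.

I can do this with SeqIO.parse(), but it feels rather un-Pythonic to me: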

from Bio import SeqIO

PDB_file_path = '/full/path/to/some/pdb'

# Is there a 1-liner for this ?
query_seqres = SeqIO.parse(PDB_file_path, 'pdb-seqres')

for chain in query_seqres:
    if chain.id == query_chain_id:  # query_chain_id is assumed to be defined elsewhere
        query_chain = chain.seq


Is there a more concise, cleaner way to do this?
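One compact option, sketched under the assumption that the records expose `.id` and `.seq` as in the loop above: `next()` with a generator expression returns the first matching chain and stops reading further. The helper name `first_chain_seq` is hypothetical, and the demo uses simple stand-in objects instead of a real PDB file.

```python
from types import SimpleNamespace


def first_chain_seq(records, chain_id):
    """Return the sequence of the first record whose id matches, or None."""
    return next((record.seq for record in records if record.id == chain_id), None)


# Stand-in records; with Biopython this would be
# SeqIO.parse(PDB_file_path, 'pdb-seqres').
demo_records = [SimpleNamespace(id="1ABC:A", seq="MKV"),
                SimpleNamespace(id="1ABC:B", seq="GSH")]
print(first_chain_seq(demo_records, "1ABC:B"))
```

Returning None for a missing chain is a design choice; dropping the default argument of `next()` would raise StopIteration instead.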

python bioinformatics biopython

Gab*_*Cia

2020-01-20

Votes: 3 | Answers: 1 | Views: 4484