biopython seqio.write() 中的“无效序列”错误

Hra*_*ant 6 biopython

这个问题与生物信息学有关。我在相应论坛没有收到任何建议,所以写在这里。

我需要删除 fasta 文件中的非 ACTG 核苷酸,并使用 biopython 中的 seqio 将输出写入新文件。

我的代码是

import re
import sys
from Bio import SeqIO
from Bio.SeqRecord import SeqRecord
from Bio.Seq import Seq
from Bio.Alphabet import IUPAC


seq_list=[]
for seq_record in SeqIO.parse("test.fasta", "fasta",IUPAC.ambiguous_dna):
        sequence=seq_record.seq
        sequence=sequence.tomutable()
        seq_record.seq = re.sub('[^GATC]',"",str(sequence).upper())
        seq_list.append(seq_record)
SeqIO.write(seq_list,"test_out","fasta")
Run Code Online (Sandbox Code Playgroud)

运行此代码会出现错误:

Traceback (most recent call last):
  File "remove.py", line 18, in <module>
    SeqIO.write(list,"test_out","fasta")
  File "/home/ghovhannisyan/Software/anaconda2/lib/python2.7/site-packages/Bio/SeqIO/__init__.py", line 481, in write
    count = writer_class(fp).write_file(sequences)
  File "/home/ghovhannisyan/Software/anaconda2/lib/python2.7/site-packages    /Bio/SeqIO/Interfaces.py", line 209, in write_file
    count = self.write_records(records)
  File "/home/ghovhannisyan/Software/anaconda2/lib/python2.7/site-packages/Bio/SeqIO/Interfaces.py", line 194, in write_records
    self.write_record(record)
  File "/home/ghovhannisyan/Software/anaconda2/lib/python2.7/site-packages/Bio/SeqIO/FastaIO.py", line 202, in write_record
    data = self._get_seq_string(record)  # Catches sequence being None
  File "/home/ghovhannisyan/Software/anaconda2/lib/python2.7/site-packages/Bio/SeqIO/Interfaces.py", line 100, in _get_seq_string
% record.id)
 TypeError: SeqRecord (id=CALB_TCONS_00001015) has an invalid sequence.
Run Code Online (Sandbox Code Playgroud)

如果我改变这一行

seq_record.seq = re.sub('[^GATC]',"",str(sequence).upper())
Run Code Online (Sandbox Code Playgroud)

例如,seq_record.seq = sequence + "A"一切正常。然而,re.sub('[^GATC]',"",str(sequence).upper())理论上也应该有效。

谢谢

pet*_*rjc 6

Biopython 的 SeqIO 期望 SeqRecord 对象.seq是 Seq 对象(或类似对象),而不是纯字符串。尝试:

seq_record.seq = Seq(re.sub('[^GATC]',"",str(sequence).upper()))
Run Code Online (Sandbox Code Playgroud)

对于 FASTA 输出,无需设置序列的字母表。