Suppose I have a FASTA file containing 3 sequences ...
ATTTTTGGA
AT
A
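I would like my sequence data to look like this, with every sequence padded with N up to the length of the longest one: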
ATTTTTGGA
ATNNNNNNN
ANNNNNNNN
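The desired output amounts to right-padding each sequence with N until it matches the length of the longest one. Here is a minimal sketch that works on plain Python strings, assuming all sequences are already loaded into a list. The helper name `pad_to_longest` is hypothetical.

```python
def pad_to_longest(seqs, fill="N"):
    """Right-pad each sequence with `fill` up to the length of the longest one."""
    # Width of the longest sequence (0 for an empty list).
    width = max((len(s) for s in seqs), default=0)
    # str.ljust appends `fill` on the right until the string reaches `width`.
    return [s.ljust(width, fill) for s in seqs]


print(pad_to_longest(["ATTTTTGGA", "AT", "A"]))
# ['ATTTTTGGA', 'ATNNNNNNN', 'ANNNNNNNN']
```

`str.ljust` never truncates, so a sequence already at the maximum length passes through unchanged.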
Is there a program or script that can do this in a reasonable amount of time? I have thousands of sequences. Thanks!
I've been experimenting and tried the following, but the output file ends up blank. Here is what I have so far:
import sys
from Bio import SeqIO
from Bio.Seq import Seq
in_file = open(sys.argv[1],'r')
sequences = SeqIO.parse(in_file, "fasta")
output_in_file = open("test.fasta", "w")
for record in sequences:
    n = 150
    record.seq = record.seq + ("N" * n)
    seq = seq[:n]
output_in_file.close()
in_file.close()
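The attempt above leaves test.fasta empty for two reasons. Nothing is ever written to `output_in_file`, and `seq` in `seq = seq[:n]` is never defined, so the loop fails on the first record. With Biopython, the fix would be to assign `record.seq = (record.seq + "N" * n)[:n]` and pass the padded records to `SeqIO.write`. The sketch below uses only the standard library instead, assuming plain FASTA records whose header lines start with ">". It pads to the longest sequence by default. The names `read_fasta` and `pad_fasta` are hypothetical.

```python
import sys
from typing import Iterator, Optional, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs; sequences split over several lines are joined."""
    header: Optional[str] = None
    chunks: list = []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def pad_fasta(in_path: str, out_path: str, width: Optional[int] = None) -> int:
    """Right-pad every sequence with 'N' to `width` (default: the longest sequence).

    Returns the width used. Sequences longer than `width` are written unchanged.
    """
    if width is None:
        # Pass 1: find the longest sequence without holding every record in memory.
        with open(in_path) as fh:
            width = max((len(seq) for _, seq in read_fasta(fh)), default=0)
    # Pass 2: stream the records again and write padded copies, one sequence per line.
    with open(in_path) as fh, open(out_path, "w") as out:
        for header, seq in read_fasta(fh):
            out.write(f">{header}\n{seq.ljust(width, 'N')}\n")
    return width


if __name__ == "__main__" and len(sys.argv) == 3:
    pad_fasta(sys.argv[1], sys.argv[2])
```

Saved as a script (for example, a hypothetical `pad_fasta.py`), this would run as `python pad_fasta.py input.fasta padded.fasta`. Reading the file twice keeps memory use flat regardless of file size. For a few thousand sequences, loading them all into a list and padding in one pass would also be fast.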