Ric*_* Su 3 fasta biopython genbank
Is there a way to convert a FASTA file to GenBank format using BioPython? There are plenty of answers on converting from GenBank to FASTA, but not the other way around.
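Before converting, you must assign an alphabet (DNA or protein) to each sequence, because a plain FASTA file does not record the sequence type that the GenBank writer needs: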
from Bio import SeqIO
from Bio.Alphabet import generic_dna, generic_protein  # Bio.Alphabet exists only in Biopython < 1.78

# Read all FASTA records into memory
with open("test.fasta") as input_handle:
    sequences = list(SeqIO.parse(input_handle, "fasta"))

# Assign generic_dna or generic_protein to every record
for seq in sequences:
    seq.seq.alphabet = generic_dna

# Write the records in GenBank format; SeqIO.write returns the record count
with open("test.gb", "w") as output_handle:
    count = SeqIO.write(sequences, output_handle, "genbank")

print("Converted %i records" % count)
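On Biopython 1.78 and later the `Bio.Alphabet` module is no longer available, and the GenBank writer instead expects a `molecule_type` entry in each record's `annotations`. The following is a minimal sketch of the same conversion under that assumption. The helper name `fasta_to_genbank` and the in-memory sample record are illustrative only and are not part of the original answer.

```python
import io

from Bio import SeqIO


def fasta_to_genbank(fasta_handle, genbank_handle, molecule_type="DNA"):
    """Convert FASTA records to GenBank and return the number written.

    Hypothetical helper: assumes Biopython >= 1.78, where the GenBank
    writer reads the sequence type from annotations["molecule_type"].
    """
    def annotated(records):
        for record in records:
            # Use "DNA", "RNA" or "protein" depending on the input data
            record.annotations["molecule_type"] = molecule_type
            yield record

    records = SeqIO.parse(fasta_handle, "fasta")
    return SeqIO.write(annotated(records), genbank_handle, "genbank")


# Usage with in-memory handles; real files opened with open() work the same way
sample_fasta = io.StringIO(">seq1 example record\nACGTACGTAC\n")
genbank_out = io.StringIO()
count = fasta_to_genbank(sample_fasta, genbank_out)
print("Converted %i records" % count)
print(genbank_out.getvalue())
```

Passing handles rather than file names keeps the sketch easy to try without touching the disk. For the files in the answer above, the equivalent call would wrap `open("test.fasta")` and `open("test.gb", "w")`.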
Input:
>I28Q9A102FII8J rank=0668881 x=2144.0 y=1105.0 length=418
ACGTCATGAGAGTTTGATCATGGCTCAGGACGAACGCTGGCGGCGTGCTTAACACATGCAAGTCGAACGATGAA
GCTCCAGCTTGCTGGGGTGGATTAGTGGCGAACGGGTGAGTAACACGTGAGTAACCTGCCCTTGACTCTGGGAT
AAGCGTTGGAAACGACGTCTAATACCGGATATGACGACCGATGGCATCATCTGGTTGTGGAAAGAATTTTGGTC
AAGGATGGACTCGCGGCCTATCAGGTAGTTGGTGAGGTAATGGCTCACCAAGCCTACGACGGGTAGCCGGCCTG
AGAGGGTGACCGGCCACACTGGGACTGAGACACGGCCCAGACTCCTACGGGAGGCAGCAGTGGGGAATATTGCA
CAATGGGCGAAAGCCTGATGCAGCAACGCCGCGTGAGGGATGACGGCC
>I28Q9A102JMH72 rank=0320459 x=3829.0 y=3120.0 length=512
ACGTCATGAGAGTTTGATCCTGGCTCAGGATGAACGCTAGCGGCAGGCTTAACACATGCAAGTCGAGGGTAGAA
ATAGCTTGCTATTTTGAGACCGGCGCACGGGTGCGTAACGCGTATGCAATCTGCCTTTTACAGGGGAATAGCCC
AGAGAAATTTGGATTAATGCCCCATAGCGCTGCAGGGCGGCATCGCCGAGCAGCTAAAGTCACAACGGTAAAGA
TGAGCATGCGTCCCATTAGCTAGTTGGTAAGGTAACGGCTTACCAAGGCGATGATGGGTAGGGTCCTGAGAGGG
AGATCCCCCACACTGGTACTGAGACACGGACCAGACTCCTACGGGAGGCAGCAGTGAGGAATATTGGTCAATGG
GCGCAAGCCTGAACCAGCCATGCCGCGTGCAGGATGAAGGCCTTCGGGTTGTAAACTGCTTTTGACGGAACGAA
AAAGCT
You get:
LOCUS       I28Q9A102FII8J           418 bp    DNA              UNK 01-JAN-1980
DEFINITION  I28Q9A102FII8J rank=0668881 x=2144.0 y=1105.0 length=418
ACCESSION   I28Q9A102FII8J
VERSION     I28Q9A102FII8J
KEYWORDS    .
SOURCE      .
  ORGANISM  .
            .
FEATURES             Location/Qualifiers
ORIGIN
        1 acgtcatgag agtttgatca tggctcagga cgaacgctgg cggcgtgctt aacacatgca
       61 agtcgaacga tgaagctcca gcttgctggg gtggattagt ggcgaacggg tgagtaacac
      121 gtgagtaacc tgcccttgac tctgggataa gcgttggaaa cgacgtctaa taccggatat
      181 gacgaccgat ggcatcatct ggttgtggaa agaattttgg tcaaggatgg actcgcggcc
      241 tatcaggtag ttggtgaggt aatggctcac caagcctacg acgggtagcc ggcctgagag
      301 ggtgaccggc cacactggga ctgagacacg gcccagactc ctacgggagg cagcagtggg
      361 gaatattgca caatgggcga aagcctgatg cagcaacgcc gcgtgaggga tgacggcc
//
LOCUS       I28Q9A102JMH72           450 bp    DNA              UNK 01-JAN-1980
DEFINITION  I28Q9A102JMH72 rank=0320459 x=3829.0 y=3120.0 length=512
ACCESSION   I28Q9A102JMH72
VERSION     I28Q9A102JMH72
KEYWORDS    .
SOURCE      .
  ORGANISM  .
            .
FEATURES             Location/Qualifiers
ORIGIN
        1 acgtcatgag agtttgatcc tggctcagga tgaacgctag cggcaggctt aacacatgca
       61 agtcgagggt agaaatagct tgctattttg agaccggcgc acgggtgcgt aacgcgtatg
      121 caatctgcct tttacagggg aatagcccag agaaatttgg attaatgccc catagcgctg
      181 cagggcggca tcgccgagca gctaaagtca caacggtaaa gatgagcatg cgtcccatta
      241 gctagttggt aaggtaacgg cttaccaagg cgatgatggg tagggtcctg agagggagat
      301 cccccacact ggtactgaga cacggaccag actcctacgg gaggcagcag tgaggaatat
      361 tggtcaatgg gcgcaagcct gaaccagcca tgccgcgtgc aggatgaagg ccttcgggtt
      421 gtaaactgct tttgacggaa cgaaaaagct
//