Increment a counter stored as a dictionary value inside a loop

cca*_*roo 3 python loops

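I have a list called aa_seq containing a few hundred amino acid sequences. It looks like this: ['AFYIVHPMFSELINFQNEGHECQCQCG', 'KVHSLPGMSDNGSPAVLPKTEFNKYKI', 'RAQVEDLMSLSPHVENASIPKGSTPIP', 'TSTNNYPMVQEQAILSCIEQTMVADAK', ...]. Each sequence is 27 letters long. For each position (1-27) I have to find the most common amino acid and how often it occurs.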

So far I have:

    count_dict = {}
    counter = count_dict.values()
    aa_list = ['A', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'K', 'L',    # one-letter codes for amino acids
               'M', 'N', 'P', 'Q', 'R', 'S', 'T', 'V', 'W', 'Y']
    for p in range(0, 26):                     # first round: looks at the first position in each sequence
        for s in range(0, len(aa_seq)):        # goes through all sequences of the list
            for item in aa_list:               # and checks for the occurrence of each amino acid letter (= item)
                if item in aa_seq[s][p]:
                    count_dict[item]           # if that letter occurs at the respective position, make it a key in the dictionary
                    counter += 1               # and increase its counter (the value, as defined above) by one
    print count_dict

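It says KeyError: 'A' and points to the line count_dict[item]. So the items of aa_list apparently can't be added as keys this way..? How do I do that? It also gives the error "'int' object is not iterable" for counter. How do I increment the counter?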
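Both errors come from the same misunderstanding. count_dict[item] only reads a value, and reading a key that is not in the dictionary yet raises KeyError. counter = count_dict.values() is a separate list (Python 2) or view (Python 3), not a reference to any single value, so counter += 1 tries to extend that list with an int ("'int' object is not iterable") and would never update the dictionary anyway. Note also that range(0, 26) only visits indices 0-25, so position 27 is skipped. Below is a minimal sketch of the loop-based approach with those issues fixed, assuming every sequence is exactly 27 letters; the four sequences are just the sample from the question. dict.get(aa, 0) supplies 0 for a letter that has not been seen yet, and the dictionary value is then assigned back directly.

```python
# Sample data from the question; the real list has a few hundred sequences.
aa_seq = ['AFYIVHPMFSELINFQNEGHECQCQCG', 'KVHSLPGMSDNGSPAVLPKTEFNKYKI',
          'RAQVEDLMSLSPHVENASIPKGSTPIP', 'TSTNNYPMVQEQAILSCIEQTMVADAK']

seq_len = 27  # assumed fixed length of every sequence
result = []
for p in range(seq_len):              # indices 0..26, i.e. positions 1..27
    count_dict = {}                   # fresh counts for this position
    for seq in aa_seq:
        aa = seq[p]                   # the letter itself; no need to loop over aa_list
        # get() returns 0 for an unseen letter, so there is no KeyError,
        # and the new count is written back into the dictionary
        count_dict[aa] = count_dict.get(aa, 0) + 1
    # key with the highest count (on ties, the first letter inserted wins)
    best = max(count_dict, key=count_dict.get)
    result.append((best, count_dict[best]))

print(result)
```

collections.defaultdict(int) would work just as well here: it creates the missing key with value 0 on first access, so count_dict[aa] += 1 works directly.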

gre*_*ole 5

You can use Counter:

>>> from collections import Counter

>>> l = ['AFYIVHPMFSELINFQNEGHECQCQCG', 'KVHSLPGMSDNGSPAVLPKTEFNKYKI', 'RAQVEDLMSLSPHVENASIPKGSTPIP', 'TSTNNYPMVQEQAILSCIEQTMVADAK']
>>> s = [Counter([l[j][i] for j in range(len(l))]).most_common()[0] for i in range(27)]
>>> s
[('A', 1),
 ('A', 1),
 ('Y', 1),
 ('I', 1),
 ('N', 1),
 ('Y', 1),
 ('P', 2),
 ('M', 4),
 ('S', 2),
 ('Q', 1),
 ('E', 2),
 ('Q', 1),
 ('I', 1),
 ('I', 1),
 ('A', 1),
 ('Q', 1),
 ('A', 1),
 ('I', 1),
 ('I', 1),
 ('Q', 1),
 ('E', 2),
 ('C', 1),
 ('Q', 1),
 ('A', 1),
 ('Q', 1),
 ('I', 1),
 ('I', 1)]
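The same Counter idea can be written a little more compactly by transposing the list with zip(*l), which yields one tuple per position holding that position's letter from every sequence. This is a sketch that assumes all sequences have the same length (zip stops at the shortest one). Ties are worth keeping in mind: the output shown above reflects arbitrary ordering among tied letters, whereas in Python 3.7+ most_common returns elements with equal counts in the order they were first encountered, so letters at tied positions can differ from that listing.

```python
from collections import Counter

l = ['AFYIVHPMFSELINFQNEGHECQCQCG', 'KVHSLPGMSDNGSPAVLPKTEFNKYKI',
     'RAQVEDLMSLSPHVENASIPKGSTPIP', 'TSTNNYPMVQEQAILSCIEQTMVADAK']

# zip(*l) gives one column (tuple of letters) per position;
# most_common(1)[0] is the (letter, count) pair with the highest count
s = [Counter(column).most_common(1)[0] for column in zip(*l)]
print(s)
```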

However, this may be inefficient if you have a large dataset.

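  • Just because iterating over lists in pure Python is slow. When people talk about "amino acid sequences", I immediately think of gigabytes of data, so using numpy arrays or something in Cython is probably the way to go. (2 upvotes)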
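As a rough illustration of the numpy route suggested in the comment, here is a sketch (not a benchmarked solution) that assumes every sequence is exactly 27 ASCII letters. The sequences are concatenated and viewed as an (n_sequences, 27) array of byte codes, and np.bincount counts the codes in each column, so the per-sequence work happens inside numpy rather than in a Python loop. On ties, argmax picks the smallest byte value, i.e. the alphabetically first letter.

```python
import numpy as np

# Sample data from the question; assumes equal-length ASCII sequences.
aa_seq = ['AFYIVHPMFSELINFQNEGHECQCQCG', 'KVHSLPGMSDNGSPAVLPKTEFNKYKI',
          'RAQVEDLMSLSPHVENASIPKGSTPIP', 'TSTNNYPMVQEQAILSCIEQTMVADAK']
seq_len = 27

# One row per sequence, one column per position, values are ASCII codes (uint8)
arr = np.frombuffer(''.join(aa_seq).encode('ascii'), dtype=np.uint8)
arr = arr.reshape(len(aa_seq), seq_len)

# counts[c, p] = number of sequences whose byte at position p equals c
counts = np.stack([np.bincount(arr[:, p], minlength=256) for p in range(seq_len)],
                  axis=1)

best_codes = counts.argmax(axis=0)   # most frequent byte per position (ties: lowest code)
best_freqs = counts.max(axis=0)      # how often it occurs
result = [(chr(int(c)), int(f)) for c, f in zip(best_codes, best_freqs)]
print(result)
```

Only the 27-iteration loop over positions remains in Python; the work over the sequences themselves is done by numpy, which is what matters once the list grows large.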