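I have a list of several hundred amino acid sequences named aa_seq, which looks like this: ['AFYIVHPMFSELINFQNEGHECQCQCG','KVHSLPGMSDNGSPAVLPKTEFNKYKI','RAQVEDLMSLSPHVENASIPKGSTPIP','TSTNNYPMVQEQAILSCIEQTMVADAK',...]. Each sequence is 27 letters long. I need to determine the most frequently used amino acid at each position (1-27), along with its frequency.
So far, I have: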
count_dict = {}
counter = count_dict.values()
aa_list = ['A', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'K', 'L',  # one-letter codes for amino acids
           'M', 'N', 'P', 'Q', 'R', 'S', 'T', 'V', 'W', 'Y']
for p in range(0, 27):  # first round: looks at the first position in each sequence (indices 0-26 cover all 27 positions)
    for s in range(0, len(aa_seq)):  # goes through all sequences of the list
        for item in aa_list:  # and checks for the occurrence of each amino acid letter (= item)
            if item in aa_seq[s][p]:
                count_dict[item] …
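For comparison, here is a minimal sketch of one possible way to get the per-position result, not necessarily how the loop above was meant to continue. It assumes all sequences have the same length and swaps the nested loops for `zip` and `collections.Counter`; the function name `most_common_per_position` is made up for illustration, and the sample list holds only the four sequences shown in the question.

```python
from collections import Counter

def most_common_per_position(seqs):
    """Return one (amino_acid, relative_frequency) tuple per position.

    Hypothetical helper; assumes every sequence has the same length.
    """
    results = []
    # zip(*seqs) yields one tuple per column: the residues found at that position
    for column in zip(*seqs):
        residue, count = Counter(column).most_common(1)[0]
        results.append((residue, count / len(column)))
    return results

# The four example sequences from the question (the real list is longer)
aa_seq = [
    'AFYIVHPMFSELINFQNEGHECQCQCG',
    'KVHSLPGMSDNGSPAVLPKTEFNKYKI',
    'RAQVEDLMSLSPHVENASIPKGSTPIP',
    'TSTNNYPMVQEQAILSCIEQTMVADAK',
]

# Report positions as 1-27 to match the wording of the question
for position, (residue, freq) in enumerate(most_common_per_position(aa_seq), start=1):
    print(position, residue, freq)
```

When several amino acids tie at a position, `most_common` returns the one encountered first, so the reported residue depends on the order of the sequences in the list.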