Emi*_*raz 1 python sorting translation sequences python-3.x
I have to write a script that translates this sequence:
dict = {"TTT":"F|Phe","TTC":"F|Phe","TTA":"L|Leu","TTG":"L|Leu","TCT":"S|Ser","TCC":"S|Ser",
"TCA":"S|Ser","TCG":"S|Ser", "TAT":"Y|Tyr","TAC":"Y|Tyr","TAA":"*|Stp","TAG":"*|Stp",
"TGT":"C|Cys","TGC":"C|Cys","TGA":"*|Stp","TGG":"W|Trp", "CTT":"L|Leu","CTC":"L|Leu",
"CTA":"L|Leu","CTG":"L|Leu","CCT":"P|Pro","CCC":"P|Pro","CCA":"P|Pro","CCG":"P|Pro",
"CAT":"H|His","CAC":"H|His","CAA":"Q|Gln","CAG":"Q|Gln","CGT":"R|Arg","CGC":"R|Arg",
"CGA":"R|Arg","CGG":"R|Arg", "ATT":"I|Ile","ATC":"I|Ile","ATA":"I|Ile","ATG":"M|Met",
"ACT":"T|Thr","ACC":"T|Thr","ACA":"T|Thr","ACG":"T|Thr", "AAT":"N|Asn","AAC":"N|Asn",
"AAA":"K|Lys","AAG":"K|Lys","AGT":"S|Ser","AGC":"S|Ser","AGA":"R|Arg","AGG":"R|Arg",
"GTT":"V|Val","GTC":"V|Val","GTA":"V|Val","GTG":"V|Val","GCT":"A|Ala","GCC":"A|Ala",
"GCA":"A|Ala","GCG":"A|Ala", "GAT":"D|Asp","GAC":"D|Asp","GAA":"E|Glu",
"GAG":"E|Glu","GGT":"G|Gly","GGC":"G|Gly","GGA":"G|Gly","GGG":"G|Gly"}
seq = "TTTCAATACTAGCATGACCAAAGTGGGAACCCCCTTACGTAGCATGACCCATATATATATATATA"
a=""
for y in range( 0, len ( seq)):
    c=(seq[y:y+3])
    #print(c)
    for k, v in dict.items():
        if seq[y:y+3] == k:
            alle_amino = v[::3] # all amino acids in a row: a1.1 - a2.1 - a3.1 - a1.2, and so on
            print (v)
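With this script I get the amino acids of the three reading frames interleaved with one another. How can I sort the output so that all amino acids from frame 1 are next to each other, all amino acids from frame 2 are next to each other, and the same for frame 3?
For example, my result must be: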
+3 SerIleLeuAlaStpProLysTrpGluProProTyrValAlaStpProIleTyrIleTyrIle
+2 PheAsnThrSerMetThrLysValGlyThrProLeuArgSerMetThrHisIleTyrIleTyr
+1 PheGlnTyrStpHisAspGlnSerGlyAsnProLeuThrStpHisAspProTyrIleTyrIle
TTTCAATACTAGCATGACCAAAGTGGGAACCCCCTTACGTAGCATGACCCATATATATATATATA
I am using Python 3.
One more question: can I get this result by making a few changes to my own script?
You can use the following (note that using Biopython's translate method would be much easier):
dicti = {your dictionary here}

def translate(seq):
    x = 0
    aaseq = []
    while True:
        try:
            # look up the next triplet; a trailing slice shorter than
            # three bases is not a key, so KeyError ends the loop
            aaseq.append(dicti[seq[x:x+3]])
            x += 3
        except (IndexError, KeyError):
            break
    return aaseq

seq = "TTTCAATACTAGCATGACCAAAGTGGGAACCCCCTTACGTAGCATGACCCATATATATATATATA"

# one line of output per reading frame (offsets 0, 1 and 2)
for frame in range(3):
    print('+%i' %(frame+1), ''.join(item.split('|')[1] for item in translate(seq[frame:])))
Note that I renamed the dictionary to dicti (don't overwrite the built-in dict).
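The reason for the rename can be seen in a small sketch. Once the name `dict` is bound to a dictionary, the built-in constructor is no longer reachable through that name in that scope. The function name `shadow_demo` is only illustrative and does not come from the original scripts.

```python
def shadow_demo():
    # Binding the name `dict` hides the built-in type inside this scope.
    dict = {"TTT": "F|Phe"}
    try:
        dict(a=1)  # the name now refers to a dictionary instance
    except TypeError as exc:
        return str(exc)
    return None


message = shadow_demo()
print(message)
```

Outside the function the built-in is untouched, which is why a different variable name such as `dicti` avoids the problem entirely.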
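A few comments to help you understand:
translate takes a sequence and returns a list in which each item is the amino-acid translation of the triplet at that position. For example: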
aaseq = ["L|Leu","L|Leu","P|Pro", ....]
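You could process this data further inside translate (keeping only the one-letter or the three-letter code), or return it unchanged and process it afterwards, as I do.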
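As a rough sketch of the first option, the variant below selects the code inside `translate` itself. The `code` parameter, the `range`-based loop and the compact way of rebuilding the table are assumptions made for this illustration; the table is meant to reproduce the question's dictionary in the same `"X|Xxx"` format.

```python
from itertools import product

# Rebuild the codon table in the question's "X|Xxx" format.
BASES = "TCAG"
ONE_LETTER = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
              "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
THREE_LETTER = {
    "F": "Phe", "L": "Leu", "S": "Ser", "Y": "Tyr", "*": "Stp", "C": "Cys",
    "W": "Trp", "P": "Pro", "H": "His", "Q": "Gln", "R": "Arg", "I": "Ile",
    "M": "Met", "T": "Thr", "N": "Asn", "K": "Lys", "V": "Val", "A": "Ala",
    "D": "Asp", "E": "Glu", "G": "Gly",
}
dicti = {"".join(codon): f"{aa}|{THREE_LETTER[aa]}"
         for codon, aa in zip(product(BASES, repeat=3), ONE_LETTER)}


def translate(seq, code=1):
    """Translate seq codon by codon.

    code=0 keeps the one-letter code, code=1 the three-letter code.
    """
    aaseq = []
    # Stop before an incomplete trailing codon.
    for x in range(0, len(seq) - 2, 3):
        aaseq.append(dicti[seq[x:x + 3]].split("|")[code])
    return aaseq


print("".join(translate("TTTCAATACTAG", code=0)))
print("".join(translate("TTTCAATACTAG", code=1)))
```

Returning the raw `"X|Xxx"` items, as the answer does, keeps `translate` more general; choosing the code inside the function makes the calling line shorter.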
translate is called in
''.join(item.split('|')[1] for item in translate(seq[frame:]))
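for each frame. For frame values 0, 1 and 2, it passes seq[frame:] as the argument to translate. In other words, you send the sequences corresponding to the three reading frames and process them one after another. Then, in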
''.join(item.split('|')[1]
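I split the one-letter and three-letter codes of each amino acid and take the item at index 1 (the second one, i.e. the three-letter code). The codes are then joined into a single string.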
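Putting the pieces together, here is a minimal sketch that prints the frames in the layout requested in the question (+3 first, the raw sequence last). It steps through the string with `range(frame, len(seq) - 2, 3)` instead of slicing `seq[frame:]`, which reads the same codons. The helper name `frame_translation` and the compact table construction are assumptions for illustration, not part of the original answer.

```python
from itertools import product

# Codon -> three-letter code, equivalent to the question's dictionary.
BASES = "TCAG"
ONE_LETTER = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
              "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
THREE_LETTER = {
    "F": "Phe", "L": "Leu", "S": "Ser", "Y": "Tyr", "*": "Stp", "C": "Cys",
    "W": "Trp", "P": "Pro", "H": "His", "Q": "Gln", "R": "Arg", "I": "Ile",
    "M": "Met", "T": "Thr", "N": "Asn", "K": "Lys", "V": "Val", "A": "Ala",
    "D": "Asp", "E": "Glu", "G": "Gly",
}
codon_table = {"".join(codon): THREE_LETTER[aa]
               for codon, aa in zip(product(BASES, repeat=3), ONE_LETTER)}

seq = "TTTCAATACTAGCATGACCAAAGTGGGAACCCCCTTACGTAGCATGACCCATATATATATATATA"


def frame_translation(seq, frame):
    """Three-letter translation of seq read from offset frame (0, 1 or 2)."""
    # Every third position starting at the offset; incomplete codons are skipped.
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return "".join(codon_table[codon] for codon in codons)


# Frames in descending order, then the sequence itself, as in the question.
lines = [f"+{frame + 1} {frame_translation(seq, frame)}" for frame in (2, 1, 0)]
lines.append(seq)
print("\n".join(lines))
```

Each frame is collected as one string before anything is printed, which is what keeps the amino acids of one frame next to each other rather than interleaved.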