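I am trying to create genetic signatures. I have a text file full of DNA sequences. I want to read each line from the text file and then add its 4-mers (substrings of four bases) to a dictionary. For example, given the sample sequence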
ATGATATATCTATCAT
what I want to add is ATGA, TGAT, GATA, and so on, to a dictionary in which the ID increases by 1 as each 4-mer is added.
So the dictionary would hold...
Genetic signatures, ID
ATGA, 1
TGAT, 2
GATA, 3
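
The mapping described above can be sketched as follows. This is only a minimal illustration under some assumptions that the question does not state: 4-mers are taken within each line (they do not span line breaks), a repeated 4-mer keeps the ID it received the first time, and the function name `build_signatures` is hypothetical.

```python
def build_signatures(sequences, k=4):
    """Map each distinct k-mer to an ID, numbered in order of first appearance."""
    signatures = {}
    next_id = 1
    for seq in sequences:
        seq = seq.strip()  # drop the trailing newline and surrounding whitespace
        # Slide a window of width k along the sequence, one base at a time.
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if kmer not in signatures:  # assumption: only new k-mers get a new ID
                signatures[kmer] = next_id
                next_id += 1
    return signatures


example = build_signatures(["ATGATATATCTATCAT"])
for kmer, kmer_id in example.items():
    print(kmer, kmer_id)
```

Because iterating over an open text file yields its lines, the same function could presumably be called as `build_signatures(f)` inside `with open("signatures.txt") as f:`.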
This is what I have so far...
import sys

def main():
    readingFile = open("signatures.txt", "r")
    my_DNA = ""
    DNAseq = {}  # creates dictionary
    for char in readingFile:
        my_DNA = my_DNA + char
    for char in my_DNA:
        index = 0
        DnaID = 1
        seq = my_DNA[index:index+4]
        if (DNAseq.has_key(seq)):  # checks if the key is in the dictionary
            index = index + 1
        else:
            DNAseq[seq] = DnaID
            index = index + 1
            DnaID = DnaID + 1
    readingFile.close()

if __name__ == '__main__':
    main()
This is my output:
ACTC
ACTC
ACTC
ACTC
ACTC …

I want to compute the sum across columns but exclude one column. How do I specify the column to exclude when adding up the sum of each row?
hd_total <- rowSums(hd)  # hd holds the data that was read in
hn_total <- rowSums(hn)