python barcode bioinformatics
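I'm working through an introductory Python class, and I'm having a lot of trouble writing a script that reads a file and then identifies a barcode at the beginning of each sequence in the file.
This is how I open my file: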
#!/usr/bin/python
import sys
fname = sys.argv[1]
handle = open(fname , "r")
# read the file #
for line in handle:
    print line.strip()
handle.close()
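It opens my file perfectly and prints the contents to the screen.
The problem is that when I add to this to complete the assignment, I get error messages, and I don't know what I'm doing wrong.
I would appreciate any help or advice.
The assignment and the expected results, in detail:
Create an executable file named ~/assignments/assignment07/assignment07.py
The Python script should take 2 command-line arguments (in order):
(1) a DNA barcode (2) the name of a file containing DNA sequences
Your script should print all DNA sequences in the sequence file that match the given barcode at the beginning of the sequence, but discard the barcode. Do not print the barcode, only the sequences that match it, and do not match barcodes that are not at the front of the sequence.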
#!/usr/bin/python
import sys
barcode = sys.argv[1]
filename = sys.argv[2]
bclen = len(bacode)
handle = open(fname, "r")
# read the file #
for line in handle:
    print line.strip()
for line in filename:
    bc = line[4:][:bclen]
    seq = line[4:19][bclen:]
    if bc == barcode:
        seqslice = sequence[4:]
        #print "barcode %s is at beginning of sequence %s" % (barcode, seqslice)
handle.close()
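This script is full of common beginner mistakes (mismatched variable names, misunderstanding how slices work), but here is a corrected version; the comments should help:
Run it with: python script_name.py 123barcode filename.csv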
#!/usr/bin/python
import sys
barcode = sys.argv[1]
filename = sys.argv[2]
bclen = len(barcode)  # fixed typo: was "bacode"
handle = open(filename, "r")  # was "fname", which is never defined in this script
# read the file #
## Combined the two for loops; there is no reason for a second loop here
for line in handle:
    print line.strip()
    bc = line[:bclen]  # slice from the start of the line up to the barcode length
    seq = line[bclen:]  # everything after the barcode (add an end index, e.g. line[bclen:19], if you only want part of the line)
    print "BC = " + bc  # Added these print statements: when problems occur,
    print "SEQ = " + seq  # always look at what the variables actually contain
    # I don't know exactly what you wanted here, but this prints the matching sequence
    if bc == barcode:
        print "barcode %s is at beginning of sequence %s" % (barcode, seq)
handle.close()
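To see why the slices in the question went wrong, it can help to try them on a single made-up line. The barcode and sequence below are hypothetical values chosen only for illustration, and the snippet uses Python 3's print() function rather than the Python 2 print statement used in the scripts above.

```python
# Hypothetical sample values, chosen only for illustration.
barcode = "ACGT"
line = "ACGTTTGACCAGGA"
bclen = len(barcode)

# The slices from the question skip the first 4 characters
# before looking for the barcode, so they never see it.
wrong_bc = line[4:][:bclen]
wrong_seq = line[4:19][bclen:]

# Slicing from the start of the line picks up the barcode itself,
# and slicing from bclen onward gives the rest of the sequence.
bc = line[:bclen]
seq = line[bclen:]

print("question's slices:", wrong_bc, wrong_seq)
print("corrected slices: ", bc, seq)
```

With these values, only the corrected slice compares equal to the barcode, which is why the `if bc == barcode:` test in the question could never succeed for a barcode at the front of the line.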
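The corrected script prints debugging output, while the assignment asks for only the matching sequences with the barcode removed. A minimal sketch of that final step might look like the following. It assumes the file holds one sequence per line (the question does not state the file format), it is written for Python 3, and the helper name strip_barcode is hypothetical.

```python
#!/usr/bin/env python3
import sys


def strip_barcode(lines, barcode):
    """Yield each sequence that starts with barcode, with the barcode removed."""
    for line in lines:
        seq = line.strip()  # drop the trailing newline and surrounding whitespace
        if seq.startswith(barcode):  # match only at the front of the sequence
            yield seq[len(barcode):]


# Only run the command-line part when exactly two arguments are given:
# the barcode, then the name of the sequence file.
if __name__ == "__main__" and len(sys.argv) == 3:
    barcode, filename = sys.argv[1], sys.argv[2]
    with open(filename) as handle:  # "with" closes the file automatically
        for seq in strip_barcode(handle, barcode):
            print(seq)
```

Using str.startswith avoids computing the slice boundaries by hand for the comparison, and a sequence that contains the barcode somewhere other than the front is skipped.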