I have a DNA file in the following format:
>gi|5524211|gb|AAD44166.1| cytochrome
ACCAGAGCGGCACAGCAGCGACATCAGCACTAGCACTAGCATCAGCATCAGCATCAGC
CTACATCATCACAGCAGCATCAGCATCGACATCAGCATCAGCATCAGCATCGACGACT
ACACCCCCCCCGGTGTGTGTGGGGGGTTAAAAATGATGAGTGATGAGTGAGTTGTGTG
CTACATCATCACAGCAGCATCAGCATCGACATCAGCATCAGCATCAGCATCGACGACT
TTCTATCATCATTCGGCGGGGGGATATATTATAGCGCGCGATTATTGCGCAGTCTACG
TCATCGACTACGATCAGCATCAGCATCAGCATCAGCATCGACTAGCATCAGCTACGAC
How can I read this file and extract the DNA sequence part (ACCAGAGCGG...) as a single string with no line breaks, for example:
ACCAGAGCGGCACAGCAGCGACATCAGCACTAGCACTAGCATCAGCATCAGCATCAGCCTACATCATCACAGCAGCATCA
You probably don't need a regular expression for this.
If there is always exactly one header line:
# drop the header line, then join the remaining sequence lines without separators
dnalines = text.splitlines()[1:]
dna = ''.join(dnalines)
where text is the contents of the file (for example, text = open('yourfile').read()).
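If the number of header lines is not guaranteed to be one, a slightly more general approach is to skip every line that starts with '>' instead of just the first line. The sketch below assumes that convention (headers begin with '>', everything else is sequence); the function name read_fasta_sequence is hypothetical. Note that if the file contains several records, their sequences are simply concatenated; for real multi-record FASTA files a dedicated parser such as Biopython's SeqIO is a better fit.

```python
def read_fasta_sequence(path):
    """Hypothetical helper: return all sequence lines of a FASTA-style file
    joined into one string.

    Assumes header lines start with '>' and are skipped entirely. Every other
    line is stripped of surrounding whitespace (newlines, stray '\r') and
    concatenated without separators. Blank lines contribute nothing.
    """
    with open(path) as handle:
        return ''.join(
            line.strip()
            for line in handle
            if not line.startswith('>')
        )
```

Usage: dna = read_fasta_sequence('yourfile'). Iterating over the file handle line by line avoids holding a second, split copy of the whole file in memory, and the with block closes the file even if an error occurs.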