Using Python to manipulate a txt file representation of key-value groupings

ale*_*hli 1 python key-value text-files

I am trying to use Python to convert a text file from format A:

Key1  
Key1value1  
Key1value2  
Key1value3  
Key2  
Key2value1  
Key2value2  
Key2value3  
Key3... 

into format B:

Key1 Key1value1  
Key1 Key1value2  
Key1 Key1value3  
Key2 Key2value1  
Key2 Key2value2  
Key2 Key2value3  
Key3 Key3value1...

Specifically, here is a brief look at the file itself (only one key is shown; the full file contains thousands of keys):

chr22:16287243: PASS  
patientID1  G/G  
patientID2  G/G  
patientID3  G/G

And here is the desired output:

chr22:16287243: PASS  patientID1    G/G  
chr22:16287243: PASS  patientID2    G/G  
chr22:16287243: PASS  patientID3    G/G

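I have written the following code, which can detect and print the keys, but I am having trouble writing code that stores the values associated with each key and then prints the key-value pairs. Can anyone help me with this task?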

import sys
import re

records=[]

with open('filepath', 'r') as infile:
    for line in infile:
        variant = re.search(r"\Achr\d", line, re.I) # all variants start with "chr"
        if variant:
            records.append(line.replace("\n",""))
            #parse lines until a new variant is encountered

for r in records:
    print (r)

Sve*_*ach 5

Do it in a single pass, without storing the lines:

with open("input") as infile, open("output", "w") as outfile:
    for line in infile:
        if line.startswith("chr"):
            key = line.strip()  # remember the most recent key
        else:
            # prefix each value line with the current key
            print(key, line.rstrip("\n"), file=outfile)

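This code assumes that the first line contains a key; otherwise it fails with a NameError, because `key` is never assigned before it is used.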
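If that assumption might not hold, one option is to wrap the same single-pass idea in a generator that checks for a missing key explicitly. The following is only a sketch: the function name `regroup`, its `key_prefix` parameter, and the choice to skip blank lines and strip surrounding whitespace are assumptions for illustration, not part of the original answer.

```python
def regroup(lines, key_prefix="chr"):
    """Yield 'key value' strings from key-grouped input lines.

    Hypothetical helper: a line starting with key_prefix is treated as a key,
    every other non-blank line as a value belonging to the most recent key.
    """
    key = None  # no key seen yet
    for line in lines:
        line = line.strip()
        if not line:
            continue  # assumption: blank lines carry no data
        if line.startswith(key_prefix):
            key = line
        elif key is None:
            # a value appeared before any key, so report it clearly
            raise ValueError("value line before any key: %r" % line)
        else:
            yield key + " " + line


# Small in-memory sample taken from the question's data
sample = [
    "chr22:16287243: PASS  \n",
    "patientID1  G/G  \n",
    "patientID2  G/G  \n",
]
for row in regroup(sample):
    print(row)
```

Because a file object is itself an iterable of lines, the same function could be called as `regroup(infile)` inside the `with` block, writing each yielded row to the output file.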