I have some data in the following format:
data = """
[Data-0]
Data = BATCH
BatProtocol = DIAG-ST
BatCreate = 20010724

[Data-1]
Data = SAMP
SampNum = 357
SampLane = 1

[Data-2]
Data = SAMP
SampNum = 357
SampLane = 2

[Data-9]
Data = BATCH
BatProtocol = VCA
BatCreate = 20010725

[Data-10]
Data = SAMP
SampNum = 359
SampLane = 1

[Data-11]
Data = SAMP
SampNum = 359
SampLane = 2
"""
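The structure is:
Each block starts with [Data-x], where x is a number, followed by a line Data = BATCH or Data = SAMP. I am trying to write a function that yields one list per 'batch'. The first item of each list is the text block containing the line Data = BATCH, and the remaining items are the text blocks containing the line Data = SAMP. This is what I have so far: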
def get_batches(data):
    textblocks = iter([txt for txt in data.split('\n\n') if txt.strip()])
    batch = []
    sample = next(textblocks)
    while True:
        if 'BATCH' in sample:
            batch.append(sample)
        sample = next(textblocks)
        if 'BATCH' in sample:
            yield batch
            batch = []
        else:
            batch.append(sample)
I call it like this:
batches = get_batches(data)
for batch in batches:
    print(batch)
    print('_' * 20)
However, it only returns the first 'batch':
['[Data-0]\nData = BATCH\nBatProtocol = DIAG-ST\nBatCreate = 20010724',
'[Data-1]\nData = SAMP\nSampNum = 357\nSampLane = 1',
'[Data-2]\nData = SAMP\nSampNum = 357\nSampLane = 2']
____________________
My expected output would be:
['[Data-0]\nData = BATCH\nBatProtocol = DIAG-ST\nBatCreate = 20010724',
'[Data-1]\nData = SAMP\nSampNum = 357\nSampLane = 1',
'[Data-2]\nData = SAMP\nSampNum = 357\nSampLane = 2']
____________________
['[Data-9]\nData = BATCH\nBatProtocol = VCA\nBatCreate = 20010725',
'[Data-10]\nData = SAMP\nSampNum = 359\nSampLane = 1',
'[Data-11]\nData = SAMP\nSampNum = 359\nSampLane = 2']
____________________
What am I missing, or how could I improve my function?
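You only yield a batch when you find the start of the next one, so the last batch is never included. To fix this, you need something like the following at the end of the function: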
if batch:
    yield batch
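That alone is not enough, though. Eventually the next(textblocks) call inside the loop raises StopIteration, so the code after the while loop never runs. Here is one way to get this working with only a small change to your current code (see below for a better version):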
def get_batches(data):
    textblocks = iter([txt for txt in data.split('\n\n') if txt.strip()])
    batch = []
    sample = next(textblocks)
    while True:
        if 'BATCH' in sample:
            batch.append(sample)
        try:
            sample = next(textblocks)
        except StopIteration:
            break
        if 'BATCH' in sample:
            yield batch
            batch = []
        else:
            batch.append(sample)
    if batch:
        yield batch
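A side note for readers on Python 3.7 or later: there, a StopIteration that escapes a generator body is converted into a RuntimeError (PEP 479), so an unguarded next() fails loudly instead of silently ending the generator. The sketch below illustrates this with two hypothetical generators, `first_two` and `first_two_safe`, which are not part of the question's code. The second uses the default argument of next() as a sentinel, which is one alternative to try/except.

```python
def first_two(items):
    """Hypothetical generator that calls next() without guarding it."""
    it = iter(items)
    yield next(it)
    yield next(it)  # raises StopIteration if `items` has only one element


def first_two_safe(items):
    """Same idea, but uses next()'s default argument as a sentinel."""
    it = iter(items)
    missing = object()  # unique marker that cannot appear in `items`
    for _ in range(2):
        value = next(it, missing)
        if value is missing:
            return  # end the generator cleanly
        yield value


try:
    result = list(first_two(['only one']))
except RuntimeError as exc:
    # Python 3.7+: the escaping StopIteration is converted (PEP 479).
    result = 'RuntimeError: {}'.format(exc)

print(result)
print(list(first_two_safe(['only one'])))
```

On older interpreters the first call simply stops early, which matches the behaviour described in the question.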
I would suggest simply looping over textblocks with a for loop:
def get_batches(data):
    textblocks = (txt for txt in data.split('\n\n') if txt.strip())
    batch = []
    for sample in textblocks:
        if 'BATCH' in sample:
            if batch:
                yield batch
            batch = []
        batch.append(sample)
    if batch:
        yield batch
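To try the for-loop approach end to end, here is a minimal self-contained sketch. It assumes blocks are separated by a blank line, uses a shortened, made-up sample in the same layout as the question's data, and matches on the full text `Data = BATCH` instead of the bare substring `BATCH` (my own variation, in case `BATCH` ever shows up in another field). It also strips each block, which the code above does not.

```python
def get_batches(data):
    """Yield one list per batch: the BATCH block first, then its SAMP blocks."""
    # Assumption: blocks are separated by a blank line, as in the question.
    textblocks = (txt.strip() for txt in data.split('\n\n') if txt.strip())
    batch = []
    for sample in textblocks:
        # A new BATCH block closes the batch collected so far.
        if 'Data = BATCH' in sample and batch:
            yield batch
            batch = []
        batch.append(sample)
    if batch:  # the last batch is not followed by another BATCH block
        yield batch


# Shortened, made-up sample in the same layout as the question's data.
data = """
[Data-0]
Data = BATCH
BatProtocol = DIAG-ST

[Data-1]
Data = SAMP
SampNum = 357

[Data-9]
Data = BATCH
BatProtocol = VCA

[Data-10]
Data = SAMP
SampNum = 359
"""

batches = list(get_batches(data))
for batch in batches:
    print(batch)
    print('_' * 20)
```

With this sample the loop prints two lists, one starting with the [Data-0] block and one starting with the [Data-9] block.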