Generator function only yields the first item

Bio*_*eek 2 python generator

I have some data in the following format:

data = """

[Data-0]
Data = BATCH
BatProtocol = DIAG-ST
BatCreate = 20010724

[Data-1]
Data = SAMP
SampNum = 357
SampLane = 1

[Data-2]
Data = SAMP
SampNum = 357
SampLane = 2

[Data-9]
Data = BATCH
BatProtocol = VCA
BatCreate = 20010725

[Data-10]
Data = SAMP
SampNum = 359
SampLane = 1

[Data-11]
Data = SAMP
SampNum = 359
SampLane = 2

"""

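The structure is:

  1. [Data-x], where x is a number
  2. Data = followed by either BATCH or SAMP
  3. More lines

I am trying to write a function that yields one list per 'batch'. The first item of each list is the text block containing the line Data = BATCH, and the remaining items are the text blocks containing the line Data = SAMP. I currently have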

def get_batches(data):
    textblocks = iter([txt for txt in data.split('\n\n') if txt.strip()])
    batch = []
    sample = next(textblocks)
    while True:
        if 'BATCH' in sample:
            batch.append(sample)
        sample = next(textblocks)
        if 'BATCH' in sample:
            yield batch
            batch = []
        else:
            batch.append(sample)

If called like this:

batches = get_batches(data)
for batch in batches:
    print batch
    print '_' * 20

However, it only returns the first 'batch':

['[Data-0]\nData = BATCH\nBatProtocol = DIAG-ST\nBatCreate = 20010724', 
 '[Data-1]\nData = SAMP\nSampNum = 357\nSampLane = 1', 
 '[Data-2]\nData = SAMP\nSampNum = 357\nSampLane = 2']
____________________

My expected output would be:

['[Data-0]\nData = BATCH\nBatProtocol = DIAG-ST\nBatCreate = 20010724', 
 '[Data-1]\nData = SAMP\nSampNum = 357\nSampLane = 1', 
 '[Data-2]\nData = SAMP\nSampNum = 357\nSampLane = 2']
____________________
['[Data-9]\nData = BATCH\nBatProtocol = VCA\nBatCreate = 20010725', 
 '[Data-10]\nData = SAMP\nSampNum = 359\nSampLane = 1', 
 '[Data-11]\nData = SAMP\nSampNum = 359\nSampLane = 2']
____________________

What am I missing, or how can I improve my function?

And*_*ark 6

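You only yield a batch when you find the start of the next batch, so the last batch of data is never included. To fix this, you need something like the following at the end of the function: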

if batch:
    yield batch
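
However, that alone is not sufficient. Eventually the next(textblocks) call inside the loop will raise StopIteration, so any code after the while loop is never executed. Here is one way to get this working with only a minor change to your current code (see below for a better version):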

def get_batches(data):
    textblocks = iter([txt for txt in data.split('\n\n') if txt.strip()])
    batch = []
    sample = next(textblocks)
    while True:
        if 'BATCH' in sample:
            batch.append(sample)
        try:
            sample = next(textblocks)
        except StopIteration:
            break
        if 'BATCH' in sample:
            yield batch
            batch = []
        else:
            batch.append(sample)
    if batch:
        yield batch
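
A side note for readers on newer interpreters: the question uses Python 2 print statements, where a StopIteration escaping a generator simply ends it silently. As far as I know, since Python 3.7 (PEP 479) the same escape is turned into a RuntimeError, which is another reason to catch it explicitly as above. A minimal sketch of that behaviour follows; first_two and run are hypothetical names used only for illustration.

```python
def first_two(items):
    """Yield the first two items using bare next() calls (no try/except)."""
    it = iter(items)
    yield next(it)
    yield next(it)  # StopIteration escapes here when items has only one element


def run(items):
    """Collect the generator's output, reporting a RuntimeError if one occurs."""
    try:
        return list(first_two(items))
    except RuntimeError as exc:
        # Python 3.7+ converts the escaped StopIteration into RuntimeError
        return "RuntimeError: {}".format(exc)


print(run([1, 2]))
print(run([1]))
```

On Python 3.7 or later the second call should report a RuntimeError instead of quietly returning a shortened list.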

I would suggest just looping over textblocks with a for loop:

def get_batches(data):
    textblocks = (txt for txt in data.split('\n\n') if txt.strip())
    batch = []
    for sample in textblocks:
        if 'BATCH' in sample:
            if batch:
                yield batch
            batch = []
        batch.append(sample)
    if batch:
        yield batch
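
As a quick check, here is a self-contained sketch that runs the for-loop version on a shortened copy of the sample data, using Python 3 print calls. The trimmed data string is an assumption made for brevity; the function body is the same as the one above.

```python
def get_batches(data):
    # Same logic as the for-loop version above
    textblocks = (txt for txt in data.split('\n\n') if txt.strip())
    batch = []
    for sample in textblocks:
        if 'BATCH' in sample:
            if batch:
                yield batch  # a new batch starts, so emit the previous one
            batch = []
        batch.append(sample)
    if batch:
        yield batch  # emit the final batch, which has no successor


# Shortened copy of the question's data (same layout, fewer fields)
data = """

[Data-0]
Data = BATCH
BatProtocol = DIAG-ST

[Data-1]
Data = SAMP
SampNum = 357

[Data-9]
Data = BATCH
BatProtocol = VCA

[Data-10]
Data = SAMP
SampNum = 359

"""

batches = list(get_batches(data))
for batch in batches:
    print(batch)
    print('_' * 20)
```

With this input the loop should print two lists, one per BATCH block, each followed by a separator line.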