How do I split a text file on comment blocks in Python?

Question

I have spent most of my morning failing to solve this simple problem. Using Python, I want to parse data files that look like this:

# This is an example comment line, it starts with a '#' character.
# There can be a variable number of comments between each data set.
# Comments "go with" the data set that comes after them.
# The first data set starts on the next line:
0.0 1.0
1.0 2.0
2.0 3.0
3.0 4.0

# Data sets are followed by variable amounts of white space.
# The second data set starts after this comment
5.0 6.0
6.0 7.0


# One more data set.
7.0 8.0
8.0 9.0

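I would like Python code that parses the example above into three "blocks" and stores them as elements of a list. Each block can itself be stored as a list of lines, with or without the comment lines; either way is fine. The manual way is to do this: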

#! /usr/bin/env python

# Read in data, separate into rows
with open("example") as f:
    rows = f.read().split('\n')

# Do you haz teh codez?
datasets=[]
datasets.append(rows[0:8])
datasets.append(rows[9:13])
datasets.append(rows[15:18])

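I am looking for a more general solution that supports a variable number and length of data sets. I have tried several disasters built from un-Pythonic-looking loops, and I think it is best not to clutter my question with them. This is for work, not "homework".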

Answer 1

Fre*_*Foo 5

Use groupby.

from itertools import groupby

def contains_data(ln):
    # just an example; there are smarter ways to do this
    return ln[0] not in "#\n"

with open("example") as f:
    datasets = [[ln.split() for ln in group]
                for has_data, group in groupby(f, contains_data)
                if has_data]
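
As the comment in contains_data says, the check on ln[0] is only an example. One thing to watch for: a line holding only spaces or tabs starts with neither '#' nor a newline, so it would be counted as data. Below is a minimal sketch of a stricter predicate, assuming data rows are whitespace-separated numbers; the names is_data and parse_datasets are hypothetical and not part of the answer above.

```python
from itertools import groupby


def is_data(line):
    """Return True for lines that hold data (not blank, not a '#' comment)."""
    stripped = line.strip()
    return bool(stripped) and not stripped.startswith("#")


def parse_datasets(lines):
    """Group consecutive data lines; each row becomes a list of floats."""
    return [
        [[float(tok) for tok in line.split()] for line in group]
        for has_data, group in groupby(lines, key=is_data)
        if has_data
    ]


# Small in-memory sample; the fourth entry is a whitespace-only line.
sample = [
    "# first data set",
    "0.0 1.0",
    "1.0 2.0",
    "   ",
    "# second data set",
    "5.0 6.0",
]

datasets = parse_datasets(sample)
print(datasets)  # prints [[[0.0, 1.0], [1.0, 2.0]], [[5.0, 6.0]]]
```

Because an open file iterates line by line, parse_datasets(f) should work the same way inside a with open("example") as f: block.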

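The groupby approach drops the comment lines. The question also allows keeping them, with comments "going with" the data set that follows. A rough sketch of that variant is below. It assumes every data set is preceded by at least one comment line, as in the example file; two data sets separated only by blank lines would end up in the same block. The function name split_blocks is hypothetical.

```python
def split_blocks(lines):
    """Split lines into blocks of leading comments plus the data rows after them."""
    blocks = []
    current = []
    seen_data = False  # True once the current block holds a data row
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank and whitespace-only lines
        is_comment = line.startswith("#")
        if is_comment and seen_data:
            # A comment after data marks the start of the next block.
            blocks.append(current)
            current = []
            seen_data = False
        if not is_comment:
            seen_data = True
        current.append(line)
    if current:
        blocks.append(current)
    return blocks


sample = [
    "# c1", "# c2", "0.0 1.0", "1.0 2.0", "",
    "# c3", "5.0 6.0", "", "",
    "# c4", "7.0 8.0",
]

blocks = split_blocks(sample)
print(len(blocks))  # prints 3
```

Trailing comments with no data after them would form a final comment-only block, which may or may not be what you want.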
