Read the first N lines of a file in Python

Question

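Read the first N lines of a file in Python

We have a large raw data file that we would like to trim to a specified size. I am experienced in .NET C#, but I would like to do this in Python to simplify things and out of interest.

How do I get the first N lines of a text file in Python? Will the operating system being used have any effect on the implementation?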

Answer 1

Python 2

with open("datafile") as myfile:
    head = [next(myfile) for x in xrange(N)]
print head


Python 3

with open("datafile") as myfile:
    head = [next(myfile) for x in range(N)]
print(head)


Here is another way (Python 2 and 3):

from itertools import islice
with open("datafile") as myfile:
    head = list(islice(myfile, N))
print(head)


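Keep in mind that if the file has fewer than N lines, the next()-based versions raise a StopIteration exception that must be handled; islice simply returns fewer lines. (17 upvotes)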
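To illustrate the short-file caveat from the comment above, here is a minimal sketch that wraps the islice approach in a function. The name `head` is hypothetical (it is not a standard-library function), and the sketch assumes a text file opened with the default encoding.

```python
from itertools import islice


def head(path, n):
    """Return up to the first n lines of the file at `path` as a list.

    `head` is a hypothetical helper name, not part of the standard library.
    """
    with open(path) as f:
        # islice stops quietly at end of file, so a short file simply
        # yields fewer than n lines instead of raising StopIteration.
        return list(islice(f, n))
```

Calling `head("datafile", 10)` on a three-line file should return just those three lines, newline characters included.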

Answer 2

gho*_*g74 18

N = 10
f = open("file.txt", "r")  # read mode; a file opened in append mode ("a") cannot be read
for i in range(N):
    line = next(f).strip()
    print(line)
f.close()


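Every time I see `f = open("file")` with no exception handling to close the file, I cringe. The Pythonic way to handle files is with a context manager, i.e. the with statement. This is covered in the [Input and Output Python tutorial](http://docs.python.org/2/tutorial/inputoutput.html#methods-of-file-objects): "It is good practice to use the with keyword when dealing with file objects. This has the advantage that the file is properly closed after its suite finishes, even if an exception is raised on the way." (19 upvotes)
Why open the file in append mode? (4 upvotes)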
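As a rough sketch of what the first comment recommends, the same loop can be written with a with statement. The function names `stripped_head` and `closed_after_error` are made up for illustration; the second one only demonstrates that the file is closed even when the block raises.

```python
def stripped_head(path, n):
    """Return up to n lines from `path`, stripped of surrounding whitespace."""
    result = []
    with open(path) as f:  # the file is closed when this block exits
        for _ in range(n):
            line = next(f, None)  # None once the file is exhausted
            if line is None:
                break
            result.append(line.strip())
    return result


def closed_after_error(path):
    """Raise inside a with block and report whether the file was closed."""
    try:
        with open(path) as f:
            raise ValueError("simulated failure while reading")
    except ValueError:
        pass
    return f.closed
```

Passing a default to `next()` is one way to avoid the StopIteration exception on short files without a try/except.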

Answer 3

G M*_*G M 14

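If you want to read the first lines quickly and you don't care about performance, you can use .readlines(), which returns a list object, and then slice the list.

For example, for the first 5 lines: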

with open("pathofmyfileandfileandname") as myfile:
    firstNlines=myfile.readlines()[0:5] #put here the interval you want


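Note: the whole file is read, so this is not the best option from a performance point of view, but it is easy to use, quick to write and easy to remember, so it is very convenient if you just want to perform a one-off calculation.

print(firstNlines)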


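This should be avoided. (5 upvotes)
The top answer is probably more efficient, but this one works like a charm for small files. (2 upvotes)
Note that this actually reads the whole file into a list first (myfile.readlines()) and then slices off its first 5 lines. (2 upvotes)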
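The difference the comments describe can be made visible by counting how many lines are still unread after each approach. This is only an illustrative sketch; both function names are hypothetical, and it counts lines consumed from the file iterator (the underlying buffer may still read ahead by a block).

```python
from itertools import islice


def remaining_after_readlines(path, n):
    """Take the first n lines via readlines() and count the lines left unread."""
    with open(path) as f:
        first = f.readlines()[:n]  # reads every line, then keeps only n of them
        left = sum(1 for _ in f)   # nothing is left to iterate over
    return first, left


def remaining_after_islice(path, n):
    """Take the first n lines via islice and count the lines left unread."""
    with open(path) as f:
        first = list(islice(f, n))  # consumes only n lines from the iterator
        left = sum(1 for _ in f)    # the rest of the file is still unread
    return first, left
```

For a file with more than n lines, the first function reports zero lines remaining, while the second reports everything after line n.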

Answer 4

Cro*_*non 8

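What I do is read the N lines using pandas. I don't think the performance is the best, but for example with N=1000:

import pandas as pd
# read_csv is the pandas reader for CSV files; pandas has no pd.read function
yourfile = pd.read_csv('path/to/your/file.csv', nrows=1000)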


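It would be better to use the `nrows` option, which can be set to 1000 so that the entire file is not loaded. http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html In general, pandas has this and other memory-saving techniques for big files. (3 upvotes)
@Cro-Magnon I can't find a `pandas.read()` function in the documentation; do you know anything about this? (2 upvotes)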
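If pandas is not available, a similar "first N rows" effect can be sketched with the standard library alone. Note that this is a swapped-in technique (csv.reader plus islice), not what pandas does internally, and `read_csv_head` is a hypothetical name; it also assumes the first row of the file is a header.

```python
import csv
from itertools import islice


def read_csv_head(path, nrows):
    """Return (header, rows) with at most `nrows` data rows from a CSV file."""
    with open(path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader, None)         # None if the file is empty
        rows = list(islice(reader, nrows))  # stops after nrows data rows
    return header, rows
```

The values come back as strings; unlike pandas, this sketch does no type inference.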

Answer 5

art*_*nil 6

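The file object does not expose a dedicated method for reading a given number of lines.

I guess the easiest way would be the following:

lines = []
with open(file_name) as f:
    # readline() returns '' at end of file, so a short file pads the list with empty strings
    lines.extend(f.readline() for i in range(N))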

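One thing to watch with the readline() approach: at end of file, readline() returns an empty string rather than raising, so a file shorter than N lines produces trailing empty strings. The sketch below contrasts that with a variant that stops at end of file by using the two-argument form of iter(); both function names are hypothetical.

```python
from itertools import islice


def readline_head_padded(file_name, n):
    """Call readline() exactly n times; short files yield trailing '' entries."""
    with open(file_name) as f:
        return [f.readline() for _ in range(n)]


def readline_head(file_name, n):
    """Return up to n lines, stopping as soon as readline() returns ''."""
    with open(file_name) as f:
        # iter(callable, sentinel) keeps calling f.readline until it returns ''
        return list(islice(iter(f.readline, ""), n))
```

A blank line in the file is read as "\n", not "", so it does not end the iteration early.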

Answer 6

Fat*_*ici 6

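The two most intuitive ways of doing this are:

1. Iterate over the file line by line, and break after N lines.
2. Iterate over the file line by line by calling next() N times. (This is essentially just a different syntax for what the top answer does.)

Here is the code: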

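# Method 1: iterate and break after N lines
with open("fileName", "r") as f:
    counter = 0
    for line in f:
        print(line)
        counter += 1
        if counter == N:
            break

# Method 2: call next() N times (raises StopIteration if the file has fewer than N lines)
with open("fileName", "r") as f:
    for i in range(N):
        line = next(f)
        print(line)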


The bottom line is that, as long as you don't use readlines() or enumerate the whole file into memory, you have plenty of options.
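A small clarification on that last point: enumerate() over a file object is itself lazy, and it only pulls the whole file into memory if the result is materialized, for example with list(enumerate(f)). A minimal sketch of Method 1 using enumerate as the counter, with the hypothetical name `head_with_enumerate`:

```python
def head_with_enumerate(path, n):
    """Return up to the first n lines, using enumerate as a lazy counter."""
    lines = []
    with open(path) as f:
        for i, line in enumerate(f):  # yields (index, line) pairs one at a time
            if i >= n:
                break  # one line past n has been read, then the loop stops
            lines.append(line)
    return lines
```

This variant reads one line more than it returns before breaking, which is usually harmless but worth knowing if the file object is reused afterwards.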

Archived:	15 years, 11 months ago
Views:	215919
Last activity:	5 years, 11 months ago