Pandas - Read the end of a .csv file

Question

Fab*_*nna 6 python csv dataframe pandas

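I have a large (8 GB) gzipped csv file that I want to read into a DataFrame with pandas. Because the file is so long, I read it in chunks, and that works fine, but I would like to know whether there is a way to read only the last x rows without decompressing the whole file.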

Answer 1

Gon*_*ica 2

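I have been considering various ways to read the last few rows of a DataFrame. Since I am not sure I correctly understand what you mean by "without decompressing the whole file", I wonder whether any of the following options would interest you.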

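Option 1

When reading a .csv file with pandas.read_csv(), rows can be skipped so that they are not included in the import.

To do so, pass skiprows=[x] in the call, where x is the number of the row to exclude (note that rows are numbered like list items, starting from 0).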
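To turn skiprows into "read only the last n rows", the total line count has to be known first. The following is a minimal sketch, not a definitive implementation: the helper name read_last_rows and the demo file are hypothetical, and it assumes the file has a single header line and no line breaks inside quoted fields. Note that a plain gzip file is a sequential stream, so both passes still decompress it from the start; what this avoids is parsing and holding all rows in memory.

```python
import gzip
import os
import tempfile

import pandas as pd


def read_last_rows(path, n):
    """Return the last n data rows of a gzip-compressed CSV with one header line."""
    # Pass 1: count physical lines. The stream is decompressed on the fly,
    # but only one line at a time is held in memory.
    with gzip.open(path, "rt") as fh:
        total_lines = sum(1 for _ in fh)  # includes the header line

    # Line 0 is the header and must be kept, so skipping starts at line 1.
    first_kept = max(1, total_lines - n)

    # Pass 2: parse only the header and the last n data lines.
    return pd.read_csv(path, compression="gzip", skiprows=range(1, first_kept))


# Demo on a small temporary file (hypothetical data, 100 rows)
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.csv.gz")
    demo_df = pd.DataFrame({"a": range(100), "b": range(100, 200)})
    demo_df.to_csv(demo_path, index=False, compression="gzip")
    tail_df = read_last_rows(demo_path, 5)
    print(tail_df)
```

skiprows also accepts a list-like of line numbers, which is why a range can be passed directly instead of a single-element list.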

Option 2

Another option might be to convert the file to HDF5 and select rows by start and stop position. Here is an example:

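import numpy as np
import pandas as pd

# Example frame: 50,000 random values indexed at one-second intervals
df = pd.DataFrame({'Date': np.random.randn(50000)}, index=pd.date_range('20200528', periods=50000, freq='s'))

# Write the frame to an HDF5 store in appendable (table) format; requires PyTables
store = pd.HDFStore('example.h5', mode='w')
store.append('df', df)

# Total number of rows stored under the key 'df'
rowsnumber = store.get_storer('df').nrows

# Select the last 5 rows; change 5 to the number of rows wanted from the end
last_rows = store.select('df', start=rowsnumber - 5, stop=rowsnumber)

store.close()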

Option 3

Assuming the DataFrame is already bound to the variable df, use df.iloc to read the last 5 rows:

rows = df.iloc[-5:]

Or use df.tail:

rows = df.tail(5)

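Since the question already reads the file in chunks, df.tail can be applied to each chunk so that the full DataFrame never has to exist in memory. This is only a sketch under assumptions: the function name tail_in_chunks, the chunk size, and the in-memory demo data are all hypothetical, and the gzip stream is still decompressed from the beginning.

```python
import gzip
import io

import pandas as pd


def tail_in_chunks(source, n, chunksize=1000, **read_csv_kwargs):
    """Return the last n rows of a CSV while holding at most n + chunksize rows."""
    tail = None
    for chunk in pd.read_csv(source, chunksize=chunksize, **read_csv_kwargs):
        # Append the new chunk to the rows kept so far, then trim back to n rows.
        tail = chunk if tail is None else pd.concat([tail, chunk])
        tail = tail.tail(n)
    return tail  # stays None if no chunk was produced


# Demo with a small gzip-compressed CSV built in memory (hypothetical data, 25 rows)
csv_text = "a,b\n" + "\n".join(f"{i},{i * 2}" for i in range(25)) + "\n"
buf = io.BytesIO(gzip.compress(csv_text.encode("utf-8")))

last_rows = tail_in_chunks(buf, n=5, chunksize=10, compression="gzip")
print(last_rows)
```

For a file on disk, a path such as 'data.csv.gz' (hypothetical) can be passed in place of the buffer. The returned rows keep their original row positions as the index.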
