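I am using Python pandas to read several large CSV files and store them in an HDF5 file; the resulting HDF5 file is about 10 GB. The problem occurs when reading it back: even when I try to read it in chunks, I still get a MemoryError.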
import glob, os
import pandas as pd

hdf = pd.HDFStore('raw_sample_storage2.h5')
os.chdir("C:/RawDataCollection/raw_samples/PLB_Gate")
for filename in glob.glob("RD_*.txt"):
    # With chunksize set, read_csv returns an iterator of DataFrames rather than one frame
    raw_df = pd.read_csv(filename,
                         sep=' ',
                         header=None,
                         names=['time', 'GW_time', 'node_id', 'X', 'Y', 'Z', 'status', 'seq', 'rssi', 'lqi'],
                         # dtypes given as strings so no numpy names need to be in scope
                         dtype={'GW_time': 'uint32', 'node_id': 'uint8', 'X': 'uint16', 'Y': 'uint16', 'Z': 'uint16', 'status': 'uint8', 'seq': 'uint8', 'rssi': 'int8', 'lqi': 'uint8'},
                         parse_dates=['time'],
                         date_parser=dateparse,  # dateparse is defined elsewhere (not shown here)
                         chunksize=50000,
                         skip_blank_lines=True)
    # Append each chunk to a single table in the store
    for chunk in raw_df:
        hdf.append('raw_sample_all', chunk, format='table', data_columns=True, index=True, compression='blosc', complevel=9)
for df in pd.read_hdf('raw_sample_storage2.h5', 'raw_sample_all', chunksize=300000):
    print(df.head(1))
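For comparison, the same table can also be read back by explicit row ranges instead of `chunksize`. The following is only a minimal sketch, not something tested against this file: the helper names `row_windows` and `read_in_windows` are hypothetical, and it assumes the key was written with `format='table'` so that `get_storer(key).nrows` and `select(..., start=..., stop=...)` are available on an open `HDFStore`.

```python
def row_windows(nrows, chunksize):
    """Yield (start, stop) row ranges that cover rows 0..nrows in order."""
    if chunksize <= 0:
        raise ValueError("chunksize must be positive")
    for start in range(0, nrows, chunksize):
        yield start, min(start + chunksize, nrows)


def read_in_windows(store, key, chunksize):
    """Yield DataFrames from an open pandas HDFStore, one row window at a time.

    Assumes `key` was stored with format='table'; only one window is
    requested from the store per iteration.
    """
    nrows = store.get_storer(key).nrows
    for start, stop in row_windows(nrows, chunksize):
        yield store.select(key, start=start, stop=stop)


# Row windows for a 10-row table read 4 rows at a time
print(list(row_windows(10, 4)))  # [(0, 4), (4, 8), (8, 10)]
```

With the file from this question, the store would be opened with `pd.HDFStore('raw_sample_storage2.h5', mode='r')` and passed to `read_in_windows(store, 'raw_sample_all', 300000)`. Whether this avoids the MemoryError here is not established.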
I am using Anaconda, and I have not been able to upgrade PyTables:
conda update pytables
It reports that the package is already installed:
....
# All requested packages already installed.
# packages in environment at C:\Anaconda:
#
pytables 3.1.1 np19py27_1
Then I tried pip:
C:\Users\HP>pip install --upgrade tables
Collecting tables
Using cached tables-3.2.0.tar.gz
Complete output from command python setup.py egg_info:
H5closecfvx_f.c
r:\temp\H5closecfvx_f.c(2) : warning C4013: 'H5close' undefined; assuming extern returning int
LINK : fatal error LNK1181: cannot open input file 'hdf5dll.lib'
* Using Python 2.7.3 |Anaconda 2.2.0 (32-bit)| (default, Feb 25 2013, 18:26:30) [MSC v.1500 32 bit (Intel)]
* …
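The build log above reports a 32-bit interpreter. A 32-bit process can address only a few gigabytes no matter how much RAM is installed, so this may be relevant to the MemoryError, although nothing in the question confirms it. A small sketch to check the running interpreter (the function name `pointer_bits` is hypothetical):

```python
import struct
import sys


def pointer_bits():
    """Return the pointer width of the running interpreter in bits."""
    # struct format "P" is a native void pointer: 4 bytes on 32-bit, 8 on 64-bit
    return struct.calcsize("P") * 8


bits = pointer_bits()
print("Python %d.%d, %d-bit" % (sys.version_info[0], sys.version_info[1], bits))
if bits == 32:
    print("32-bit process: addressable memory is limited regardless of installed RAM")
```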