PyTables table into pandas DataFrame

Jim*_*oll 6 pytables pandas

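There is plenty of information on how to read a CSV into a pandas DataFrame, but what I have is a PyTables table and I want a pandas DataFrame.

I have found how to store my pandas DataFrame in PyTables and then read it back, at which point it will have: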

"kind = v._v_attrs.pandas_type"  

I could write it out to CSV and read it back in, but that seems silly. That is what I am doing for now.

How should I read PyTables objects into pandas?

met*_*ore 7

import tables as pt
import pandas as pd
import numpy as np

# the content is junk but we don't care
# a structured dtype takes a list of (name, type) pairs; a table wants a 1-D array
grades = np.empty(10, dtype=[('name', 'S20'), ('grade', 'u2')])

# write to a PyTables table
# (PyTables 3.x names; older releases spelled these openFile/createTable)
handle = pt.open_file('/tmp/test_pandas.h5', 'w')
handle.create_table('/', 'grades', grades)
print(handle.root.grades[:].dtype)  # it is a structured array

# load back as a DataFrame and check types
df = pd.DataFrame.from_records(handle.root.grades[:])
print(df.dtypes)
handle.close()

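Note that your u2 (unsigned 2-byte integer) will end up as an i8 (8-byte integer), and the strings will be objects, because pandas does not yet support the full range of dtypes available for NumPy arrays.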
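The dtype caveat can be handled explicitly after loading. The following is a minimal sketch, not part of the original answer: the `records` array is a hypothetical stand-in for what `handle.root.grades[:]` returns, and the sample names and grades are made up for illustration. It decodes the byte strings to text and pins the integer width rather than relying on whatever a given pandas version infers.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for handle.root.grades[:], which PyTables
# returns as a NumPy structured array.
records = np.array(
    [(b"alice", 90), (b"bob", 85)],
    dtype=[("name", "S20"), ("grade", "u2")],
)

df = pd.DataFrame.from_records(records)

# Fixed-width byte strings arrive as Python bytes objects; decode them to text.
df["name"] = df["name"].str.decode("utf-8")

# Pin the integer width explicitly instead of depending on pandas' inference.
df["grade"] = df["grade"].astype("uint16")

print(df.dtypes)
```

Whether the explicit cast is needed depends on the pandas version in use, so checking `df.dtypes` right after `from_records` is a reasonable first step.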


And*_*den 5

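The docs now include an excellent section on using the HDF5 store, and some more advanced strategies are discussed in the cookbook.

It's now relatively straightforward: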

In [1]: store = HDFStore('store.h5')

In [2]: print(store)
<class 'pandas.io.pytables.HDFStore'>
File path: store.h5
Empty

In [3]: df = DataFrame([[1, 2], [3, 4]], columns=['A', 'B'])

In [4]: store['df'] = df

In [5]: store
<class 'pandas.io.pytables.HDFStore'>
File path: store.h5
/df            frame        (shape->[2,2])

And to retrieve from HDF5/PyTables:

In [6]: store['df']  # store.get('df') is an equivalent
Out[6]:
   A  B
0  1  2
1  3  4

You can also query within a table.
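As a rough illustration of querying, here is a minimal sketch that assumes PyTables is installed alongside pandas. The temporary file location is an arbitrary choice for the example, and the frame reuses the small `A`/`B` data from the session above. The frame has to be stored in the `table` format, since the default fixed format does not support `where` filters.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns=["A", "B"])

# Scratch location chosen only for this example.
path = os.path.join(tempfile.mkdtemp(), "store.h5")

with pd.HDFStore(path) as store:
    # The "table" format supports on-disk queries; data_columns=True
    # makes every column usable in a where clause.
    store.put("df", df, format="table", data_columns=True)

    # Rows are filtered while reading, so only matches are loaded.
    subset = store.select("df", where="A > 1")

print(subset)
```

Marking every column as a data column is convenient for a small frame; for wide tables it is usually cheaper to pass a list naming only the columns that will actually be filtered on.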