Gri*_*ees 13 python r hdf5 pandas data.table
我认为标题涵盖了这个问题,但要阐明:
的熊猫 Python包具有用于在python保持表数据的数据帧的数据类型.它还有一个方便的hdf5文件格式接口,所以pandas DataFrames(和其他数据)可以使用简单的类似dict的界面保存(假设你安装了pytables)
import pandas
import numpy
d = pandas.HDFStore('data.h5')
d['testdata'] = pandas.DataFrame({'N': numpy.random.randn(5)})
d.close()
Run Code Online (Sandbox Code Playgroud)
到现在为止还挺好.但是,如果我然后尝试将相同的hdf5加载到RI中,请看事情并非如此简单:
> library(hdf5)
> hdf5load('data.h5')
NULL
> testdata
$block0_values
[,1] [,2] [,3] [,4] [,5]
[1,] 1.498147 0.8843877 -1.081656 0.08717049 -1.302641
attr(,"CLASS")
[1] "ARRAY"
attr(,"VERSION")
[1] "2.3"
attr(,"TITLE")
[1] ""
attr(,"FLAVOR")
[1] "numpy"
$block0_items
[1] "N"
attr(,"CLASS")
[1] "ARRAY"
attr(,"VERSION")
[1] "2.3"
attr(,"TITLE")
[1] ""
attr(,"FLAVOR")
[1] "numpy"
attr(,"kind")
[1] "string"
attr(,"name")
[1] "N."
$axis1
[1] 0 1 2 3 4
attr(,"CLASS")
[1] "ARRAY"
attr(,"VERSION")
[1] "2.3"
attr(,"TITLE")
[1] ""
attr(,"FLAVOR")
[1] "numpy"
attr(,"kind")
[1] "integer"
attr(,"name")
[1] "N."
$axis0
[1] "N"
attr(,"CLASS")
[1] "ARRAY"
attr(,"VERSION")
[1] "2.3"
attr(,"TITLE")
[1] ""
attr(,"FLAVOR")
[1] "numpy"
attr(,"kind")
[1] "string"
attr(,"name")
[1] "N."
attr(,"TITLE")
[1] ""
attr(,"CLASS")
[1] "GROUP"
attr(,"VERSION")
[1] "1.0"
attr(,"ndim")
[1] 2
attr(,"axis0_variety")
[1] "regular"
attr(,"axis1_variety")
[1] "regular"
attr(,"nblocks")
[1] 1
attr(,"block0_items_variety")
[1] "regular"
attr(,"pandas_type")
[1] "frame"
Run Code Online (Sandbox Code Playgroud)
这让我想到了一个问题:理想情况下,我可以从R来回保存到熊猫.我显然可以写一个从熊猫到R的包装器(我想......虽然我认为如果我使用可能变得更加棘手的pandas MultiIndex),但我认为我不能轻易地将这些数据用在熊猫中.有什么建议?
额外奖励:我真正想要做的是使用带有pandas数据帧的R中的data.table包(两种包中的键控方法都非常相似).对那个人的任何帮助都非常感谢.
如果您仍在查看此内容,请查看Google论坛上的这篇文章.它显示了如何通过HDF5在pandas/R之间交换数据.
https://groups.google.com/forum/?fromgroups#!topic/pydata/0LR72GN9p6w
Pau*_*tra -1
您可以使用csv文件作为通用数据格式。R 和 python pandas 都可以轻松使用它。您可能会失去一些精度,但是这是否是一个问题取决于您的具体问题。
| 归档时间: |
|
| 查看次数: |
11722 次 |
| 最近记录: |