dur*_*2.0 40 python numpy pandas
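I'm trying to read a simple space-separated file with the pandas read_csv method. However, pandas doesn't seem to be obeying my dtype argument. Maybe I'm specifying it incorrectly?
I've distilled my somewhat complicated call to read_csv down to this simple test case. In my "real" scenario I'm actually using the converters argument, but I removed it here for simplicity.
Below is my ipython session: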
>>> cat test.out
a b
0.76398 0.81394
0.32136 0.91063
>>> import pandas
>>> import numpy
>>> x = pandas.read_csv('test.out', dtype={'a': numpy.float32}, delim_whitespace=True)
>>> x
a b
0 0.76398 0.81394
1 0.32136 0.91063
>>> x.a.dtype
dtype('float64')
I have also tried this with a dtype of numpy.int32 or numpy.int64. Both of those choices result in an exception:
AttributeError: 'NoneType' object has no attribute 'dtype'
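I'm assuming the AttributeError is raised because pandas will not automatically try to convert/truncate the float values to integers?
I'm running on a 32-bit machine with a 32-bit version of Python.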
>>> !uname -a
Linux ubuntu 3.0.0-13-generic #22-Ubuntu SMP Wed Nov 2 13:25:36 UTC 2011 i686 i686 i386 GNU/Linux
>>> import platform
>>> platform.architecture()
('32bit', 'ELF')
>>> pandas.__version__
'0.10.1'
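On the question of float-to-integer truncation: whatever the cause of the AttributeError in 0.10.1, one way to get integers is to read the column as floats and convert explicitly afterwards. The following is a minimal sketch that assumes a recent pandas release (not 0.10.1); the sample values are made up for illustration and `as_int` is a hypothetical name.

```python
import io

import numpy as np
import pandas as pd

# Made-up sample data in the same whitespace-separated layout as test.out.
data = "a b\n0.76398 0.81394\n1.32136 2.91063\n"

# Read both columns with the default float64 inference.
df = pd.read_csv(io.StringIO(data), sep=r"\s+")

# Explicit cast: astype on finite floats drops the fractional part
# (truncation toward zero), it does not round.
as_int = df["a"].astype(np.int32)
print(as_int.dtype)
```

The cast is a separate, deliberate step here, so the loss of the fractional part is visible in the code rather than hidden inside the parser.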
Jef*_*eff 26
0.10.1 doesn't really support float32 very well
See http://pandas.pydata.org/pandas-docs/dev/whatsnew.html#dtype-specification
In 0.11 you can do it like this:
# don't use dtype or converters explicitly for the columns you care about
# they will be converted to float64 if possible, or object if they cannot
df = pd.read_csv('test.csv'.....)
#### this is optional and related to the issue you posted ####
# force anything that is not a numeric to nan
# columns is the list of columns that you are interested in
df[columns] = df[columns].convert_objects(convert_numeric=True)
# astype
df[columns] = df[columns].astype('float32')
see http://pandas.pydata.org/pandas-docs/dev/basics.html#object-conversion
It's not as efficient as doing it directly in read_csv (but that requires
some low-level changes)
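For readers on current pandas: `convert_objects` was later deprecated and removed, so the snippet above only runs on old releases. A rough equivalent of the same two steps (coerce non-numeric values to NaN, then downcast) can be sketched with `pd.to_numeric`. This assumes a recent pandas release; the sample data, including the deliberately bad value `oops`, and the name `numeric` are made up for illustration.

```python
import io

import numpy as np
import pandas as pd

# Made-up data: the second row of column "a" is deliberately non-numeric.
data = "a b\n0.76398 0.81394\noops 0.91063\n"
df = pd.read_csv(io.StringIO(data), sep=r"\s+")

# The columns we care about.
columns = ["a", "b"]

# Step 1: force anything that is not numeric to NaN.
# Step 2: downcast the result to float32.
numeric = df[columns].apply(pd.to_numeric, errors="coerce").astype("float32")
print(numeric.dtypes)
```

The result is kept in a separate frame so the original, unconverted columns remain available for inspecting which values were coerced.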
I have confirmed that with 0.11-dev this DOES work (on both 32-bit and 64-bit; the results are the same)
In [5]: x = pd.read_csv(StringIO.StringIO(data), dtype={'a': np.float32}, delim_whitespace=True)
In [6]: x
Out[6]:
a b
0 0.76398 0.81394
1 0.32136 0.91063
In [7]: x.dtypes
Out[7]:
a float32
b float64
dtype: object
In [8]: pd.__version__
Out[8]: '0.11.0.dev-385ff82'
In [9]: quit()
vagrant@precise32:~/pandas$ uname -a
Linux precise32 3.2.0-23-generic-pae #36-Ubuntu SMP Tue Apr 10 22:19:09 UTC 2012 i686 i686 i386 GNU/Linux
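The same check can be reproduced as a self-contained script. This is a sketch assuming Python 3 and a recent pandas release, where `io.StringIO` replaces `StringIO.StringIO` and `sep=r"\s+"` is used for whitespace-separated input; the data is the contents of `test.out` from the question.

```python
import io

import numpy as np
import pandas as pd

# Contents of test.out from the question.
data = "a b\n0.76398 0.81394\n0.32136 0.91063\n"

# Request float32 for column "a" only; "b" keeps the default inference.
x = pd.read_csv(io.StringIO(data), sep=r"\s+", dtype={"a": np.float32})
print(x.dtypes)
```

Only the column named in the `dtype` mapping is affected; columns left out of it are inferred as usual.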
小智 7
In [22]: df.a.dtype = pd.np.float32
In [23]: df.a.dtype
Out[23]: dtype('float32')
The above works fine for me under pandas 0.10.1