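I am trying to analyze the Gisette dataset from the feature selection challenge.
When I try to concat the train DataFrame with the label Series, following the pandas examples, it throws:
ValueError: Buffer dtype mismatch, expected 'Python object' but got 'long long'
Code: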
import pandas as pd
trainData = pd.read_table(filepath_or_buffer='GISETTE/gisette_train.data'
,delim_whitespace=True
,header=None
,names=['AA','AB','AC','AD','AE','AF','AG','AH','AI','AJ','AK','AL','AM','AN','AO','AP','AQ','AR','AS','AT','AU','AV','AW','AX','AY','AZ',
'BA','BB','BC','BD','BE','BF','BG','BH','BI','BJ','BK','BL','BM','BN','BO','BP','BQ','BR','BS','BT','BU','BV','BW','BX','BY','BZ',
'CA','CB','CC','CD','CE','CF','CG','CH','CI','CJ','CK','CL','CM','CN','CO','CP','CQ','CR','CS','CT','CU','CV','CW','CX','CY','CZ',
'DA','DB','DC','DD','DE','DF','DG','DH','DI','DJ','DK','DL','DM','DN','DO','DP','DQ','DR','DS','DT','DU','DV','DW','DX','DY','DZ',
'EA','EB','EC','ED','EE','EF','EG','EH','EI','EJ','EK','EL','EM','EN','EO','EP','EQ','ER','ES','ET','EU','EV','EW','EX','EY','EZ',
'FA','FB','FC','FD','FE','FF','FG','FH','FI','FJ','FK','FL','FM','FN','FO','FP','FQ','FR','FS','FT','FU','FV','FW','FX','FY','FZ',
'GA','GB','GC','GD','GE','GF','GG','GH','GI','GJ','GK','GL','GM','GN','GO','GP','GQ','GR','GS','GT','GU','GV','GW','GX','GY','GZ',
'HA','HB','HC','HD','HE','HF','HG','HH','HI','HJ','HK','HL','HM','HN','HO','HP','HQ','HR','HS','HT','HU','HV','HW','HX','HY','HZ',
'IA','IB','IC','ID','IE','IF','IG','IH','II','IJ','IK','IL','IM','IN','IO','IP','IQ','IR','IS','IT','IU','IV','IW','IX','IY','IZ',
'JA','JB','JC','JD','JE','JF','JG','JH','JI','JJ','JK','JL','JM','JN','JO','JP','JQ','JR','JS','JT','JU','JV','JW','JX','JY','JZ',
'KA','KB','KC','KD','KE','KF','KG','KH','KI','KJ','KK','KL','KM','KN','KO','KP','KQ','KR','KS','KT','KU','KV','KW','KX','KY','KZ',
'LA','LB','LC','LD','LE','LF','LG','LH','LI','LJ','LK','LL','LM','LN','LO','LP','LQ','LR','LS','LT','LU','LV','LW','LX','LY','LZ',
'MA','MB','MC','MD','ME','MF','MG','MH','MI','MJ','MK','ML','MM','MN','MO','MP','MQ','MR','MS','MT','MU','MV','MW','MX','MY','MZ',
'NA','NB','NC','ND','NE','NF','NG','NH','NI','NJ','NK','NL','NM','NN','NO','NP','NQ','NR','NS','NT','NU','NV','NW','NX','NY','NZ',
'OA','OB','OC','OD','OE','OF','OG','OH','OI','OJ','OK','OL','OM','ON','OO','OP','OQ','OR','OS','OT','OU','OV','OW','OX','OY','OZ',
'PA','PB','PC','PD','PE','PF','PG','PH','PI','PJ','PK','PL','PM','PN','PO','PP','PQ','PR','PS','PT','PU','PV','PW','PX','PY','PZ',
'QA','QB','QC','QD','QE','QF','QG','QH','QI','QJ','QK','QL','QM','QN','QO','QP','QQ','QR','QS','QT','QU','QV','QW','QX','QY','QZ',
'RA','RB','RC','RD','RE','RF','RG','RH','RI','RJ','RK','RL','RM','RN','RO','RP','RQ','RR','RS','RT','RU','RV','RW','RX','RY','RZ',
'SA','SB','SC','SD','SE','SF','SG','SH','SI','SJ','SK','SL','SM','SN','SO','SP','SQ','SR','SS','ST','SU','SV','SW','SX','SY','SZ',
'TA','TB','TC','TD','TE','TF'])
# print 'finished with train data'
trainLabel = pd.read_table(filepath_or_buffer='GISETTE/gisette_train.labels'
,squeeze=True
,names=['label']
,delim_whitespace=True
,header=None)
trainData.info()
Output:
<class 'pandas.core.frame.DataFrame'>
MultiIndex: 6000 entries
Columns: 500 entries, AA to TF
dtypes: int64(500)None
trainLabel.describe()
Output:
count 6000.000000
mean 0.000000
std 1.000083
min -1.000000
25% -1.000000
50% 0.000000
75% 1.000000
max 1.000000
dtype: float64
readyToTrain = pd.concat([trainData, trainLabel], axis=1)
Full stack trace:
File "C:\env\Python27\lib\site-packages\pandas\tools\merge.py", line 717, in concat
verify_integrity=verify_integrity)
File "C:\env\Python27\lib\site-packages\pandas\tools\merge.py", line 848, in __init__
self.new_axes = self._get_new_axes()
File "C:\env\Python27\lib\site-packages\pandas\tools\merge.py", line 898, in _get_new_axes
new_axes[i] = self._get_comb_axis(i)
File "C:\env\Python27\lib\site-packages\pandas\tools\merge.py", line 924, in _get_comb_axis
return _get_combined_index(all_indexes, intersect=self.intersect)
File "C:\env\Python27\lib\site-packages\pandas\core\index.py", line 3991, in _get_combined_index
union = _union_indexes(indexes)
File "C:\env\Python27\lib\site-packages\pandas\core\index.py", line 4017, in _union_indexes
result = result.union(other)
File "C:\env\Python27\lib\site-packages\pandas\core\index.py", line 3753, in union
uniq_tuples = lib.fast_unique_multiple([self.values, other.values])
File "lib.pyx", line 366, in pandas.lib.fast_unique_multiple (pandas\lib.c:8378)
ValueError: Buffer dtype mismatch, expected 'Python object' but got 'long long'
Edit: pandas was installed from a binary package, pandas-0.14.1.win-amd64-py2.7, from lfd.uci.edu/~gohlke/pythonlibs
I tried the suggestion of converting the Series to a DataFrame (no luck, same stack trace as above). Frame info:
DataFrame info (trainData):
<class 'pandas.core.frame.DataFrame'>
MultiIndex: 6000 entries, (550, 0, 495, 0, 0, 0, 0, 976, 0, 0, 0, 0, 983, 0, 995, 0, 983, 0, 0, 983, 0, 0, 0, 0, 0, 983, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 991, 983, 0, 0, 0, 0, 0, 0, 0, 0, 0, 808, 0, 778, 0, 983, 0, 0, 0, 0, 991, 0, 0, 0, 0, 0, 0, 0, 991, 983, 983, 0, 0, 0, 0, 0, 0, 0, 983, 735, 0, 0, 983, 983, 0, 0, 0, 0, 569, 0, 0, 0, 0, 713, 0, 0, 0, 0, 0, 983, 983, 0, ...) to (0, 0, 991, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 948, 995, 348, 0, 0, 0, 0, 0, 0, 0, 0, 0, 751, 0, 0, 0, 0, 0, 0, 0, 0, 804, 0, 0, 0, 862, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 991, 0, 0, 0, 0, 995, 0, 0, 0, 0, 0, 0, 840, 0, 0, 0, 976, 0, 0, 0, 0, 0, 0, 777, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...)
Columns: 500 entries, AA to TF
dtypes: int64(500)None
Info for the Series converted to a DataFrame (trainLabel):
<class 'pandas.core.frame.DataFrame'>
Int64Index: 6000 entries, 0 to 5999
Data columns (total 1 columns):
label 6000 non-null int64
dtypes: int64(1)None
As Joris pointed out (and as I had to figure out for myself, because I didn't read the comments first), the problem is your index.
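One way to confirm an index mismatch before concatenating is to compare the two indexes directly. The sketch below uses small made-up stand-ins (`frame` and `label` are hypothetical names and values, not the Gisette files): a frame with a MultiIndex, like the one `trainData.info()` reports, and a series with a default integer index.

```python
import pandas as pd

# Hypothetical toy data standing in for trainData and trainLabel.
# The frame has a two-level MultiIndex; the series has a default 0..n-1 index.
frame = pd.DataFrame(
    {"AA": [10, 20, 30], "AB": [1, 2, 3]},
    index=pd.MultiIndex.from_tuples([(550, 0), (0, 991), (7, 7)]),
)
label = pd.Series([1, -1, 1], name="label")

# Compare the number of index levels and whether the indexes match at all.
print(frame.index.nlevels)                # 2
print(label.index.nlevels)                # 1
print(frame.index.equals(label.index))    # False
```

With `axis=1`, `pd.concat` aligns rows on the index, so two inputs whose indexes have nothing in common cannot simply be placed side by side.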
Change your code from
pd.concat(to_concat, axis=1)
to
pd.concat([s.reset_index(drop=True) for s in to_concat], axis=1)
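As a minimal illustration of that change, here is a self-contained sketch on toy data. The names `frame`, `label` and `ready` and all values are assumptions made for the example, not taken from the question's files.

```python
import pandas as pd

# Hypothetical toy data: a frame with a MultiIndex and a label series
# with a default integer index, mirroring the situation in the question.
frame = pd.DataFrame(
    {"AA": [10, 20, 30], "AB": [1, 2, 3]},
    index=pd.MultiIndex.from_tuples([(550, 0), (0, 991), (7, 7)]),
)
label = pd.Series([1, -1, 1], name="label")

to_concat = [frame, label]

# Give every input the same positional 0..n-1 index, then join column-wise.
ready = pd.concat([s.reset_index(drop=True) for s in to_concat], axis=1)
print(ready)
```

Note that `drop=True` discards the old index values. If the index levels hold data you still need (the tuples in the `info()` output above look like they might), call `reset_index()` without `drop=True` so they come back as columns.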