Pandas concat ValueError: Buffer dtype mismatch, expected 'Python object' but got 'long long'

lap*_*nio 8 python-2.7 pandas

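I am trying to analyze the GISETTE dataset from the feature selection challenge.

When I try to concatenate the training DataFrame with the label Series, following the pandas examples, I get:

ValueError: Buffer dtype mismatch, expected 'Python object' but got 'long long'

Code: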

import pandas as pd

trainData = pd.read_table(filepath_or_buffer='GISETTE/gisette_train.data'
                              ,delim_whitespace=True
                              ,header=None
                              # 500 two-letter column names, 'AA' through 'TF'
                              ,names=[a + b for a in 'ABCDEFGHIJKLMNOPQRST' for b in 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'][:500])
# print 'finished with train data'
trainLabel = pd.read_table(filepath_or_buffer='GISETTE/gisette_train.labels'
                           ,squeeze=True
                           ,names=['label']
                           ,delim_whitespace=True
                           ,header=None)
trainData.info()

Output:

    <class 'pandas.core.frame.DataFrame'>
    MultiIndex: 6000 entries   
    Columns: 500 entries, AA to TF   
    dtypes: int64(500)None



trainLabel.describe()

Output:

    count    6000.000000
    mean        0.000000
    std         1.000083
    min        -1.000000
    25%        -1.000000
    50%         0.000000
    75%         1.000000
    max         1.000000
    dtype: float64

readyToTrain = pd.concat([trainData, trainLabel], axis=1)

Full stack trace:

   File "C:\env\Python27\lib\site-packages\pandas\tools\merge.py", line 717, in concat  
     verify_integrity=verify_integrity)  
   File "C:\env\Python27\lib\site-packages\pandas\tools\merge.py", line 848, in __init__  
     self.new_axes = self._get_new_axes()  
   File "C:\env\Python27\lib\site-packages\pandas\tools\merge.py", line 898, in _get_new_axes  
     new_axes[i] = self._get_comb_axis(i)  
   File "C:\env\Python27\lib\site-packages\pandas\tools\merge.py", line 924, in _get_comb_axis  
     return _get_combined_index(all_indexes, intersect=self.intersect)  
   File "C:\env\Python27\lib\site-packages\pandas\core\index.py", line 3991, in _get_combined_index  
     union = _union_indexes(indexes)  
   File "C:\env\Python27\lib\site-packages\pandas\core\index.py", line 4017, in _union_indexes  
     result = result.union(other)  
   File "C:\env\Python27\lib\site-packages\pandas\core\index.py", line 3753, in union  
     uniq_tuples = lib.fast_unique_multiple([self.values, other.values])  
   File "lib.pyx", line 366, in pandas.lib.fast_unique_multiple (pandas\lib.c:8378)  
     ValueError: Buffer dtype mismatch, expected 'Python object' but got 'long long'

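Edit: the library was installed from a binary, pandas-0.14.1.win-amd64-py2.7, from lfd.uci.edu/~gohlke/pythonlibs.

I tried the suggestion of converting the Series to a DataFrame (no luck, same stack trace as above). Frame info:

DataFrame info (trainData):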

    <class 'pandas.core.frame.DataFrame'>
    MultiIndex: 6000 entries, (550, 0, 495, 0, 0, 0, 0, 976, 0, 0, 0, 0, 983, 0, 995, 0, 983, 0, 0, 983, 0, 0, 0, 0, 0, 983, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 991, 983, 0, 0, 0, 0, 0, 0, 0, 0, 0, 808, 0, 778, 0, 983, 0, 0, 0, 0, 991, 0, 0, 0, 0, 0, 0, 0, 991, 983, 983, 0, 0, 0, 0, 0, 0, 0, 983, 735, 0, 0, 983, 983, 0, 0, 0, 0, 569, 0, 0, 0, 0, 713, 0, 0, 0, 0, 0, 983, 983, 0, ...) to (0, 0, 991, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 948, 995, 348, 0, 0, 0, 0, 0, 0, 0, 0, 0, 751, 0, 0, 0, 0, 0, 0, 0, 0, 804, 0, 0, 0, 862, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 991, 0, 0, 0, 0, 995, 0, 0, 0, 0, 0, 0, 840, 0, 0, 0, 976, 0, 0, 0, 0, 0, 0, 777, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...)
    Columns: 500 entries, AA to TF
    dtypes: int64(500)None

Series converted to DataFrame info (trainLabel):

    <class 'pandas.core.frame.DataFrame'>
    Int64Index: 6000 entries, 0 to 5999
    Data columns (total 1 columns):
    label    6000 non-null int64
    dtypes: int64(1)None

The*_*Cat 3

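As Joris pointed out (and as I had to work out for myself, because I did not read the comments first), the problem is your index. trainData has a MultiIndex while trainLabel has a plain integer index, and pd.concat with axis=1 fails while trying to union the two.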

Change your code from

pd.concat(to_concat, axis=1)

to

pd.concat([s.reset_index(drop=True) for s in to_concat], axis=1)
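
To see the fix in isolation, here is a minimal sketch on made-up data. The names `features` and `labels` and their values are hypothetical stand-ins for `trainData` and `trainLabel`. They only reproduce the index mismatch (a MultiIndex on one side, a default integer index on the other), not the real dataset.

```python
import pandas as pd

# Hypothetical stand-in for trainData: rows carry a MultiIndex.
features = pd.DataFrame(
    {"AA": [10, 20, 30], "AB": [1, 2, 3]},
    index=pd.MultiIndex.from_tuples([(550, 0), (0, 991), (983, 0)]),
)

# Hypothetical stand-in for trainLabel: default 0..n-1 integer index.
labels = pd.Series([1, -1, 1], name="label")

to_concat = [features, labels]

# Drop each object's index so that rows are matched purely by position.
ready = pd.concat([s.reset_index(drop=True) for s in to_concat], axis=1)

print(ready.shape)
print(list(ready.columns))
```

Note that `reset_index(drop=True)` throws the old index values away, so this is only appropriate when both objects already list their rows in the same order and the index holds nothing you need.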