Iva*_*sky 10 python join dataframe pandas
Hello, I am trying to find the root cause of this error:
ValueError: You are trying to merge on object and int64 columns.
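I know I can use the pandas concat or merge functions to work around this, but I am trying to understand the cause of the error. The question is: why do I get this ValueError?
Here is the output of head(5) and info() for the two dataframes involved.
print(the_big_df.head(5)) output: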
account apt apt_p balance date day flag month reps reqid year
0 AA0420 0 0.0 -578.30 2019-03-01 1 1 3 10 82f2d761 2019
1 AA0420 0 0.1 -578.30 2019-03-02 2 1 3 10 82f2d761 2019
2 AA0420 0 0.1 -578.30 2019-03-03 3 1 3 10 82f2d761 2019
3 AA0421 0 0.1 -607.30 2019-03-04 4 1 3 10 82f2d761 2019
4 AA0421 0 0.1 -610.21 2019-03-05 5 1 3 10 82f2d761 2019
print(the_big_df.info()) output:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 36054 entries, 0 to 36053
Data columns (total 11 columns):
account 36054 non-null object
apt 36054 non-null int64
apt_p 36054 non-null float64
balance 36054 non-null float64
date 36054 non-null datetime64[ns]
day 36054 non-null int64
flag 36054 non-null int64
month 36054 non-null int64
reps 36054 non-null int32
reqid 36054 non-null object
year 36054 non-null int64
dtypes: datetime64[ns](1), float64(2), int32(1), int64(5), object(2)
memory usage: 3.2+ MB
This is the dataframe I pass to join(); print(df_to_join.head(5)):
reqid id
0 54580f39 13301
1 3ba905c0 77114
2 5f2d80da 13302
3 a1478e98 77115
4 9b09854b 78598
print(df_to_join.info()) output:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 14332 entries, 0 to 14331
Data columns (total 2 columns):
reqid 14332 non-null object
dni 14332 non-null object
The very next line after the four prints above is:
the_max_df = the_big_df.join(df_to_join,on='reqid')
The output is, as mentioned above:
ValueError: You are trying to merge on object and int64 columns. If you wish to proceed you should use pd.concat
Why does this happen, when the output above clearly shows that the reqid column is of type object in both dataframes? Thanks.
Ste*_*tef 35
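The problem here is a misunderstanding of how join works: the_big_df.join(df_to_join, on='reqid') does not mean join on the_big_df.reqid == df_to_join.reqid, as one might assume at first glance, but rather join on the_big_df.reqid == df_to_join.index. Since reqid is of type object and the index is of type int64, you get the error.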
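One way to see the mismatch directly is to compare the dtype of the caller's key column with the dtype of the other frame's index. The following is a minimal sketch; the two frames are hypothetical stand-ins for the ones in the question, with made-up values and the `reqid` column forced to `object` to mirror the `info()` output.

```python
import pandas as pd

# Hypothetical stand-ins for the question's frames (values are illustrative only).
the_big_df = pd.DataFrame({
    "reqid": pd.Series(["82f2d761", "54580f39"], dtype=object),
    "balance": [-578.30, -607.30],
})
df_to_join = pd.DataFrame({
    "reqid": pd.Series(["54580f39", "3ba905c0"], dtype=object),
    "id": [13301, 77114],
})

# join(on='reqid') takes the key from this column of the caller ...
left_key_dtype = the_big_df["reqid"].dtype
# ... and compares it against the index of the other frame,
# not against the other frame's 'reqid' column.
right_key_dtype = df_to_join.index.dtype

print("caller key dtype: ", left_key_dtype)
print("other index dtype:", right_key_dtype)
```

If the two printed dtypes differ, as they do here with a default integer index, join has no compatible keys to match on.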
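See the documentation for join:
Join columns with other DataFrame either on index or on a key column.
...
on : str, list of str, or array-like, optional
Column or index level name(s) in the caller to join on the index in other, otherwise joins index-on-index.
Have a look at the following example: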
import pandas as pd
df1 = pd.DataFrame({'id1': [1, 2], 'val1': [11,12]})
df2 = pd.DataFrame({'id2': [3, 4], 'val2': [21,22]})
print(df1)
# id1 val1
#0 1 11
#1 2 12
print(df2)
# id2 val2
#0 3 21
#1 4 22
# join on df1.id1 (int64) == df2.index (int64)
print(df1.join(df2, on='id1'))
# id1 val1 id2 val2
#0 1 11 4.0 22.0
#1 2 12 NaN NaN
# now df3 same as df1 but id3 as object:
df3 = pd.DataFrame({'id3': ['1', '2'], 'val1': [11,12]})
# try to join on df3.id3 (object) == df2.index (int64)
df3.join(df2, on='id3')
#ValueError: You are trying to merge on object and int64 columns. If you wish to proceed you should use pd.concat
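On pandas versions that do not perform this dtype check (this appears to be the case for some older releases), the same call does not raise and simply finds no matches: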
>>> df3.join(df2, on='id3')
id3 val1 id2 val2
0 1 11 NaN NaN
1 2 12 NaN NaN
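Given the explanation above, two ways to get the column-to-column match the question was after can be sketched as follows. The frames are again hypothetical stand-ins with made-up values; only `set_index`, `join` and `merge` are used.

```python
import pandas as pd

# Hypothetical stand-ins for the question's frames (values are illustrative only).
the_big_df = pd.DataFrame({
    "reqid": ["82f2d761", "54580f39"],
    "balance": [-578.30, -607.30],
})
df_to_join = pd.DataFrame({
    "reqid": ["54580f39", "3ba905c0"],
    "id": [13301, 77114],
})

# Option 1: move 'reqid' into the index of the other frame, so that
# join(on='reqid') compares reqid values with reqid values.
joined = the_big_df.join(df_to_join.set_index("reqid"), on="reqid")

# Option 2: merge() matches column to column directly; how='left'
# keeps every row of the_big_df, like join() does by default.
merged = the_big_df.merge(df_to_join, on="reqid", how="left")

print(joined)
print(merged)
```

Both variants keep all rows of the_big_df and fill id with NaN where no matching reqid exists in df_to_join.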