I have the following two DataFrames. I have set the date as a DatetimeIndex with df.set_index(pd.to_datetime(df['date']), inplace=True) and want to merge or join them on date:
df.head(5)
catcode_amt type feccandid_amt amount
date
1915-12-31 A5000 24K H6TX08100 1000
1916-12-31 T6100 24K H8CA52052 500
1954-12-31 H3100 24K S8AK00090 1000
1985-12-31 J7120 24E H8OH18088 36
1997-12-31 z9600 24K S6ND00058 2000
d.head(5)
catcode_disp disposition feccandid_disp bills
date
2007-12-31 A0000 support S4HI00011 1
2007-12-31 A1000 oppose S4IA00020', 'P20000741 1
2007-12-31 A1000 support S8MT00010 1
2007-12-31 A1500 support S6WI00061 2
2007-12-31 A1600 support S4IA00020', 'P20000741 3
I tried the following two approaches, but both raise a MemoryError:
df.join(d, how='right')
I used the code below on the DataFrames without the date set as the index:
merge=pd.merge(df,d, how='inner', on='date')
jez*_*ael 13
If you need to merge by index, you can pass the parameters left_index=True and right_index=True to merge:
merge=pd.merge(df,d, how='inner', left_index=True, right_index=True)
Sample (the first index value of d was changed so that it matches):
print(df)
catcode_amt type feccandid_amt amount
date
1915-12-31 A5000 24K H6TX08100 1000
1916-12-31 T6100 24K H8CA52052 500
1954-12-31 H3100 24K S8AK00090 1000
1985-12-31 J7120 24E H8OH18088 36
1997-12-31 z9600 24K S6ND00058 2000
print(d)
catcode_disp disposition feccandid_disp bills
date
1997-12-31 A0000 support S4HI00011 1.0
2007-12-31 A1000 oppose S4IA00020', 'P20000741 1 NaN
2007-12-31 A1000 support S8MT00010 1.0
2007-12-31 A1500 support S6WI00061 2.0
2007-12-31 A1600 support S4IA00020', 'P20000741 3 NaN
merge=pd.merge(df,d, how='inner', left_index=True, right_index=True)
print(merge)
catcode_amt type feccandid_amt amount catcode_disp disposition \
date
1997-12-31 z9600 24K S6ND00058 2000 A0000 support
feccandid_disp bills
date
1997-12-31 S4HI00011 1.0
Or you can use concat:
print(pd.concat([df,d], join='inner', axis=1))
catcode_amt type feccandid_amt amount catcode_disp disposition \
date
1997-12-31 z9600 24K S6ND00058 2000 A0000 support
feccandid_disp bills
date
1997-12-31 S4HI00011 1.0
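For anyone who wants to run the sample, here is a minimal sketch for current pandas on Python 3. The frames are rebuilt by hand from the printed output above, so the construction itself is an assumption, and only some of the columns are kept for brevity. The right frame d is also reduced to unique dates here, an assumption made so that concat does not have to align duplicate index labels. The sketch compares the index-based merge with concat on the single matching date.

```python
import pandas as pd

# Left frame: unique dates, as in the printed df above (subset of columns).
df = pd.DataFrame(
    {
        "catcode_amt": ["A5000", "T6100", "H3100", "J7120", "z9600"],
        "amount": [1000, 500, 1000, 36, 2000],
    },
    index=pd.to_datetime(
        ["1915-12-31", "1916-12-31", "1954-12-31", "1985-12-31", "1997-12-31"]
    ),
)
df.index.name = "date"

# Right frame: reduced to unique dates (assumption, see the note above).
d = pd.DataFrame(
    {
        "catcode_disp": ["A0000", "A1000"],
        "disposition": ["support", "oppose"],
    },
    index=pd.to_datetime(["1997-12-31", "2007-12-31"]),
)
d.index.name = "date"

# Inner merge on the index of both frames.
merged = pd.merge(df, d, how="inner", left_index=True, right_index=True)

# Column-wise concat that keeps only the index labels present in both frames.
concatenated = pd.concat([df, d], join="inner", axis=1)

print(merged)
print(concatenated)
```

Both calls keep only 1997-12-31, the one date that appears in both frames.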
EDIT: EdChum is right:
I added duplicates to the DataFrame df (the last 2 values in the index):
print(df)
catcode_amt type feccandid_amt amount
date
1915-12-31 A5000 24K H6TX08100 1000
1916-12-31 T6100 24K H8CA52052 500
1954-12-31 H3100 24K S8AK00090 1000
2007-12-31 J7120 24E H8OH18088 36
2007-12-31 z9600 24K S6ND00058 2000
print(d)
catcode_disp disposition feccandid_disp bills
date
1997-12-31 A0000 support S4HI00011 1.0
2007-12-31 A1000 oppose S4IA00020', 'P20000741 1 NaN
2007-12-31 A1000 support S8MT00010 1.0
2007-12-31 A1500 support S6WI00061 2.0
2007-12-31 A1600 support S4IA00020', 'P20000741 3 NaN
merge=pd.merge(df,d, how='inner', left_index=True, right_index=True)
print(merge)
catcode_amt type feccandid_amt amount catcode_disp disposition \
date
2007-12-31 J7120 24E H8OH18088 36 A1000 oppose
2007-12-31 J7120 24E H8OH18088 36 A1000 support
2007-12-31 J7120 24E H8OH18088 36 A1500 support
2007-12-31 J7120 24E H8OH18088 36 A1600 support
2007-12-31 z9600 24K S6ND00058 2000 A1000 oppose
2007-12-31 z9600 24K S6ND00058 2000 A1000 support
2007-12-31 z9600 24K S6ND00058 2000 A1500 support
2007-12-31 z9600 24K S6ND00058 2000 A1600 support
feccandid_disp bills
date
2007-12-31 S4IA00020', 'P20000741 1 NaN
2007-12-31 S8MT00010 1.0
2007-12-31 S6WI00061 2.0
2007-12-31 S4IA00020', 'P20000741 3 NaN
2007-12-31 S4IA00020', 'P20000741 1 NaN
2007-12-31 S8MT00010 1.0
2007-12-31 S6WI00061 2.0
2007-12-31 S4IA00020', 'P20000741 3 NaN
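The output above has 2 × 4 = 8 rows for the single shared date, because every duplicate on the left is paired with every duplicate on the right. On large frames with many repeated dates, this product may well be what exhausts memory. Below is a minimal sketch for estimating the size of an inner join before running it. The helper name estimate_inner_rows is hypothetical, and the frames are reduced, hand-built versions of the ones printed above.

```python
import pandas as pd


def estimate_inner_rows(left: pd.DataFrame, right: pd.DataFrame) -> int:
    """Estimate the row count of an inner merge on the index (hypothetical helper)."""
    left_counts = left.index.value_counts()
    right_counts = right.index.value_counts()
    # Each shared label contributes (left count * right count) rows.
    # Labels present on only one side are filled with 0 and contribute nothing.
    return int(left_counts.mul(right_counts, fill_value=0).sum())


# Left frame with a duplicated date (the last 2 index values), as in the EDIT above.
df = pd.DataFrame(
    {"amount": [1000, 500, 1000, 36, 2000]},
    index=pd.to_datetime(
        ["1915-12-31", "1916-12-31", "1954-12-31", "2007-12-31", "2007-12-31"]
    ),
)

# Right frame where the same date appears four times.
d = pd.DataFrame(
    {"disposition": ["support", "oppose", "support", "support", "support"]},
    index=pd.to_datetime(
        ["1997-12-31", "2007-12-31", "2007-12-31", "2007-12-31", "2007-12-31"]
    ),
)

# A non-unique index on both sides is the warning sign for a row blow-up.
print(df.index.is_unique, d.index.is_unique)

expected_rows = estimate_inner_rows(df, d)
merged = pd.merge(df, d, how="inner", left_index=True, right_index=True)
print(expected_rows, len(merged))
```

If the estimate is far larger than either input, it is probably worth deduplicating or aggregating by date before merging.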
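It looks like your dates are your index, in which case you want to merge on the index rather than on a column. If you have two DataFrames, df_1 and df_2: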
df_1.merge(df_2, left_index=True, right_index=True, how='inner')
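A small runnable sketch of this call follows. The contents of df_1 and df_2 are made-up illustrative data, not the asker's frames. Because the column names do not overlap here, DataFrame.join, which joins on the index by default, should give the same result.

```python
import pandas as pd

# Illustrative frames indexed by date; only 1997-12-31 appears in both.
df_1 = pd.DataFrame(
    {"amount": [36, 2000]},
    index=pd.to_datetime(["1985-12-31", "1997-12-31"]),
)
df_2 = pd.DataFrame(
    {"bills": [1, 2]},
    index=pd.to_datetime(["1997-12-31", "2007-12-31"]),
)

# Merge on the index of both frames, keeping only shared dates.
merged = df_1.merge(df_2, left_index=True, right_index=True, how="inner")

# join aligns on the index by default, so no key arguments are needed.
joined = df_1.join(df_2, how="inner")

print(merged)
print(joined)
```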