How can I use pandas to perform a cross join (full outer join) of two DataFrames that have no common columns?
In MySQL, you can simply do:
SELECT *
FROM table_1
[CROSS] JOIN table_2;
But in pandas, doing:
df_1.merge(df_2, how='outer')
gives the error:
MergeError: No common columns to perform merge on
My best solution so far is to use SQLite:
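import pandas as pd
import sqlalchemy as sa

engine = sa.create_engine('sqlite:///tmp.db')
# index=False keeps the DataFrame index out of the SQL tables
df_1.to_sql('df_1', engine, index=False)
df_2.to_sql('df_2', engine, index=False)
# In SQLite, a JOIN with no ON clause is a cross join
df = pd.read_sql_query('SELECT * FROM df_1 JOIN df_2', engine)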
jez*_*ael 11
IIUC, you need to merge both DataFrames on a temporary column tmp:
import pandas as pd
df1 = pd.DataFrame({'fld1': ['x', 'y'],
'fld2': ['a', 'b1']})
df2 = pd.DataFrame({'fld3': ['y', 'x', 'y'],
'fld4': ['a', 'b1', 'c2']})
print(df1)
  fld1 fld2
0    x    a
1    y   b1
print(df2)
  fld3 fld4
0    y    a
1    x   b1
2    y   c2
df1['tmp'] = 1
df2['tmp'] = 1
df = pd.merge(df1, df2, on=['tmp'])
df = df.drop('tmp', axis=1)
print(df)
  fld1 fld2 fld3 fld4
0    x    a    y    a
1    x    a    x   b1
2    x    a    y   c2
3    y   b1    y    a
4    y   b1    x   b1
5    y   b1    y   c2
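As a side note, newer pandas releases make the temporary column unnecessary. The sketch below assumes pandas 1.2 or later, where merge accepts how='cross'. It also shows a variant of the tmp approach that uses assign() so df1 and df2 are not modified in place. The names cross and via_tmp are only illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'fld1': ['x', 'y'],
                    'fld2': ['a', 'b1']})
df2 = pd.DataFrame({'fld3': ['y', 'x', 'y'],
                    'fld4': ['a', 'b1', 'c2']})

# Assumption: pandas >= 1.2, where merge supports how='cross'.
# Every row of df1 is paired with every row of df2.
cross = pd.merge(df1, df2, how='cross')

# Temporary-key variant that leaves df1 and df2 untouched:
# assign() returns copies carrying the extra 'tmp' column.
via_tmp = (df1.assign(tmp=1)
              .merge(df2.assign(tmp=1), on='tmp')
              .drop(columns='tmp'))

print(cross.shape)   # 2 rows x 3 rows = 6 rows, 4 columns
print(via_tmp.shape)
```

On older pandas versions how='cross' is not available, so the temporary-key approach is the one to use there.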