J.L*_*.L. 8 python numpy dataframe pandas
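I'm a beginner with Python and with coding in general. I need help comparing two DataFrames that have different lengths and different column labels, except for one. The one column the two datasets share is the column I want to compare them on. My data looks like this: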
df:   'fruits'   'trees'        'sports'     'countries'
      bananas    mongolia       basketball   Spain
      grapes     Oak            rugby        Thailand
      oranges    Osage Orange   baseball     Egypt
      apples     Maple          golf         Chile

df2:  'cars'   'flowers'   'countries'   'vegetables'
      Audi     Rose        Spain         Carrots
      BMW      Tulip       Nigeria       Celery
      Honda    Dandelion   Egypt         Onion
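I want to compare the two DataFrames on the 'countries' column and produce three separate outputs, each in its own DataFrame. I have been using pandas, and used pd.concat to combine df1 and df2 into one. I also want to keep the rows from the rest of the DataFrame, even if they do not match.
Here is the output I want:
Output #1: values in df that are not in df2: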
df3:  'fruits'   'trees'   'sports'   'countries'
      grapes     Oak       rugby      Thailand
      apples     Maple     golf       Chile
Output #2: values in df2 that are not in df:
df4:  'cars'   'flowers'   'countries'   'vegetables'
      BMW      Tulip       Nigeria       Celery
Output #3: values in both df and df2 (with the columns of the two DataFrames combined):
df5:  'fruits'   'trees'        'sports'     'cars'   'flowers'   'countries'   'vegetables'
      bananas    mongolia       basketball   Audi     Rose        Spain         Carrots
      Oranges    Osage Orange   baseball     Honda    Dandelion   Egypt         Onion
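I hope this all makes sense. I have tried many different things (isin, DataFrame.diff and .difference, df - df2, NumPy arrays, and so on) and have searched around, but I can't find what I'm looking for. Any help would be greatly appreciated. Thanks!
Setup for reference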
from io import StringIO  # Python 3; the standalone StringIO module is Python 2 only
import pandas as pd

txt1 = """fruits,trees,sports,countries
bananas,mongolia,basketball,Spain
grapes,Oak,rugby,Thailand
oranges,Osage Orange,baseball,Egypt
apples,Maple,golf,Chile"""

txt2 = """cars,flowers,countries,vegetables
Audi,Rose,Spain,Carrots
BMW,Tulip,Nigeria,Celery
Honda,Dandelion,Egypt,Onion"""

df = pd.read_csv(StringIO(txt1))
df2 = pd.read_csv(StringIO(txt2))
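def outer_parts(df1, df2):
    # Outer merge on the shared column(s); indicator=True adds a '_merge' column
    df3 = df1.merge(df2, indicator=True, how='outer')
    # Split by merge status ('left_only', 'right_only', 'both'), dropping the helper column
    return {n: g.drop(columns='_merge') for n, g in df3.groupby('_merge', observed=True)}

dfs = outer_parts(df, df2)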
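For context, `indicator=True` makes `merge` add a `_merge` column that marks each row of the outer merge as `left_only`, `right_only` or `both`, and that column is what the function groups on. The sketch below shows this on the sample data. The names `left`, `right`, `merged` and `status` are illustrative, and passing `on="countries"` explicitly is an assumption made for clarity; the function above instead relies on pandas using the one column name the two frames share.

```python
from io import StringIO
import pandas as pd

# Sample data from the question (illustrative variable names).
left = pd.read_csv(StringIO("""fruits,trees,sports,countries
bananas,mongolia,basketball,Spain
grapes,Oak,rugby,Thailand
oranges,Osage Orange,baseball,Egypt
apples,Maple,golf,Chile"""))
right = pd.read_csv(StringIO("""cars,flowers,countries,vegetables
Audi,Rose,Spain,Carrots
BMW,Tulip,Nigeria,Celery
Honda,Dandelion,Egypt,Onion"""))

# Outer merge keeps every row from both frames; '_merge' records where each came from.
merged = left.merge(right, on="countries", how="outer", indicator=True)

# Map each country to its merge status for a quick look.
status = dict(zip(merged["countries"], merged["_merge"].astype(str)))
print(status)
```

Spain and Egypt should come out as `both`, Thailand and Chile as `left_only`, and Nigeria as `right_only`.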
dfs['both']
dfs['left_only']
dfs['right_only']
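One thing to be aware of: because the parts above are cut from a single outer merge, `left_only` and `right_only` carry all seven columns, with NaN in the columns that came from the other frame. If each output should keep only its own frame's columns, as in the question's outputs #1 and #2, a possible alternative is to filter with `isin` and use an inner merge for the overlap. This is a sketch rather than part of the original answer, and the variable names are illustrative.

```python
from io import StringIO
import pandas as pd

# Sample data from the question (illustrative variable names).
left = pd.read_csv(StringIO("""fruits,trees,sports,countries
bananas,mongolia,basketball,Spain
grapes,Oak,rugby,Thailand
oranges,Osage Orange,baseball,Egypt
apples,Maple,golf,Chile"""))
right = pd.read_csv(StringIO("""cars,flowers,countries,vegetables
Audi,Rose,Spain,Carrots
BMW,Tulip,Nigeria,Celery
Honda,Dandelion,Egypt,Onion"""))

# Output #1: rows of `left` whose country does not appear in `right`.
left_only = left[~left["countries"].isin(right["countries"])]

# Output #2: rows of `right` whose country does not appear in `left`.
right_only = right[~right["countries"].isin(left["countries"])]

# Output #3: rows whose country appears in both, with the columns combined.
both = left.merge(right, on="countries", how="inner")

print(left_only, right_only, both, sep="\n\n")
```

The combined frame lists `countries` right after the left frame's other columns; selecting columns in a different order, for example `both[[...]]` with the desired list, would match the layout shown in the question.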