How do I concatenate two DataFrames that have duplicate column names?

you*_*yun 2 python dataframe pandas

Reproducible data

import pandas as pd
import numpy as np

cols1=['b','a','c','a']
data1=[0,0,0,0]
df1=pd.DataFrame([data1], columns= cols1)
df1

cols2=['b','a', 'd', 'a', 'e','f']
data2=[1,1,1,1,1,1]
df2=pd.DataFrame([data2], columns= cols2)
df2


Desired result

data = {"b": [0, 1],
        "b a": [0, 1],
        "c": [0, np.nan],
        "c a": [0, np.nan],
        "d": [np.nan, 1],
        "d a": [np.nan, 1],
        "e": [np.nan, 1],
        "f": [np.nan, 1]}
pd.DataFrame(data)

I can't use the concat function because the DataFrames contain the duplicate column name "a".
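A minimal check of the problem, rebuilding df1 and df2 from the data above: both frames have a non-unique column index, and pandas cannot align non-unique labels when the column sets differ. The exact exception type and message depend on the pandas version (recent versions raise InvalidIndexError), so this sketch just catches Exception and stores it in a hypothetical variable concat_error.

```python
import pandas as pd

df1 = pd.DataFrame([[0, 0, 0, 0]], columns=['b', 'a', 'c', 'a'])
df2 = pd.DataFrame([[1, 1, 1, 1, 1, 1]], columns=['b', 'a', 'd', 'a', 'e', 'f'])

# both frames contain the label 'a' twice, so their column index is not unique
print(df1.columns.is_unique, df2.columns.is_unique)  # False False

try:
    pd.concat([df1, df2])
    concat_error = None
except Exception as exc:  # InvalidIndexError in recent pandas versions
    concat_error = exc
print(type(concat_error).__name__)
```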

Is there a good way to handle duplicate column names?

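If an "a" is preceded by "b", I want to rename that "a" to "b a" (and likewise "c a", "d a" for an "a" that follows "c" or "d").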

jez*_*ael 5

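This is not standard deduplication of column names. Each duplicated name gets the name of the previous column as a prefix, so it works only if duplicates are not consecutive or interleaved, as in a,b,b,b or b,a,b,a (there the new names would collide again):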

# prefix each duplicated name with the name of the previous column ('' for the first column)
s1 = df1.columns.to_series()
df1.columns = [f'{b} {a}' if c else a for a, b, c in zip(df1.columns, s1.shift(fill_value=''), s1.duplicated(keep=False))]

s2 = df2.columns.to_series()
df2.columns = [f'{b} {a}' if c else a for a, b, c in zip(df2.columns, s2.shift(fill_value=''), s2.duplicated(keep=False))]


df = pd.concat([df1, df2])
print(df)
   b  b a    c  c a    d  d a    e    f
0  0    0  0.0  0.0  NaN  NaN  NaN  NaN
0  1    1  NaN  NaN  1.0  1.0  1.0  1.0
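The same prefix rule can be wrapped in a reusable helper that also guards against the a,b,b,b and b,a,b,a cases. This is a sketch, not part of the original answer: the name prefix_duplicates is hypothetical, and raising ValueError on a collision is an assumed design choice. It also checks the concatenated result against the desired frame from the question.

```python
import numpy as np
import pandas as pd


def prefix_duplicates(columns):
    """Hypothetical helper: prefix every duplicated label with the previous label.

    Raises ValueError if the new labels are still not unique
    (e.g. for layouts such as a,b,b,b or b,a,b,a).
    """
    s = pd.Series(list(columns))
    prev = s.shift(fill_value='')   # previous column name, '' for the first column
    dup = s.duplicated(keep=False)  # True for every occurrence of a repeated name
    new = [f'{p} {c}' if d else c for c, p, d in zip(s, prev, dup)]
    if len(set(new)) != len(new):
        raise ValueError(f'labels still not unique: {new}')
    return new


df1 = pd.DataFrame([[0, 0, 0, 0]], columns=['b', 'a', 'c', 'a'])
df2 = pd.DataFrame([[1, 1, 1, 1, 1, 1]], columns=['b', 'a', 'd', 'a', 'e', 'f'])
df1.columns = prefix_duplicates(df1.columns)
df2.columns = prefix_duplicates(df2.columns)
result = pd.concat([df1, df2], ignore_index=True)

# the desired output from the question
expected = pd.DataFrame({
    'b': [0, 1], 'b a': [0, 1],
    'c': [0, np.nan], 'c a': [0, np.nan],
    'd': [np.nan, 1], 'd a': [np.nan, 1],
    'e': [np.nan, 1], 'f': [np.nan, 1],
})
pd.testing.assert_frame_equal(result, expected)

try:
    prefix_duplicates(['a', 'b', 'b', 'b'])  # would give a, a b, b b, b b
except ValueError as exc:
    print(exc)
```

Raising instead of silently returning colliding names means the failure shows up at the rename step rather than later as a confusing concat error.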

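The usual way to deduplicate column names is by occurrence count: "a" is duplicated here, so its second occurrence gets the suffix 1. The output is then different from the desired one, though: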

# number repeated names (0, 1, ...) and drop the trailing ' 0' of first occurrences
s1 = df1.columns.to_series()
df1.columns = s1.str.cat(s1.groupby(s1).cumcount().astype(str), sep=' ').str.replace(r' 0$', '', regex=True)

s2 = df2.columns.to_series()
df2.columns = s2.str.cat(s2.groupby(s2).cumcount().astype(str), sep=' ').str.replace(r' 0$', '', regex=True)


df = pd.concat([df1, df2], ignore_index=True)
print(df)
   b  a    c  a 1    d    e    f
0  0  0  0.0    0  NaN  NaN  NaN
1  1  1  NaN    1  1.0  1.0  1.0
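The numeric suffixes in this second approach come from groupby(...).cumcount(), which numbers each occurrence of a name within its group, starting at 0. A small sketch on df1's column names; the anchored pattern r' 0$' is an assumption chosen so that only a trailing " 0" is stripped, not one in the middle of a name.

```python
import pandas as pd

s = pd.Series(['b', 'a', 'c', 'a'])

# 0 for the first occurrence of each name, 1 for the second, ...
counts = s.groupby(s).cumcount()
print(counts.tolist())  # [0, 0, 0, 1]

# append the counter, then drop the ' 0' suffix so first occurrences keep their name
renamed = s.str.cat(counts.astype(str), sep=' ').str.replace(r' 0$', '', regex=True)
print(renamed.tolist())  # ['b', 'a', 'c', 'a 1']
```

Unlike the prefix rule, this always yields unique names, but it ignores which column precedes each duplicate. That is why the "a" after "c" in df1 and the "a" after "d" in df2 both become "a 1" and end up in the same column of the concatenated result.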