Combine different column values in a DataFrame when the index is the same

Ale*_*ang 3 python group-by dataframe pandas pandas-groupby

I am using Python to combine multiple DataFrames (DFs) into a single DF. I concatenate some sample DFs as shown below:

import pandas as pd

df_list = []

df_0 = pd.DataFrame('1.11', index=['SS_0'], columns=['Tx-UDP'])
df_1 = pd.DataFrame('2.22', index=['SS_1'], columns=['Tx-UDP'])
df_2 = pd.DataFrame('3.33', index=['SS_1'], columns=['Tx-TCP'])

df_list.append(df_0)
df_list.append(df_1)
df_list.append(df_2)

df_final = pd.concat(df_list) # type: pd.DataFrame

print(df_final)

The result I get prints as:

     Tx-TCP Tx-UDP
SS_0    NaN   1.11
SS_1    NaN   2.22
SS_1   3.33    NaN

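What I actually want is the format below, keyed on the index label: if the index label is the same, the values should go into the same row under their respective columns instead of starting a new row padded with NaN, as with index 'SS_1' in the example. If an index label is unique and there is no data under some column, filling that cell with NaN is fine, as with index 'SS_0' / column 'Tx-TCP'.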

     Tx-TCP Tx-UDP
SS_0    NaN   1.11
SS_1   3.33   2.22

I have tried concat/merge/join/groupby and so on, but have not found a way to do this. Any advice would be much appreciated, thanks in advance!

piR*_*red 7

Option 1
You want to apply the DataFrame method pd.DataFrame.combine_first iteratively, using reduce from functools

from functools import reduce

reduce(pd.DataFrame.combine_first, df_list)

      Tx-TCP Tx-UDP
SS_0     NaN   1.11
SS_1    3.33   2.22
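One thing the sample data does not exercise is what happens when two frames hold a value for the same cell. The sketch below, with made-up frames `a` and `b` and float values chosen purely for illustration, suggests that with combine_first the earlier frame in the list wins and later frames only fill the gaps:

```python
from functools import reduce

import pandas as pd

# Hypothetical frames: both carry a 'Tx-UDP' value for the same index label.
a = pd.DataFrame({'Tx-UDP': [1.11]}, index=['SS_1'])
b = pd.DataFrame({'Tx-UDP': [9.99], 'Tx-TCP': [3.33]}, index=['SS_1'])

# combine_first keeps the caller's non-null values and fills holes from the argument,
# so folding left to right gives precedence to frames that appear earlier in the list.
merged = reduce(pd.DataFrame.combine_first, [a, b])
print(merged)
```

If the later frames should take precedence instead, reversing the list before reducing is one way to get that.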

Option 2
My version of the pd.concat solution

pd.concat(df_list).groupby(level=0).first()

     Tx-TCP Tx-UDP
SS_0    NaN   1.11
SS_1   3.33   2.22

Or

pd.concat(df_list).groupby(level=0).last()

     Tx-TCP Tx-UDP
SS_0    NaN   1.11
SS_1   3.33   2.22
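On the sample data first() and last() give the same answer because every cell has at most one non-null value. A small sketch with hypothetical frames (float values are an assumption for illustration) shows where they would differ: each takes the first or last non-null value per column within a group.

```python
import pandas as pd

# Hypothetical frames: two of them supply 'Tx-UDP' for the same index label.
frames = [
    pd.DataFrame({'Tx-UDP': [1.11]}, index=['SS_1']),
    pd.DataFrame({'Tx-UDP': [9.99]}, index=['SS_1']),
    pd.DataFrame({'Tx-TCP': [3.33]}, index=['SS_1']),
]

stacked = pd.concat(frames)

# Group rows by index label; NaN cells are skipped when picking a value.
first = stacked.groupby(level=0).first()  # earliest non-null value per column
last = stacked.groupby(level=0).last()    # latest non-null value per column

print(first)
print(last)
```

So the choice between the two only matters when the input frames overlap on a cell.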

Experiment A

from functools import reduce

idx = reduce(pd.Index.union, [d.index for d in df_list])
col = reduce(pd.Index.union, [d.columns for d in df_list])
tmp = pd.DataFrame(index=idx, columns=col)
reduce(pd.DataFrame.fillna, [tmp] + df_list)

     Tx-TCP Tx-UDP
SS_0    NaN   1.11
SS_1   3.33   2.22

Experiment B

from functools import reduce

idx = reduce(pd.Index.union, [d.index for d in df_list])
col = reduce(pd.Index.union, [d.columns for d in df_list])
tmp = pd.DataFrame(index=idx, columns=col)
for d in df_list:
    tmp.update(d)  # update modifies tmp in place and returns None
tmp

     Tx-TCP Tx-UDP
SS_0    NaN   1.11
SS_1   3.33   2.22