Pivoting and joining across two DataFrames in pandas

Mat*_*ron 10 python pivot join dataframe pandas

I have two DataFrames:

df1
   mag   cat
0  101   A1
1  256   A2  
2  760   A2
3  888   A3  
...

df2
   A1    A2    A3    ...
0  E50R  AZ33  REZ3 
1  T605  YYU6  YHG5
2  IR50  P0O9  BF53
3  NaN   YY9I  NaN

I want to create a final DataFrame that looks like this:

df
   101   256   760   888  ...
0  E50R  AZ33  AZ33  REZ3
1  T605  YYU6  YYU6  YHG5
2  IR50  P0O9  P0O9  BF53
3  NaN   YY9I  YY9I  NaN

I tried a few things with pivot, but it does not seem to do the job. Can you help me?

WeN*_*Ben 8

IIUC, you can use reindex and then rename the columns:

newdf=df2.reindex(columns=df1.cat)
newdf.columns=df1.mag
newdf
Out[519]: 
mag   101   256   760   888
0    E50R  AZ33  AZ33  REZ3
1    T605  YYU6  YYU6  YHG5
2    IR50  P0O9  P0O9  BF53
3     NaN  YY9I  YY9I   NaN
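To make the reindex approach easy to try, here is a minimal, self-contained sketch. The sample frames are rebuilt by hand from the rows shown in the question (the elided `...` parts are left out), so the construction itself is an assumption. `reindex(columns=...)` accepts repeated target labels as long as the column labels of `df2` are unique, which is why `A2` can be selected twice.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question (assumption: only the rows shown there).
df1 = pd.DataFrame({"mag": [101, 256, 760, 888],
                    "cat": ["A1", "A2", "A2", "A3"]})
df2 = pd.DataFrame({
    "A1": ["E50R", "T605", "IR50", np.nan],
    "A2": ["AZ33", "YYU6", "P0O9", "YY9I"],
    "A3": ["REZ3", "YHG5", "BF53", np.nan],
})

# Step 1: pick df2's columns in the order given by df1['cat'];
# a repeated label (A2) yields a repeated column.
newdf = df2.reindex(columns=df1["cat"])

# Step 2: relabel the columns positionally with the matching 'mag' values.
newdf.columns = df1["mag"]

print(newdf)
```

Note that a `cat` value with no matching column in `df2` would come back as an all-NaN column instead of raising an error, so checking `df1["cat"].isin(df2.columns).all()` beforehand may be worthwhile.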


jpp*_*jpp 5

You can use a combination of GroupBy, numpy.repeat, and itertools.chain:

from itertools import chain
import numpy as np

# map cat to list of mag
s = df1.groupby('cat')['mag'].apply(list)

# calculate indices for columns, including repeats
cols_idx = np.repeat(range(len(df2.columns)), s.map(len))

# apply indexing
res = df2.iloc[:, cols_idx]

# rename columns
res.columns = list(chain.from_iterable(df2.columns.map(s.get)))

print(res)

    101   256   760   888
0  E50R  AZ33  AZ33  REZ3
1  T605  YYU6  YYU6  YHG5
2  IR50  P0O9  P0O9  BF53
3   NaN  YY9I  YY9I   NaN
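The groupby approach above depends on the grouped `cat` labels coming out in the same order as the columns of `df2`. The following sketch makes that alignment explicit and exposes the intermediate values. The sample data is rebuilt from the question, and the explicit `reindex`, the `counts` variable, and the `.copy()` are my additions rather than part of the original answer. It also assumes every column of `df2` appears at least once in `df1['cat']`.

```python
from itertools import chain

import numpy as np
import pandas as pd

# Sample data rebuilt from the question (assumption: only the rows shown there).
df1 = pd.DataFrame({"mag": [101, 256, 760, 888],
                    "cat": ["A1", "A2", "A2", "A3"]})
df2 = pd.DataFrame({
    "A1": ["E50R", "T605", "IR50", np.nan],
    "A2": ["AZ33", "YYU6", "P0O9", "YY9I"],
    "A3": ["REZ3", "YHG5", "BF53", np.nan],
})

# Map each cat to its list of mag values: A1 -> [101], A2 -> [256, 760], A3 -> [888].
s = df1.groupby("cat")["mag"].apply(list)

# Align the mapping to df2's column order explicitly
# (assumes every df2 column occurs in df1['cat']).
s = s.reindex(df2.columns)

# How many times each df2 column must be repeated.
counts = s.map(len).to_numpy()

# Positional column indices, including repeats.
cols_idx = np.repeat(np.arange(len(df2.columns)), counts)

# Select by position, then flatten the lists of mag values into the new labels.
res = df2.iloc[:, cols_idx].copy()
res.columns = list(chain.from_iterable(s))

print(cols_idx)
print(res)
```

Unlike the reindex answer, this variant orders the output by the columns of `df2`, not by the row order of `df1`. The two agree here only because `df1` is already sorted by `cat`.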

Performance benchmarking

There are some good and varied solutions here, so you may be interested in performance. Wen's reindex solution is the clear winner.

%timeit wen(df1, df2)   # 632 µs per loop
%timeit jpp(df1, df2)   # 2.55 ms per loop
%timeit scb(df1, df2)   # 7.98 ms per loop
%timeit abhi(df1, df2)  # 4.52 ms per loop

Code:

from itertools import chain
import numpy as np

def jpp(df1, df2):
    s = df1.groupby('cat')['mag'].apply(list)
    cols_idx = np.repeat(range(len(df2.columns)), s.map(len))
    res = df2.iloc[:, cols_idx]
    res.columns = list(chain.from_iterable(df2.columns.map(s.get)))    
    return res

def scb(df1, df2):
    df_out = (df2.stack().reset_index()
                 .merge(df1, left_on='level_1', right_on='cat')[['level_0','mag',0]])
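    # keyword arguments: recent pandas versions reject positional arguments to pivot
    return df_out.pivot(index='level_0', columns='mag', values=0).reset_index(drop=True)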

def abhi(df1, df2):
    return df2.T.merge(df1, left_index=True, right_on='cat').drop('cat', axis=1).set_index('mag').T

def wen(df1, df2):
    newdf=df2.reindex(columns=df1.cat)
    newdf.columns=df1.mag
    return newdf
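The `%timeit` lines only work in IPython or Jupyter. For a plain Python script, a rough equivalent can be sketched with the standard-library `timeit` module. The `best_time` helper and the `same_values` check below are hypothetical names introduced for illustration, only two of the four functions are included to keep it short, and the sample data is rebuilt from the question. Absolute timings will differ by machine and pandas version, so no figures are claimed here.

```python
import timeit

import numpy as np
import pandas as pd

# Sample data rebuilt from the question (assumption: only the rows shown there).
df1 = pd.DataFrame({"mag": [101, 256, 760, 888],
                    "cat": ["A1", "A2", "A2", "A3"]})
df2 = pd.DataFrame({
    "A1": ["E50R", "T605", "IR50", np.nan],
    "A2": ["AZ33", "YYU6", "P0O9", "YY9I"],
    "A3": ["REZ3", "YHG5", "BF53", np.nan],
})

def wen(df1, df2):
    newdf = df2.reindex(columns=df1["cat"])
    newdf.columns = df1["mag"]
    return newdf

def abhi(df1, df2):
    return (df2.T.merge(df1, left_index=True, right_on="cat")
               .drop("cat", axis=1).set_index("mag").T)

# Sanity check before timing: both functions should produce the same cell values.
# Missing values are filled with a marker string so they compare equal.
same_values = (wen(df1, df2).fillna("missing").to_numpy().tolist()
               == abhi(df1, df2).fillna("missing").to_numpy().tolist())

def best_time(func, number=50, repeat=3):
    """Hypothetical helper: best average seconds per call over `repeat` runs."""
    totals = timeit.repeat(lambda: func(df1, df2), number=number, repeat=repeat)
    return min(totals) / number

timings = {f.__name__: best_time(f) for f in (wen, abhi)}
for name, seconds in timings.items():
    print(f"{name}: {seconds * 1e6:.0f} µs per call")
```

Taking the minimum over several repeats is the usual way to reduce noise from other processes. On frames this small the numbers mostly reflect fixed pandas overhead, so it may be worth re-running on data closer to your real size.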