Mat*_*ron 10 python pivot join dataframe pandas
I have two DataFrames:
df1
mag cat
0 101 A1
1 256 A2
2 760 A2
3 888 A3
...
df2
A1 A2 A3 ...
0 E50R AZ33 REZ3
1 T605 YYU6 YHG5
2 IR50 P0O9 BF53
3 NaN YY9I NaN
I would like to create a final DataFrame that looks like this:
df
101 256 760 888 ...
0 E50R AZ33 AZ33 REZ3
1 T605 YYU6 YYU6 YHG5
2 IR50 P0O9 P0O9 BF53
3 NaN YY9I YY9I NaN
I tried a few things with pivot, but it doesn't seem to do the job. Can you help me?
IIUC, reindex + rename:
newdf=df2.reindex(columns=df1.cat)
newdf.columns=df1.mag
newdf
Out[519]:
mag 101 256 760 888
0 E50R AZ33 AZ33 REZ3
1 T605 YYU6 YYU6 YHG5
2 IR50 P0O9 P0O9 BF53
3 NaN YY9I YY9I NaN
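For anyone who wants to reproduce this, here is a minimal self-contained sketch of the same reindex + rename idea. The frames are rebuilt from the sample rows in the question (the elided "..." parts are left out), and it assumes every value in df1's cat column is a column label in df2.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question; the "..." rows/columns are omitted.
df1 = pd.DataFrame({"mag": [101, 256, 760, 888],
                    "cat": ["A1", "A2", "A2", "A3"]})
df2 = pd.DataFrame({"A1": ["E50R", "T605", "IR50", np.nan],
                    "A2": ["AZ33", "YYU6", "P0O9", "YY9I"],
                    "A3": ["REZ3", "YHG5", "BF53", np.nan]})

# Pick one df2 column per row of df1; a repeated label repeats the column.
newdf = df2.reindex(columns=df1["cat"])

# Relabel the columns with the mag values, which are in the same order as df1.
newdf.columns = df1["mag"]

print(newdf)
```

The rename step is positional, so it only lines up because the reindex used df1's cat column in df1's own row order.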
You can use a combination of GroupBy, numpy.repeat, and itertools.chain:
import numpy as np
from itertools import chain
# map cat to list of mag
s = df1.groupby('cat')['mag'].apply(list)
# calculate indices for columns, including repeats
cols_idx = np.repeat(range(len(df2.columns)), s.map(len))
# apply indexing
res = df2.iloc[:, cols_idx]
# rename columns
res.columns = list(chain.from_iterable(df2.columns.map(s.get)))
print(res)
101 256 760 888
0 E50R AZ33 AZ33 REZ3
1 T605 YYU6 YYU6 YHG5
2 IR50 P0O9 P0O9 BF53
3 NaN YY9I YY9I NaN
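To make the numpy.repeat step less opaque, here is a small sketch of how the positional column indices come about. The counts list is a hypothetical stand-in for s.map(len) on the sample data (one mag for A1, two for A2, one for A3).

```python
import numpy as np

# Hypothetical counts: number of mag values per df2 column (A1, A2, A3).
counts = [1, 2, 1]

# Repeat each column position as many times as it has mag values.
cols_idx = np.repeat(np.arange(len(counts)), counts)

print(cols_idx.tolist())  # [0, 1, 1, 2]
```

As far as I can tell, this approach relies on the groupby result (which is sorted by cat) lining up with the column order of df2, and on every df2 column appearing in df1. If that does not hold for your data, the reindex solution is probably the safer choice.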
Performance benchmarking
There are several good and quite different solutions here, so you may be interested in performance. Wen's reindex solution is the clear winner.
%timeit wen(df1, df2) # 632 µs per loop
%timeit jpp(df1, df2) # 2.55 ms per loop
%timeit scb(df1, df2) # 7.98 ms per loop
%timeit abhi(df1, df2) # 4.52 ms per loop
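Outside IPython, %timeit is not available. A rough equivalent with the standard timeit module might look like the sketch below. The sample frames, the n_runs value, and the per_loop_us name are assumptions for illustration, not the setup behind the numbers above, so your timings will differ.

```python
import timeit

import numpy as np
import pandas as pd

# Small sample frames rebuilt from the question (an assumption for this sketch).
df1 = pd.DataFrame({"mag": [101, 256, 760, 888],
                    "cat": ["A1", "A2", "A2", "A3"]})
df2 = pd.DataFrame({"A1": ["E50R", "T605", "IR50", np.nan],
                    "A2": ["AZ33", "YYU6", "P0O9", "YY9I"],
                    "A3": ["REZ3", "YHG5", "BF53", np.nan]})


def wen(df1, df2):
    # reindex + rename, as in Wen's answer
    newdf = df2.reindex(columns=df1["cat"])
    newdf.columns = df1["mag"]
    return newdf


n_runs = 200  # arbitrary repeat count for this sketch
total = timeit.timeit(lambda: wen(df1, df2), number=n_runs)
per_loop_us = total / n_runs * 1e6
print(f"wen: {per_loop_us:.0f} µs per loop")
```

The same pattern can be repeated for the other functions to get a comparison on your own data.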
Code:
import numpy as np
from itertools import chain

def jpp(df1, df2):
    # groupby + np.repeat + chain
    s = df1.groupby('cat')['mag'].apply(list)
    cols_idx = np.repeat(range(len(df2.columns)), s.map(len))
    res = df2.iloc[:, cols_idx]
    res.columns = list(chain.from_iterable(df2.columns.map(s.get)))
    return res

def scb(df1, df2):
    # stack + merge + pivot (keyword arguments, as newer pandas requires)
    df_out = (df2.stack().reset_index()
                 .merge(df1, left_on='level_1', right_on='cat')[['level_0', 'mag', 0]])
    return df_out.pivot(index='level_0', columns='mag', values=0).reset_index(drop=True)

def abhi(df1, df2):
    # transpose + merge + transpose back
    return df2.T.merge(df1, left_index=True, right_on='cat').drop('cat', axis=1).set_index('mag').T

def wen(df1, df2):
    # reindex + rename
    newdf = df2.reindex(columns=df1.cat)
    newdf.columns = df1.mag
    return newdf