FFL*_*L75 10 python dataframe pandas
My dataframe looks like this:
import pandas as pd

df = pd.DataFrame([["t1","d2","e3","r4"],
                   ["t1","d2","e2","r4"],
                   ["t1","d2","e1","r4"]], columns=["a","b","c","d"])
And I want:
pd.DataFrame([["t1","d2","e3","r4","e1","e2"]],
columns=["a","b","c","d","c1","c2"])
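That is, only one column has differing values, and I want to build a new dataframe that adds a column each time a new value is observed. Is there a shortcut for this?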
Ucols = df.columns[(df.nunique() == 1)].tolist()
df_out = df.set_index(Ucols).set_index(df.groupby(Ucols).cumcount(), append=True).unstack()
df_out.columns = [f'{i}{j}' if j != 0 else f'{i}' for i,j in df_out.columns]
print(df_out.reset_index())
Output:
a b d c c1 c2
0 t1 d2 r4 e3 e2 e1
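The one-liner above packs several steps together. A minimal sketch that splits it into stages, assuming the same sample frame as in the question (the names `key_cols`, `pos`, and `wide` are illustrative and not part of the original answer):

```python
import pandas as pd

df = pd.DataFrame(
    [["t1", "d2", "e3", "r4"],
     ["t1", "d2", "e2", "r4"],
     ["t1", "d2", "e1", "r4"]],
    columns=["a", "b", "c", "d"],
)

# Columns that hold a single distinct value act as the group key.
key_cols = df.columns[df.nunique() == 1].tolist()

# Running position of each row inside its group (0, 1, 2 here).
pos = df.groupby(key_cols).cumcount()

# Put the key columns and the position in the index, then pivot the
# position level out into columns.
wide = df.set_index(key_cols).set_index(pos, append=True).unstack()

# Flatten the (column, position) pairs; position 0 keeps the bare name.
wide.columns = [f"{col}{i}" if i != 0 else col for col, i in wide.columns]
wide = wide.reset_index()
print(wide)
```

Note that selecting the key with `nunique() == 1` only works when the whole frame is a single group; with several groups the key columns would have to be named explicitly.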
Use:
df_out = df.set_index(['a','b','d',df.groupby(['a','b','d']).cumcount()]).unstack()
df_out.columns = [f'{i}{j}' if j != 0 else f'{i}' for i,j in df_out.columns]
df_out.reset_index()
Output:
a b d c c1 c2
0 t1 d2 r4 e3 e2 e1
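Because the key columns are named explicitly here, the same approach should carry over to frames with more than one group. A small sketch under that assumption, using a made-up second group (`t2`, `d5`, `r9`, `e7` are hypothetical values, not from the question):

```python
import pandas as pd

# Hypothetical data: the original group plus a second, shorter group.
df2 = pd.DataFrame(
    [["t1", "d2", "e3", "r4"],
     ["t1", "d2", "e2", "r4"],
     ["t1", "d2", "e1", "r4"],
     ["t2", "d5", "e7", "r9"]],
    columns=["a", "b", "c", "d"],
)

keys = ["a", "b", "d"]

# One output row per key combination; cumcount numbers the rows in each group.
out = df2.set_index(keys + [df2.groupby(keys).cumcount()]).unstack()
out.columns = [f"{col}{i}" if i != 0 else col for col, i in out.columns]
out = out.reset_index()
print(out)
```

Groups with fewer rows than the longest one get missing values in the extra columns, so `c1` and `c2` are empty for the `t2` row.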
You can use a dictionary comprehension. For consistency, I included an integer suffix on every column.
res = pd.DataFrame({f'{col}{idx}': val
                    for col in df
                    for idx, val in enumerate(df[col].unique(), 1)}, index=[0])
print(res)
a1 b1 c1 c2 c3 d1
0 t1 d2 e3 e2 e1 r4
An alternative to df[col].unique() is df[col].drop_duplicates(), although the latter may incur the overhead of iterating over a pd.Series object rather than an np.ndarray.
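To make the difference concrete, here is a short sketch comparing the two calls on a single column. It assumes an object-dtype Series (set explicitly below); for extension dtypes, `unique()` returns an extension array instead of an `np.ndarray`.

```python
import numpy as np
import pandas as pd

# Assumption: object dtype is forced so that unique() yields a NumPy array.
s = pd.Series(["e3", "e2", "e1", "e3"], dtype=object)

u = s.unique()           # NumPy array, values in order of first appearance
d = s.drop_duplicates()  # Series, keeps the index labels of the first occurrences

print(type(u).__name__, list(u))
print(type(d).__name__, d.tolist(), d.index.tolist())
```

Both preserve first-seen order, so either works inside the comprehension; the Series simply carries an index along that the comprehension never uses.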