Create new columns only where values differ

Question

My DataFrame looks like this:

import pandas as pd

df = pd.DataFrame([["t1","d2","e3","r4"],
                   ["t1","d2","e2","r4"],
                   ["t1","d2","e1","r4"]], columns=["a","b","c","d"])

And I want:

pd.DataFrame([["t1","d2","e3","r4","e1","e2"]],
columns=["a","b","c","d","c1","c2"])

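That is, only one column has differing values, and I want to create a new DataFrame that adds a column each time a new value is observed. Is there a shortcut for this?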

Answer 1

Sco*_*ton 7

Edit: to generalize to any single non-unique column:

Ucols = df.columns[(df.nunique() == 1)].tolist()
df_out = df.set_index(Ucols).set_index(df.groupby(Ucols).cumcount(), append=True).unstack()
df_out.columns = [f'{i}{j}' if j != 0 else f'{i}' for i,j in df_out.columns]
print(df_out.reset_index())

Output:

    a   b   d   c  c1  c2
0  t1  d2  r4  e3  e2  e1

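The chained call above can be hard to follow on one line. Below is a minimal sketch that splits it into named steps, using the three-row frame from the question. The intermediate names (`key_cols`, `position`, `indexed`, `wide`) are illustrative and not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame(
    [["t1", "d2", "e3", "r4"],
     ["t1", "d2", "e2", "r4"],
     ["t1", "d2", "e1", "r4"]],
    columns=["a", "b", "c", "d"],
)

# Columns holding a single distinct value become the row key.
key_cols = [col for col in df.columns if df[col].nunique() == 1]

# Number the rows within each key group: 0, 1, 2, ...
position = df.groupby(key_cols).cumcount()

# Move the key columns into the index and append the position as an
# extra index level, leaving only the varying column(s) as data.
indexed = df.set_index(key_cols).set_index(position, append=True)

# Pivot the position level into columns: one column per (name, position).
wide = indexed.unstack()

# Flatten the two-level labels: ('c', 0) -> 'c', ('c', 1) -> 'c1', ...
wide.columns = [f"{name}{pos}" if pos != 0 else name for name, pos in wide.columns]

result = wide.reset_index()
print(result)
```

Note that this treats a column as a key only if it has exactly one distinct value across the whole frame; with several groups of rows, the key columns would have to be chosen some other way.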

Original answer

Use:

df_out = df.set_index(['a','b','d',df.groupby(['a','b','d']).cumcount()]).unstack()

df_out.columns = [f'{i}{j}' if j != 0 else f'{i}' for i,j in df_out.columns]

df_out.reset_index()

Output:

    a   b   d   c  c1  c2
0  t1  d2  r4  e3  e2  e1

In my opinion, this doesn't generalize. I mean, you already know that column `c` is the one with more values. (2 upvotes)

Answer 2

jpp*_*jpp 6

You can use a dictionary comprehension. For consistency, I've included an integer suffix on all columns.

res = pd.DataFrame({f'{col}{idx}': val for col in df for idx, val in \
                    enumerate(df[col].unique(), 1)}, index=[0])

print(res)

   a1  b1  c1  c2  c3  d1
0  t1  d2  e3  e2  e1  r4

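If the bare column name should be kept for the first value, as in the question's desired output, the same idea can be written as an explicit loop. This is a sketch rather than part of the original answer: `widen_unique` is a hypothetical helper name, and it assumes no existing column is already named something like `c1`.

```python
import pandas as pd

df = pd.DataFrame(
    [["t1", "d2", "e3", "r4"],
     ["t1", "d2", "e2", "r4"],
     ["t1", "d2", "e1", "r4"]],
    columns=["a", "b", "c", "d"],
)

def widen_unique(frame):
    """Hypothetical helper: build one row with a column per distinct value."""
    data = {}
    for col in frame.columns:
        values = frame[col].unique()  # order of first appearance
        for idx, val in enumerate(values):
            # The first value keeps the bare name; later ones get 1, 2, ...
            label = col if idx == 0 else f"{col}{idx}"
            data[label] = val
    return pd.DataFrame(data, index=[0])

res = widen_unique(df)
print(res)
```

Values appear in order of first occurrence, so here `c1` holds `e2` and `c2` holds `e1`.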

An alternative to df[col].unique() is df[col].drop_duplicates(), although the latter may incur the overhead of iterating over a pd.Series object rather than an np.ndarray.
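
To see the difference between the two calls, here is a small sketch on an object-dtype Series (the sample values are made up for illustration). Both keep values in order of first appearance; `unique()` returns a plain array here, while `drop_duplicates()` returns a Series that still carries the original index labels.

```python
import numpy as np
import pandas as pd

# Object dtype is requested explicitly so that unique() yields a NumPy array.
s = pd.Series(["e3", "e2", "e1", "e2"], dtype=object, name="c")

as_array = s.unique()            # np.ndarray of the distinct values
as_series = s.drop_duplicates()  # pd.Series, index labels of first occurrences kept

print(type(as_array).__name__, list(as_array))
print(type(as_series).__name__, list(as_series), list(as_series.index))
```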
