Convert Pandas DataFrame columns to a list according to the numbers in each row

Tha*_*bra 3 python transformation dataframe pandas

I have a DataFrame like this:

Day            Id   Banana  Apple 
2020-01-01     1    1       1
2020-01-02     1    NaN     2
2020-01-03     2    2       2

How can I transform it into:

Day            Id   Banana  Apple  Products
2020-01-01     1    1       1      [Banana, Apple]
2020-01-02     1    NaN     2      [Apple, Apple]
2020-01-03     2    2       2      [Banana, Banana, Apple, Apple]

jez*_*ael 5

Select all columns except the first two by position with DataFrame.iloc, reshape with DataFrame.stack, repeat the MultiIndex entries with Index.repeat, and aggregate the column names into lists:

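# Stack the product columns into one Series indexed by (row label, column name).
# dropna() is explicit because newer pandas versions may keep NaN when stacking.
s = df.iloc[:, 2:].stack().dropna().astype(int)
# Repeat each (row, column) pair by its count, then collect the column names per row.
df['Products'] = s[s.index.repeat(s)].reset_index().groupby(['level_0'])['level_1'].agg(list)
print(df)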
          Day  Id  Banana  Apple                        Products
0  2020-01-01   1     1.0      1                 [Banana, Apple]
1  2020-01-02   1     NaN      2                  [Apple, Apple]
2  2020-01-03   2     2.0      2  [Banana, Banana, Apple, Apple]
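To make the intermediate steps of the stack-and-repeat approach easier to follow, here is a self-contained sketch that rebuilds the sample frame from the question and splits the one-liner into named steps. The variable names (`counts`, `repeated`, `products`) are illustrative only, and the sketch assumes the counts are non-negative whole numbers.

```python
import numpy as np
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "Day": ["2020-01-01", "2020-01-02", "2020-01-03"],
    "Id": [1, 1, 2],
    "Banana": [1, np.nan, 2],
    "Apple": [1, 2, 2],
})

# Step 1: long format, one entry per (row label, product) with a non-missing count.
counts = df.iloc[:, 2:].stack().dropna().astype(int)

# Step 2: repeat every (row label, product) pair as many times as its count.
repeated = counts.index.repeat(counts.to_numpy())

# Step 3: collect the repeated product names into one list per original row.
products = (
    pd.Series(repeated.get_level_values(1).to_numpy(),
              index=repeated.get_level_values(0))
    .groupby(level=0)
    .agg(list)
)

# Assignment aligns on the row labels of df.
df["Products"] = products
print(df)
```

Rows whose counts are all missing produce no group in step 3, so they would end up with NaN in `Products` rather than an empty list.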

Alternatively, use a custom function that repeats the column names, skipping missing values:

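def f(x):
    # Drop missing counts, then repeat each column name by its integer count.
    s = x.dropna().astype(int)
    return s.index.repeat(s).tolist()

# Apply row-wise to the product columns (everything after the first two).
df['Products'] = df.iloc[:, 2:].apply(f, axis=1)
print(df)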
          Day  Id  Banana  Apple                        Products
0  2020-01-01   1     1.0      1                 [Banana, Apple]
1  2020-01-02   1     NaN      2                  [Apple, Apple]
2  2020-01-03   2     2.0      2  [Banana, Banana, Apple, Apple]
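If the row-wise `apply` turns out to be slow on a larger frame, one possible variant is to work on the underlying NumPy array with `np.repeat`. This is only a sketch, not part of the answer above: it assumes that a missing count can be treated as zero, and `product_cols` is a name introduced here for illustration.

```python
import numpy as np
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "Day": ["2020-01-01", "2020-01-02", "2020-01-03"],
    "Id": [1, 1, 2],
    "Banana": [1, np.nan, 2],
    "Apple": [1, 2, 2],
})

# Assumption: the product columns are known by name.
product_cols = ["Banana", "Apple"]

# Assumption: NaN means "zero items", so it contributes nothing to the list.
counts = df[product_cols].fillna(0).astype(int).to_numpy()

# For each row, repeat every column name by that row's count.
df["Products"] = pd.Series(
    [np.repeat(product_cols, row).tolist() for row in counts],
    index=df.index,
)
print(df)
```

Unlike the stack-based version, a row with no items at all gets an empty list here instead of NaN.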