Convert pandas column values to rows

Question

I am trying to convert a DataFrame to long format.

The DataFrame I am starting with:

import pandas as pd

df = pd.DataFrame([['a', 'b'],
                   ['d', 'e'],
                   ['f', 'g', 'h'],
                   ['q', 'r', 'e', 't']])
df = df.rename(columns={0: "Key"})

  Key  1     2     3
0   a  b  None  None
1   d  e  None  None
2   f  g     h  None
3   q  r     e     t

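The number of columns is not fixed and may be more than 4. Each value that follows the key should get its own row.

This gets me what I need, but it seems there should be a way to do it without having to drop the null values: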

new_df = pd.melt(df, id_vars=['Key'])[['Key', 'value']]
new_df = new_df.dropna()


    Key value
0   a   b
1   d   e
2   f   g
3   q   r
6   f   h
7   q   e
11  q   t

Answer 1

cs9*_*s95 5

Option 1
You should be able to do this with set_index + stack:

df.set_index('Key').stack().reset_index(level=0, name='value').reset_index(drop=True)

  Key value
0   a     b
1   d     e
2   f     g
3   f     h
4   q     r
5   q     e
6   q     t

If you would rather not keep resetting the index, use an intermediate variable and build a new DataFrame:

v = df.set_index('Key').stack()
pd.DataFrame({'Key' : v.index.get_level_values(0), 'value' : v.values})

  Key value
0   a     b
1   d     e
2   f     g
3   f     h
4   q     r
5   q     e
6   q     t

The key point here is that stack drops NaN values automatically by default (you can disable this by setting dropna=False).
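
As a minimal sketch, Option 1 can be wrapped in a small function. The name wide_to_long and its key parameter are hypothetical and not part of the answer, and the explicit dropna() is an added assumption so that the result does not depend on how stack treats missing values by default:

```python
import pandas as pd


def wide_to_long(df, key="Key"):
    """Hypothetical helper: one output row per non-null value after the key column."""
    stacked = df.set_index(key).stack()
    # Explicit dropna() so the result does not rely on stack's default NaN handling.
    stacked = stacked.dropna()
    return pd.DataFrame({
        key: stacked.index.get_level_values(0),  # key repeated once per value
        "value": stacked.to_numpy(),
    })


# Same data as in the question.
df = pd.DataFrame([['a', 'b'],
                   ['d', 'e'],
                   ['f', 'g', 'h'],
                   ['q', 'r', 'e', 't']]).rename(columns={0: "Key"})

long_df = wide_to_long(df)
print(long_df)
```

Because the helper builds a new DataFrame, the result gets a fresh 0..n-1 index and the input df is left unchanged.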

Option 2
For more performance, use np.repeat and a NumPy version of pd.DataFrame.stack:

i = df.pop('Key').values   # note: pop removes 'Key' from df in place
j = df.values.ravel()      # remaining values, flattened row by row

pd.DataFrame({'Key' : i.repeat(df.count(axis=1)), 'value' : j[pd.notnull(j)]})

  Key value
0   a     b
1   d     e
2   f     g
3   f     h
4   q     r
5   q     e
6   q     t
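
A non-mutating variant of the same idea might look like the sketch below. The helper name wide_to_long_np is hypothetical; it uses drop(columns=...) instead of pop so that the caller's DataFrame keeps its Key column, and it calls np.repeat explicitly:

```python
import numpy as np
import pandas as pd


def wide_to_long_np(df, key="Key"):
    """Hypothetical helper: NumPy-based reshape that leaves df untouched."""
    keys = df[key].to_numpy()
    values = df.drop(columns=key)                # copy without the key column
    flat = values.to_numpy().ravel()             # flattened row by row
    counts = values.count(axis=1).to_numpy()     # non-null values per row
    return pd.DataFrame({
        key: np.repeat(keys, counts),            # repeat each key once per value
        "value": flat[pd.notnull(flat)],         # keep only the non-null values
    })


# Same data as in the question.
df = pd.DataFrame([['a', 'b'],
                   ['d', 'e'],
                   ['f', 'g', 'h'],
                   ['q', 'r', 'e', 't']]).rename(columns={0: "Key"})

long_df = wide_to_long_np(df)
print(long_df)
```

The row-major ravel and the per-row counts line up because both walk the rows in the same order.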

Answer 2

WeN*_*Ben 5

Using melt (I don't think dropna causes any extra 'trouble' here):

df.melt('Key').dropna().drop(columns='variable')
Out[809]: 
   Key value
0    a     b
1    d     e
2    f     g
3    q     r
6    f     h
7    q     e
11   q     t
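
The melt result is ordered column by column and keeps the gaps left by dropna in its index. If the rows should instead be grouped by their original row, with a clean 0..n-1 index, one possible sketch is to keep the original row labels during melt and do a stable sort on them. It assumes pandas 1.1 or later for the ignore_index parameter of melt; the variable name melted is only illustrative:

```python
import pandas as pd

# Same data as in the question.
df = pd.DataFrame([['a', 'b'],
                   ['d', 'e'],
                   ['f', 'g', 'h'],
                   ['q', 'r', 'e', 't']]).rename(columns={0: "Key"})

melted = (
    df.melt(id_vars="Key", ignore_index=False)  # keep original row labels as the index
      .dropna(subset=["value"])                 # drop the padding cells
      .drop(columns="variable")                 # the old column label is not needed
      .sort_index(kind="mergesort")             # stable: preserves column order within a row
      .reset_index(drop=True)
)
print(melted)
```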

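Without dropna (this relies on every value being a single character, because list splits each concatenated string into characters):

s=df.fillna('').set_index('Key').sum(axis=1).apply(list)
pd.DataFrame({'Key': s.index.repeat(s.str.len()),'value':s.sum()})
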
Out[862]: 
  Key value
0   a     b
1   d     e
2   f     g
3   f     h
4   q     r
5   q     e
6   q     t
