How to set columns in pandas

jon*_*ony 10 python dataframe pandas

This is my DataFrame:

            Dec-18  Jan-19  Feb-19  Mar-19  Apr-19  May-19
Saturday    2540.0  2441.0  3832.0  4093.0  1455.0  2552.0
Sunday      1313.0  1891.0  2968.0  2260.0  1454.0  1798.0
Monday      1360.0  1558.0  2967.0  2156.0  1564.0  1752.0
Tuesday     1089.0  2105.0  2476.0  1577.0  1744.0  1457.0
Wednesday   1329.0  1658.0  2073.0  2403.0  1231.0  874.0
Thursday    798.0   1195.0  2183.0  1287.0  1460.0  1269.0

I tried a few pandas operations, but I couldn't get it to work.

This is what I want:

             items
Saturday    2540.0  
Sunday      1313.0  
Monday      1360.0  
Tuesday     1089.0  
Wednesday   1329.0  
Thursday    798.0   
Saturday    2441.0  
Sunday      1891.0  
Monday      1558.0  
Tuesday     2105.0  
Wednesday   1658.0  
Thursday    1195.0   ............ and so on 

I want to stack the columns one below another as rows. How can I do that?

Yuc*_*uca 9

df.reset_index().melt(id_vars='index').drop(columns='variable')

Output:

       index   value
0    Saturday  2540.0
1      Sunday  1313.0
2      Monday  1360.0
3     Tuesday  1089.0
4   Wednesday  1329.0
5    Thursday   798.0
6    Saturday  2441.0
7      Sunday  1891.0
8      Monday  1558.0
9     Tuesday  2105.0
10  Wednesday  1658.0
11   Thursday  1195.0
12   Saturday  3832.0
13     Sunday  2968.0
14     Monday  2967.0
15    Tuesday  2476.0
16  Wednesday  2073.0
17   Thursday  2183.0
18   Saturday  4093.0
19     Sunday  2260.0
20     Monday  2156.0
21    Tuesday  1577.0
22  Wednesday  2403.0
23   Thursday  1287.0
24   Saturday  1455.0
25     Sunday  1454.0
26     Monday  1564.0
27    Tuesday  1744.0
28  Wednesday  1231.0
29   Thursday  1460.0
30   Saturday  2552.0
31     Sunday  1798.0
32     Monday  1752.0
33    Tuesday  1457.0
34  Wednesday   874.0
35   Thursday  1269.0

Note: I just noticed that someone else suggested the same thing; I'll delete my post if needed :)
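For readers who want to see what each step of that chain does, here is a minimal sketch on a cut-down stand-in for the question's frame (three days, two months). The trailing `rename`/`set_index`/`rename_axis` steps are my own assumption about how to get back to the day-indexed `items` layout shown in the question; they are not part of the answer above.

```python
import pandas as pd

# Cut-down stand-in for the frame in the question (three days, two months).
df = pd.DataFrame(
    {"Dec-18": [2540.0, 1313.0, 1360.0], "Jan-19": [2441.0, 1891.0, 1558.0]},
    index=["Saturday", "Sunday", "Monday"],
)

melted = (
    df.reset_index()                        # day labels become a column named "index"
      .melt(id_vars="index")                # one row per (day, month), month by month
      .drop(columns="variable")             # discard the month labels
      .rename(columns={"value": "items"})   # assumed: match the question's column name
      .set_index("index")                   # assumed: put the days back in the index
      .rename_axis(None)                    # remove the "index" header above the days
)
print(melted)
```

Passing `columns=` to `drop` as a keyword avoids relying on the positional `axis` argument, which recent pandas versions no longer accept.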


ALo*_*llz 8

Create it with numpy by reshaping the data.

import pandas as pd
import numpy as np

pd.DataFrame(df.to_numpy().flatten('F'), 
             index=np.tile(df.index, df.shape[1]), 
             columns=['items'])

Output:

            items
Saturday   2540.0
Sunday     1313.0
Monday     1360.0
Tuesday    1089.0
Wednesday  1329.0
Thursday    798.0
Saturday   2441.0
...
Sunday     1798.0
Monday     1752.0
Tuesday    1457.0
Wednesday   874.0
Thursday   1269.0

  • Minor correction: the argument to np.tile should be df.shape[1] rather than df.shape[0]; the latter only worked for this sample data because it is square! (2 upvotes)
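To illustrate the point in the comment, here is a small sketch on a deliberately non-square frame (3 rows, 2 columns; a cut-down stand-in for the question's data). `flatten('F')` walks the values column by column, so the index has to be repeated once per column, that is, `df.shape[1]` times.

```python
import numpy as np
import pandas as pd

# Hypothetical non-square frame: 3 rows (days), 2 columns (months).
df = pd.DataFrame(
    {"Dec-18": [2540.0, 1313.0, 1360.0], "Jan-19": [2441.0, 1891.0, 1558.0]},
    index=["Saturday", "Sunday", "Monday"],
)

# Column-major flattening: all of Dec-18 first, then all of Jan-19.
values = df.to_numpy().flatten("F")

# Repeat the whole index once per column so labels line up with the values.
labels = np.tile(df.index, df.shape[1])

result = pd.DataFrame(values, index=labels, columns=["items"])
print(result)
```

With `df.shape[0]` here the index would be repeated three times (9 labels for 6 values), and the lengths would no longer match.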

d_k*_*etz 5

You can do:

df = df.stack().sort_index(level=1).reset_index(level = 1, drop=True).to_frame('items')

Interestingly, this approach was overlooked even though it is the fastest:

import time
start = time.time()
df.stack().sort_index(level=1).reset_index(level = 1, drop=True).to_frame('items')
end = time.time()
print("time taken {}".format(end-start))

Output: time taken 0.006181955337524414

Whereas this:

start = time.time()
df.reset_index().melt(id_vars='index').drop(columns='variable')
end = time.time()
print("time taken {}".format(end-start))

Output: time taken 0.010072708129882812
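A single `time.time()` measurement of a call that takes a few milliseconds is quite noisy, so the numbers above are best read as a rough indication. Below is a hedged sketch of a more repeatable comparison using the standard-library `timeit` module. The helper names `via_stack` and `via_melt` are made up for illustration, the frame is the one from the question, and the sketch assumes the index is unnamed so that `reset_index()` yields a column called `index`. Results will vary by machine and pandas version.

```python
import timeit

import pandas as pd

# The frame from the question, built column by column.
df = pd.DataFrame(
    {
        "Dec-18": [2540.0, 1313.0, 1360.0, 1089.0, 1329.0, 798.0],
        "Jan-19": [2441.0, 1891.0, 1558.0, 2105.0, 1658.0, 1195.0],
        "Feb-19": [3832.0, 2968.0, 2967.0, 2476.0, 2073.0, 2183.0],
        "Mar-19": [4093.0, 2260.0, 2156.0, 1577.0, 2403.0, 1287.0],
        "Apr-19": [1455.0, 1454.0, 1564.0, 1744.0, 1231.0, 1460.0],
        "May-19": [2552.0, 1798.0, 1752.0, 1457.0, 874.0, 1269.0],
    },
    index=["Saturday", "Sunday", "Monday", "Tuesday", "Wednesday", "Thursday"],
)


def via_stack(frame):
    """The stack/sort_index chain from this answer."""
    return (
        frame.stack()
        .sort_index(level=1)
        .reset_index(level=1, drop=True)
        .to_frame("items")
    )


def via_melt(frame):
    """The reset_index/melt chain being compared against."""
    return frame.reset_index().melt(id_vars="index").drop(columns="variable")


n = 200  # number of repetitions to average over
t_stack = timeit.timeit(lambda: via_stack(df), number=n) / n
t_melt = timeit.timeit(lambda: via_melt(df), number=n) / n
print("stack: {:.6f} s per call".format(t_stack))
print("melt:  {:.6f} s per call".format(t_melt))
```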

And my output format exactly matches what the OP asked for.
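One caveat, as far as I can tell: `sort_index(level=1)` orders the rows by the month labels themselves, not by the original column order, so the months (and the days within each month) may come out in a different sequence from the one shown in the question. If the original order matters, one possible alternative is `DataFrame.unstack()`, which on a frame with a single-level index returns a Series laid out column by column. The frame below is a cut-down stand-in for the question's data, chosen so that neither the months nor the days are in alphabetical order.

```python
import pandas as pd

# Cut-down stand-in: month and day labels are deliberately not alphabetical.
df = pd.DataFrame(
    {"Jan-19": [2441.0, 1891.0, 1558.0], "Feb-19": [3832.0, 2968.0, 2967.0]},
    index=["Saturday", "Sunday", "Monday"],
)

ordered = (
    df.unstack()                        # Series indexed by (month, day), column by column
      .reset_index(level=0, drop=True)  # drop the month level, keep the day labels
      .to_frame("items")
)
print(ordered)
```

No sorting step is involved here, so the rows keep the original column and row order.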

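  • @d_kennetz Sometimes they don't. I usually treat answers as general ideas and judge them on that basis. I reward originality and presentation/explanation. I like to see the output of a proposed solution **because** answers all too often offer solutions that don't produce the correct output. This one doesn't show its result. Also, most of the time the DataFrame isn't large enough for performance to matter. The OP will pick whatever is easiest for them to understand. Keep up the good work and keep answering questions in a way that is useful in the long run. (-: (2 upvotes)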