jon*_*ony 10 python dataframe pandas
Here is my DataFrame:
Dec-18 Jan-19 Feb-19 Mar-19 Apr-19 May-19
Saturday 2540.0 2441.0 3832.0 4093.0 1455.0 2552.0
Sunday 1313.0 1891.0 2968.0 2260.0 1454.0 1798.0
Monday 1360.0 1558.0 2967.0 2156.0 1564.0 1752.0
Tuesday 1089.0 2105.0 2476.0 1577.0 1744.0 1457.0
Wednesday 1329.0 1658.0 2073.0 2403.0 1231.0 874.0
Thursday 798.0 1195.0 2183.0 1287.0 1460.0 1269.0
I tried a few pandas operations, but I couldn't get it to work.
This is what I want to get:
items
Saturday 2540.0
Sunday 1313.0
Monday 1360.0
Tuesday 1089.0
Wednesday 1329.0
Thursday 798.0
Saturday 2441.0
Sunday 1891.0
Monday 1558.0
Tuesday 2105.0
Wednesday 1658.0
Thursday 1195.0 ............ and so on
I want to stack the columns one below another as rows. How can I do that?
df.reset_index().melt(id_vars='index').drop(columns='variable')
Output:
index value
0 Saturday 2540.0
1 Sunday 1313.0
2 Monday 1360.0
3 Tuesday 1089.0
4 Wednesday 1329.0
5 Thursday 798.0
6 Saturday 2441.0
7 Sunday 1891.0
8 Monday 1558.0
9 Tuesday 2105.0
10 Wednesday 1658.0
11 Thursday 1195.0
12 Saturday 3832.0
13 Sunday 2968.0
14 Monday 2967.0
15 Tuesday 2476.0
16 Wednesday 2073.0
17 Thursday 2183.0
18 Saturday 4093.0
19 Sunday 2260.0
20 Monday 2156.0
21 Tuesday 1577.0
22 Wednesday 2403.0
23 Thursday 1287.0
24 Saturday 1455.0
25 Sunday 1454.0
26 Monday 1564.0
27 Tuesday 1744.0
28 Wednesday 1231.0
29 Thursday 1460.0
30 Saturday 2552.0
31 Sunday 1798.0
32 Monday 1752.0
33 Tuesday 1457.0
34 Wednesday 874.0
35 Thursday 1269.0
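Note: I just noticed that someone already suggested the same thing. I'll delete my post if needed :)
You can create it with numpy by reshaping the data.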
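The melt output above keeps the day names in a regular column and numbers the rows 0..35, while the question shows the days as the index and a single `items` column. A minimal sketch of one way to close that gap, assuming the index is unnamed (so `reset_index()` produces a column called `index`) and using a cut-down stand-in for the question's frame:

```python
import pandas as pd

# Small stand-in for the question's frame (two months, three days only).
df = pd.DataFrame(
    {"Dec-18": [2540.0, 1313.0, 1360.0], "Jan-19": [2441.0, 1891.0, 1558.0]},
    index=["Saturday", "Sunday", "Monday"],
)

melted = (
    df.reset_index()                       # unnamed index becomes a column called 'index'
      .melt(id_vars="index", value_name="items")
      .drop(columns="variable")            # discard the month labels
      .set_index("index")                  # put the day names back as the index
      .rename_axis(None)                   # drop the leftover 'index' axis name
)
print(melted)
```

`value_name="items"` only renames the value column; the rest of the chain is the same melt as in the answer.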
import pandas as pd
import numpy as np
pd.DataFrame(df.to_numpy().flatten('F'),
index=np.tile(df.index, df.shape[1]),
columns=['items'])
items
Saturday 2540.0
Sunday 1313.0
Monday 1360.0
Tuesday 1089.0
Wednesday 1329.0
Thursday 798.0
Saturday 2441.0
...
Sunday 1798.0
Monday 1752.0
Tuesday 1457.0
Wednesday 874.0
Thursday 1269.0
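To see why `flatten('F')` and `np.tile` line up here, a small sketch on a toy array (the values and labels are made up for illustration): `'F'` (column-major) order walks down each column in turn, and `np.tile` repeats the row labels once per column, so each value stays paired with its day.

```python
import numpy as np

values = np.array([[1.0, 2.0],
                   [3.0, 4.0],
                   [5.0, 6.0]])            # 3 rows (days) x 2 columns (months)
labels = np.array(["Sat", "Sun", "Mon"])   # hypothetical row labels

column_major = values.flatten("F")         # walk down each column in turn
row_major = values.flatten()               # default 'C' order walks along each row
tiled = np.tile(labels, values.shape[1])   # repeat the labels once per column

print(column_major)  # [1. 3. 5. 2. 4. 6.]
print(row_major)     # [1. 2. 3. 4. 5. 6.]
print(tiled)         # ['Sat' 'Sun' 'Mon' 'Sat' 'Sun' 'Mon']
```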
You can do:
df = df.stack().sort_index(level=1).reset_index(level = 1, drop=True).to_frame('items')
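A side note on the one-liner above: as far as I can tell, `sort_index(level=1)` orders the rows by the column labels themselves, so string labels such as `Dec-18`, `Jan-19` may come out alphabetically rather than in their original order. A hedged sketch of a variant that keeps the existing column order, using `DataFrame.unstack()` on a cut-down stand-in frame (this is a swapped-in technique, not the answerer's code):

```python
import pandas as pd

# Cut-down stand-in for the question's frame.
df = pd.DataFrame(
    {"Dec-18": [2540.0, 1313.0], "Jan-19": [2441.0, 1891.0], "Feb-19": [3832.0, 2968.0]},
    index=["Saturday", "Sunday"],
)

# With a plain (non-MultiIndex) index, unstack() returns a Series indexed by
# (column label, row label), walking the columns in their existing order.
stacked = df.unstack().droplevel(0).to_frame("items")
print(stacked)
```

Dropping level 0 removes the month labels and leaves the day names as the index.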
Interestingly, the stack-based approach has been overlooked even though it is the fastest:
import time
start = time.time()
df.stack().sort_index(level=1).reset_index(level = 1, drop=True).to_frame('items')
end = time.time()
print("time taken {}".format(end-start))
Output: time taken 0.006181955337524414
Whereas this one:
start = time.time()
df.reset_index().melt(id_vars='days').drop(columns='variable')  # assumes the index is named 'days'
end = time.time()
print("time taken {}".format(end-start))
Output: time taken 0.010072708129882812
Also, my output format matches exactly what the OP asked for.
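A single `time.time()` measurement of a millisecond-scale call is noisy, so the two numbers above are only a rough indication. A minimal sketch of a more repeatable comparison with `timeit`, assuming a small stand-in frame whose index is named `days`; the helper names `via_stack` and `via_melt` and the repeat count are made up for this example:

```python
import timeit
import pandas as pd

# Stand-in frame; the index is named 'days' to match the melt call above.
df = pd.DataFrame(
    {"Dec-18": [2540.0, 1313.0, 1360.0], "Jan-19": [2441.0, 1891.0, 1558.0]},
    index=pd.Index(["Saturday", "Sunday", "Monday"], name="days"),
)

def via_stack(frame):
    # The stack-based chain from this answer.
    return (frame.stack().sort_index(level=1)
                 .reset_index(level=1, drop=True).to_frame("items"))

def via_melt(frame):
    # The melt-based chain it is compared against.
    return frame.reset_index().melt(id_vars="days").drop(columns="variable")

runs = 200  # arbitrary repeat count chosen for this sketch
stack_time = timeit.timeit(lambda: via_stack(df), number=runs) / runs
melt_time = timeit.timeit(lambda: via_melt(df), number=runs) / runs
print("stack: {:.6f}s per call".format(stack_time))
print("melt:  {:.6f}s per call".format(melt_time))
```

Results will vary with the pandas version and the size of the frame, so it is worth rerunning this on the real data before drawing conclusions.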