cod*_*nob 11 python stack pivot pandas
I have a DataFrame that looks like this:
import pandas as pd
datelisttemp = pd.date_range('1/1/2014', periods=3, freq='D')
s = list(datelisttemp)*3
s.sort()
df = pd.DataFrame({'BORDER':['GERMANY','FRANCE','ITALY','GERMANY','FRANCE','ITALY','GERMANY','FRANCE','ITALY' ], 'HOUR1':[2 ,2 ,2 ,4 ,4 ,4 ,6 ,6, 6],'HOUR2':[3 ,3 ,3, 5 ,5 ,5, 7, 7, 7], 'HOUR3':[8 ,8 ,8, 12 ,12 ,12, 99, 99, 99]}, index=s)
This gives me:
Out[458]: df
BORDER HOUR1 HOUR2 HOUR3
2014-01-01 GERMANY 2 3 8
2014-01-01 FRANCE 2 3 8
2014-01-01 ITALY 2 3 8
2014-01-02 GERMANY 4 5 12
2014-01-02 FRANCE 4 5 12
2014-01-02 ITALY 4 5 12
2014-01-03 GERMANY 6 7 99
2014-01-03 FRANCE 6 7 99
2014-01-03 ITALY 6 7 99
I would like the final DataFrame to look like this:
HOUR GERMANY FRANCE ITALY
2014-01-01 1 2 2 2
2014-01-01 2 3 3 3
2014-01-01 3 8 8 8
2014-01-02 1 4 4 4
2014-01-02 2 5 5 5
2014-01-02 3 12 12 12
2014-01-03 1 6 6 6
2014-01-03 2 7 7 7
2014-01-03 3 99 99 99
I did the following, but I'm not quite there:
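df['date_col'] = df.index
# melt is called through the pandas namespace, since only `pd` is imported above
df2 = pd.melt(df, id_vars=['date_col','BORDER'])
#Can I keep the same index after melt or do I have to set an index like below?
df2.set_index(['date_col', 'variable'], inplace=True, drop=True)
# DataFrame.sort() was removed from pandas; sort_index() sorts by the new MultiIndex
df2 = df2.sort_index()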
DF
Out[465]: df2
BORDER value
date_col variable
2014-01-01 HOUR1 GERMANY 2
HOUR1 FRANCE 2
HOUR1 ITALY 2
HOUR2 GERMANY 3
HOUR2 FRANCE 3
HOUR2 ITALY 3
HOUR3 GERMANY 8
HOUR3 FRANCE 8
HOUR3 ITALY 8
2014-01-02 HOUR1 GERMANY 4
HOUR1 FRANCE 4
HOUR1 ITALY 4
HOUR2 GERMANY 5
HOUR2 FRANCE 5
HOUR2 ITALY 5
HOUR3 GERMANY 12
HOUR3 FRANCE 12
HOUR3 ITALY 12
2014-01-03 HOUR1 GERMANY 6
HOUR1 FRANCE 6
HOUR1 ITALY 6
HOUR2 GERMANY 7
HOUR2 FRANCE 7
HOUR2 ITALY 7
HOUR3 GERMANY 99
HOUR3 FRANCE 99
HOUR3 ITALY 99
I thought I could unstack df2 to get something resembling my final DataFrame, but I get all sorts of errors. I have also tried pivoting this DataFrame, but I can't quite get what I want.
unu*_*tbu 18
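We want values (e.g. 'GERMANY') to become column names, and column names (e.g. 'HOUR1') to become values: a swap of sorts.
The stack method turns column names into index values, and the unstack method turns index values into column names.
So, by first moving the values into the index, we can use stack and unstack to perform the swap.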
import pandas as pd
datelisttemp = pd.date_range('1/1/2014', periods=3, freq='D')
s = list(datelisttemp)*3
s.sort()
df = pd.DataFrame({'BORDER':['GERMANY','FRANCE','ITALY','GERMANY','FRANCE','ITALY','GERMANY','FRANCE','ITALY' ], 'HOUR1':[2 ,2 ,2 ,4 ,4 ,4 ,6 ,6, 6],'HOUR2':[3 ,3 ,3, 5 ,5 ,5, 7, 7, 7], 'HOUR3':[8 ,8 ,8, 12 ,12 ,12, 99, 99, 99]}, index=s)
df = df.set_index(['BORDER'], append=True)
df.columns.name = 'HOUR'
df = df.unstack('BORDER')
df = df.stack('HOUR')
df = df.reset_index('HOUR')
df['HOUR'] = df['HOUR'].str.replace('HOUR', '').astype('int')
print(df)
yields
BORDER HOUR FRANCE GERMANY ITALY
2014-01-01 1 2 2 2
2014-01-01 2 3 3 3
2014-01-01 3 8 8 8
2014-01-02 1 4 4 4
2014-01-02 2 5 5 5
2014-01-02 3 12 12 12
2014-01-03 1 6 6 6
2014-01-03 2 7 7 7
2014-01-03 3 99 99 99
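To see the two methods in isolation, here is a minimal sketch on a smaller frame. The toy data and the names `toy`, `stacked` and `swapped` are made up for illustration and are not part of the original answer.

```python
import pandas as pd

# Toy frame (illustrative data): one date, two borders, two hour columns.
toy = pd.DataFrame(
    {"BORDER": ["GERMANY", "FRANCE"], "HOUR1": [2, 2], "HOUR2": [3, 3]},
    index=pd.to_datetime(["2014-01-01", "2014-01-01"]),
)

# Move the BORDER values into the index and name the column axis.
toy = toy.set_index("BORDER", append=True)
toy.columns.name = "HOUR"

# stack: the column labels (HOUR1, HOUR2) become the innermost index level.
stacked = toy.stack()

# unstack: the BORDER index level becomes the column labels.
swapped = stacked.unstack("BORDER")
print(swapped)
```

The answer's code does the same thing in the opposite order (unstack first, then stack); either order should leave the borders as columns and the hours in the index.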
Using your df2:
>>> df2.pivot_table(values='value', index=['DATE', 'variable'], columns="BORDER")
BORDER FRANCE GERMANY ITALY
DATE variable
2014-01-01 HOUR1 2 2 2
HOUR2 3 3 3
HOUR3 8 8 8
2014-01-02 HOUR1 4 4 4
HOUR2 5 5 5
HOUR3 12 12 12
2014-01-03 HOUR1 6 6 6
HOUR2 7 7 7
HOUR3 99 99 99
[9 rows x 3 columns]
You still have some cleanup to do if you want to turn the index level 'variable' into a column named 'HOUR' and strip the text 'HOUR' from the values, but I think this is the basic format you want.
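One possible way to do that cleanup is sketched below. It rebuilds the question's data so that it runs on its own, and it assumes the date is kept as an ordinary column named DATE before melting (as the pivot_table call above implies); the names `melted`, `wide` and `result` are illustrative.

```python
import pandas as pd

# Rebuild the question's data so the sketch is self-contained.
dates = sorted(list(pd.date_range("1/1/2014", periods=3, freq="D")) * 3)
df = pd.DataFrame(
    {
        "BORDER": ["GERMANY", "FRANCE", "ITALY"] * 3,
        "HOUR1": [2, 2, 2, 4, 4, 4, 6, 6, 6],
        "HOUR2": [3, 3, 3, 5, 5, 5, 7, 7, 7],
        "HOUR3": [8, 8, 8, 12, 12, 12, 99, 99, 99],
    },
    index=dates,
)

# Assumption: the date is a regular column called DATE when melting.
melted = df.rename_axis("DATE").reset_index().melt(id_vars=["DATE", "BORDER"])

wide = melted.pivot_table(values="value", index=["DATE", "variable"], columns="BORDER")

# Cleanup: move 'variable' out of the index, rename it to HOUR,
# and strip the 'HOUR' prefix so only the hour number remains.
result = wide.reset_index("variable").rename(columns={"variable": "HOUR"})
result["HOUR"] = result["HOUR"].str.replace("HOUR", "", regex=False).astype(int)
result.columns.name = None  # drop the leftover 'BORDER' label on the column axis
print(result)
```

Note that pivot_table aggregates with the mean by default, so the values come back as floats; since each (DATE, variable, BORDER) combination occurs only once here, the numbers themselves are unchanged.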