Chr*_*her 1 python group-by multi-index dataframe pandas
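I would like to transform the output of the code below so that:
The MultiIndex is removed (the column labels should be a single row).
The columns are numbered accordingly: Job 1, Job Eff Date 1, Job 2, Job Eff Date 2, and so on.
I want this to scale: if I add or remove other variables, I don't want to modify the code to accommodate them (this example is scaled down).
Some data: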
import pandas as pd
import numpy as np
data1 = {'Name': ["Joe", "Joe", "Joe","Jane","Jane"],
'Job': ["Analyst","Manager","Director","Analyst","Manager"],
'Job Eff Date': ["1/1/2015","1/1/2016","7/1/2016","1/1/2015","1/1/2016"]}
df2 = pd.DataFrame(data1, columns=['Name', 'Job', 'Job Eff Date'])
def tgrp(df):
    df = df.drop('Name', axis=1)
    return df.reset_index(drop=True).T
df2.groupby('Name').apply(tgrp).unstack()
Try:
df3.columns = ['{} {}'.format(col[1], col[0]) for col in df3.columns]
This works if your index is not zero-based. Otherwise use col[0] + 1 in place of col[0].
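To make the zero-based caveat concrete, here is a minimal sketch. The `cols` object is a hypothetical stand-in for the columns of the unstacked frame, built by hand so the snippet runs on its own; it is not taken from the question's actual output.

```python
import pandas as pd

# Hypothetical stand-in for the (number, field) column MultiIndex.
cols = pd.MultiIndex.from_tuples(
    [(0, 'Job'), (0, 'Job Eff Date'), (1, 'Job'), (1, 'Job Eff Date')]
)

# Using col[0] as-is keeps the zero-based numbering.
zero_based = ['{} {}'.format(col[1], col[0]) for col in cols]

# Adding 1 shifts the numbering so it starts at 1.
one_based = ['{} {}'.format(col[1], col[0] + 1) for col in cols]

print(zero_based)
print(one_based)
```

With a zero-based first level, only the second variant should give labels that start at "Job 1".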
Another solution uses join:
df.columns = [' '.join((col[1], str(col[0] + 1))) for col in df.columns]
print (df)
        Job 1  Job Eff Date 1    Job 2  Job Eff Date 2     Job 3  Job Eff Date 3
Name
Jane  Analyst        1/1/2015  Manager        1/1/2016       NaN             NaN
Joe   Analyst        1/1/2015  Manager        1/1/2016  Director        7/1/2016
If you need to remove the index name, use rename_axis (new in pandas 0.18.0):
df.columns = [' '.join((col[1], str(col[0] + 1))) for col in df.columns]
df = df.rename_axis(None)
print (df)
        Job 1  Job Eff Date 1    Job 2  Job Eff Date 2     Job 3  Job Eff Date 3
Jane  Analyst        1/1/2015  Manager        1/1/2016       NaN             NaN
Joe   Analyst        1/1/2015  Manager        1/1/2016  Director        7/1/2016
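As a small illustration of what rename_axis(None) does to the index name, the sketch below uses a tiny hand-built frame (`frame` is a hypothetical example, not the question's data). It only checks the index name before and after the call.

```python
import pandas as pd

# Hypothetical two-row frame whose index is named 'Name'.
frame = pd.DataFrame(
    {'Job 1': ['Analyst', 'Analyst']},
    index=pd.Index(['Jane', 'Joe'], name='Name'),
)

# rename_axis(None) returns a new frame with the index name cleared;
# the original frame is left unchanged.
renamed = frame.rename_axis(None)

print(frame.index.name)
print(renamed.index.name)
```

Setting `frame.index.name = None` should have the same effect in place, if a new object is not wanted.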
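How it works:
The list comprehension converts the MultiIndex to a list of tuples, which are then joined with join. Before joining, 1 must be added to the first item of each tuple and the resulting int converted to str: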
print ([col for col in df.columns])
[(0, 'Job'), (0, 'Job Eff Date'),
(1, 'Job'), (1, 'Job Eff Date'),
(2, 'Job'), (2, 'Job Eff Date')]
The output is a list of strings, which is assigned to the column names:
print ([' '.join((col[1], str(col[0] + 1))) for col in df.columns])
['Job 1', 'Job Eff Date 1', 'Job 2', 'Job Eff Date 2', 'Job 3', 'Job Eff Date 3']
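Putting the pieces together, the following is a rough end-to-end sketch using the question's sample data. The helper name `flatten_columns` and the variable `wide` are hypothetical, introduced only for illustration. One deliberate deviation from the question's code: `errors='ignore'` is passed to `drop`, on the assumption that some pandas versions may not hand the grouping column to the applied function.

```python
import pandas as pd

data1 = {'Name': ["Joe", "Joe", "Joe", "Jane", "Jane"],
         'Job': ["Analyst", "Manager", "Director", "Analyst", "Manager"],
         'Job Eff Date': ["1/1/2015", "1/1/2016", "7/1/2016",
                          "1/1/2015", "1/1/2016"]}
df2 = pd.DataFrame(data1, columns=['Name', 'Job', 'Job Eff Date'])


def tgrp(df):
    # errors='ignore' is an assumption: it tolerates 'Name' being absent.
    df = df.drop(columns='Name', errors='ignore')
    return df.reset_index(drop=True).T


def flatten_columns(frame):
    # Hypothetical helper: turn (number, field) tuples into 'field N' labels.
    frame = frame.copy()
    frame.columns = [' '.join((col[1], str(col[0] + 1)))
                     for col in frame.columns]
    return frame.rename_axis(None)


wide = flatten_columns(df2.groupby('Name').apply(tgrp).unstack())
print(wide)
```

Because the labels are derived from whatever tuples the MultiIndex contains, adding another column to `data1` should not require changes to `flatten_columns`.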