efs*_*see 4 group-by pivot-table dataframe python-3.x pandas
I have a dataframe like this:
customer_id | date | category
1 | 2017-2-1 | toys
2 | 2017-2-1 | food
1 | 2017-2-1 | drinks
3 | 2017-2-2 | computer
2 | 2017-2-1 | toys
1 | 2017-3-1 | food
>>> import pandas as pd
>>> dt = dict(customer_id=[1,2,1,3,2,1],
...           date='2017-2-1 2017-2-1 2017-2-1 2017-2-2 2017-2-1 2017-3-1'.split(),
...           category=["toys", "food", "drinks", "computer", "toys", "food"])
>>> df = pd.DataFrame(dt)
To one-hot encode the categories into new columns, I know I can add an indicator column and use df.pivot_table(index = ['customer_id'], columns = ['category']).
>>> df['Indicator'] = 1
>>> df.pivot_table(index=['customer_id'], columns=['category'],
...                values='Indicator').fillna(0).astype(int)
category     computer  drinks  food  toys
customer_id
1                   0       1     1     1
2                   0       0     1     1
3                   1       0     0     0
>>>
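I would also like to group by date, so that each row only contains information from a single date. In the desired output below, id 1 has two rows because there are two unique dates for it in the date column.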
customer_id | toys | food | drinks | computer
1 | 1 | 0 | 1 | 0
1 | 0 | 1 | 0 | 0
2 | 1 | 1 | 0 | 0
3 | 0 | 0 | 0 | 1
You are probably looking for crosstab, with both customer_id and date as the row keys:
>>> pd.crosstab([df.customer_id,df.date], df.category)
category              computer  drinks  food  toys
customer_id date
1           2017-2-1         0       1     0     1
            2017-3-1         0       0     1     0
2           2017-2-1         0       0     1     1
3           2017-2-2         1       0     0     0
>>>
>>> pd.crosstab([df.customer_id,df.date],
...             df.category).reset_index(level=1)
category         date  computer  drinks  food  toys
customer_id
1            2017-2-1         0       1     0     1
1            2017-3-1         0       0     1     0
2            2017-2-1         0       0     1     1
3            2017-2-2         1       0     0     0
>>>
>>> pd.crosstab([df.customer_id, df.date],
...             df.category).reset_index(level=1, drop=True)
category     computer  drinks  food  toys
customer_id
1                   0       1     0     1
1                   0       0     1     0
2                   0       0     1     1
3                   1       0     0     0
>>>
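One thing to keep in mind: crosstab counts occurrences, so a customer who buys the same category twice on one date would get a 2 rather than a 1. The sketch below is not part of the original answer; it assumes you want a strict 0/1 indicator and caps the counts with clip. It also shows that the pivot_table approach from the question gives the same table once date is added to the index (the choice of aggfunc="max" is my assumption, made so that duplicates still yield 1).

```python
import pandas as pd

# Same sample data as in the question.
df = pd.DataFrame({
    "customer_id": [1, 2, 1, 3, 2, 1],
    "date": "2017-2-1 2017-2-1 2017-2-1 2017-2-2 2017-2-1 2017-3-1".split(),
    "category": ["toys", "food", "drinks", "computer", "toys", "food"],
})

# crosstab counts how often each category occurs per (customer_id, date) pair.
counts = pd.crosstab([df["customer_id"], df["date"]], df["category"])

# Cap the counts at 1 to get a strict one-hot style indicator.
# With this sample data every count is already 0 or 1, so nothing changes.
indicator = counts.clip(upper=1)

# The question's pivot_table approach, with date added to the index.
# aggfunc="max" keeps the value at 1 even if a combination repeats.
df["Indicator"] = 1
pivoted = (
    df.pivot_table(index=["customer_id", "date"], columns="category",
                   values="Indicator", aggfunc="max")
      .fillna(0)
      .astype(int)
)

print(indicator)
print(pivoted)
```

Either result can be followed by .reset_index(level=1, drop=True) to drop the date level, as in the answer above.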