sha*_*aia 4 python matplotlib dataframe pandas pandas-groupby
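I have some data from which I want to extract revenue (Dollars) time series for different products (x and y), summed over the different Where locations for each Day.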
import pandas as pd
#Create data
data = {'Day': [1,1,2,2,3,3],
        'Where': ['A','B','A','B','B','B'],
        'What': ['x','y','x','x','x','y'],
        'Dollars': [100,200,100,100,100,200]}
index = range(len(data['Day']))
columns = ['Day','Where','What','Dollars']
df = pd.DataFrame(data, index=index, columns=columns)
df
To do this, I group the data by Day and What and sum Dollars:
#Group by Day and What and sum Dollars (for each Where)
print(df.groupby(['Day', 'What'])['Dollars'].sum())
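For reference, a minimal sketch of what this grouping yields on the sample data. The variable name grouped is introduced here only for illustration. The result is a Series indexed by (Day, What), and combinations with no rows are simply absent from it.

```python
import pandas as pd

# Same sample data as above
df = pd.DataFrame({
    'Day': [1, 1, 2, 2, 3, 3],
    'Where': ['A', 'B', 'A', 'B', 'B', 'B'],
    'What': ['x', 'y', 'x', 'x', 'x', 'y'],
    'Dollars': [100, 200, 100, 100, 100, 200],
})

# Where is not a grouping key, so its rows are summed together
grouped = df.groupby(['Day', 'What'])['Dollars'].sum()

print(list(grouped.index.names))    # the two index levels: Day and What
print(grouped.loc[(2, 'x')])        # Day 2, product x: rows from A and B added up
print((2, 'y') in grouped.index)    # product y has no rows on Day 2
```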
Now I would like to plot the time series for x and y, with one line per product.
I tried the following, but it obviously does not work:
items = df.What.unique()
ax = plt.figure()
for item in items:
    df_tmp = df[['Day']][df.What == item]
    plt.plot(df_tmp['Day'],df_tmp,'.-',label=item)
Can someone point me in the right direction? Is there a faster way to get the right result?
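As a sketch of why the loop above fails: df[['Day']] keeps only the Day column, so Dollars never reaches the plot; nothing is summed per day; plt is used without being imported; and plt.figure() returns a Figure rather than an Axes. One possible loop-based repair, assuming matplotlib is installed (the name daily is illustrative only):

```python
import matplotlib.pyplot as plt
import pandas as pd

# Same sample data as in the question
df = pd.DataFrame({
    'Day': [1, 1, 2, 2, 3, 3],
    'Where': ['A', 'B', 'A', 'B', 'B', 'B'],
    'What': ['x', 'y', 'x', 'x', 'x', 'y'],
    'Dollars': [100, 200, 100, 100, 100, 200],
})

fig, ax = plt.subplots()
for item in df['What'].unique():
    # Keep the rows for this product, then sum Dollars per Day
    daily = df[df['What'] == item].groupby('Day')['Dollars'].sum()
    ax.plot(daily.index, daily.values, '.-', label=item)

ax.set_xlabel('Day')
ax.set_ylabel('Dollars')
ax.legend()
```

Call plt.show() afterwards to display the figure. Note that in this version a product with no sales on a given day has no point for that day, rather than a point at 0.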
IIUC, unstack and plot:
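import matplotlib.pyplot as plt

(df.groupby(['Day', 'What'])['Dollars']
   .sum()                            # total Dollars per (Day, What)
   .unstack('What', fill_value=0)    # one column per product, 0 where no sales
   .plot())                          # one line per column, Day on the x-axis
plt.show()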
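To see what gets plotted, here is a small sketch of the intermediate table that unstack produces, without the plotting step. The names wide and wide_alt are introduced here for illustration; the pivot_table call is an alternative spelling that should give the same values, not part of the answer above.

```python
import pandas as pd

# Same sample data as in the question
df = pd.DataFrame({
    'Day': [1, 1, 2, 2, 3, 3],
    'Where': ['A', 'B', 'A', 'B', 'B', 'B'],
    'What': ['x', 'y', 'x', 'x', 'x', 'y'],
    'Dollars': [100, 200, 100, 100, 100, 200],
})

# Move the What index level into columns: rows are Day, columns are x and y
wide = (df.groupby(['Day', 'What'])['Dollars']
          .sum()
          .unstack('What', fill_value=0))
print(wide)

# Alternative: build the same Day-by-What table in one call
wide_alt = df.pivot_table(index='Day', columns='What',
                          values='Dollars', aggfunc='sum', fill_value=0)
print(wide_alt)
```

Because fill_value=0 is passed, the missing (Day 2, y) combination becomes 0 in the table, so the y line drops to zero on Day 2 instead of skipping that day.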