I have the following DataFrame:
Date abc xyz
01-Jun-13 100 200
03-Jun-13 -20 50
15-Aug-13 40 -5
20-Jan-14 25 15
21-Feb-14 60 80
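I need to group the data by year and month, i.e. groups for Jan 2013, Feb 2013, Mar 2013, and so on. I will use the newly grouped data to create a chart showing abc vs. xyz per year/month.
I have tried various combinations of groupby and sum, but can't seem to get anything to work.
Thanks for your help.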
And*_*den 78
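You can use either resample or Grouper (which resamples under the hood).
First make sure that the datetime column is actually made of datetimes (hit it with pd.to_datetime). It's easier if it's a DatetimeIndex: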
In [11]: df1
Out[11]:
abc xyz
Date
2013-06-01 100 200
2013-06-03 -20 50
2013-08-15 40 -5
2014-01-20 25 15
2014-02-21 60 80
In [12]: g = df1.groupby(pd.Grouper(freq="M")) # DataFrameGroupBy (grouped by Month)
In [13]: g.sum()
Out[13]:
abc xyz
Date
2013-06-30 80 250
2013-07-31 NaN NaN
2013-08-31 40 -5
2013-09-30 NaN NaN
2013-10-31 NaN NaN
2013-11-30 NaN NaN
2013-12-31 NaN NaN
2014-01-31 25 15
2014-02-28 60 80
In [14]: df1.resample("M").sum() # the same
Out[14]:
abc xyz
Date
2013-06-30 80 250
2013-07-31 NaN NaN
2013-08-31 40 -5
2013-09-30 NaN NaN
2013-10-31 NaN NaN
2013-11-30 NaN NaN
2013-12-31 NaN NaN
2014-01-31 25 15
2014-02-28 60 80
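Note: pd.Grouper(freq="M") used to be written pd.TimeGrouper("M"). The latter has been deprecated since pandas 0.21. Also, the NaN rows shown above come from an older pandas: since 0.22, sum() returns 0 for empty groups unless min_count=1 is passed.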
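As a minimal sketch of the steps above, the following builds the frame from the question's data, parses the dates, and groups by month both ways. The format string "%d-%b-%y" is an assumption about how the question's dates are written, and the variable names month_end, monthly and same are illustrative. A MonthEnd offset object is passed instead of the "M" alias, since recent pandas releases have changed the preferred spelling of that alias.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Date": ["01-Jun-13", "03-Jun-13", "15-Aug-13", "20-Jan-14", "21-Feb-14"],
    "abc": [100, -20, 40, 25, 60],
    "xyz": [200, 50, -5, 15, 80],
})

# Parse the day-month-year strings (assumed format) and use them as the index.
df["Date"] = pd.to_datetime(df["Date"], format="%d-%b-%y")
df1 = df.set_index("Date")

# Group by calendar month end, using an offset object rather than an alias.
month_end = pd.offsets.MonthEnd()

# min_count=1 makes months without any rows come out as NaN instead of 0.
monthly = df1.groupby(pd.Grouper(freq=month_end)).sum(min_count=1)
same = df1.resample(month_end).sum(min_count=1)

print(monthly)
```

Dropping min_count=1 would give 0 rather than NaN for the empty months, which may or may not be what a chart should show.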
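I had thought the following would work, but it doesn't (because as_index isn't respected? I'm not sure). I'm including it for interest's sake.
If it's a column (it has to be a datetime64 column! As I said, hit it with to_datetime), you can use a PeriodIndex: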
In [21]: df
Out[21]:
Date abc xyz
0 2013-06-01 100 200
1 2013-06-03 -20 50
2 2013-08-15 40 -5
3 2014-01-20 25 15
4 2014-02-21 60 80
In [22]: pd.DatetimeIndex(df.Date).to_period("M") # old way
Out[22]:
<class 'pandas.tseries.period.PeriodIndex'>
[2013-06, ..., 2014-02]
Length: 5, Freq: M
In [23]: per = df.Date.dt.to_period("M") # new way to get the same
In [24]: g = df.groupby(per)
In [25]: g[["abc", "xyz"]].sum() # dang, not quite what we want (doesn't fill in the gaps)
Out[25]:
abc xyz
2013-06 80 250
2013-08 40 -5
2014-01 25 15
2014-02 60 80
To get the desired result we have to reindex...
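One way the reindexing might look is sketched below. This is not necessarily what the answer's author had in mind: it assumes the gaps should be filled with NaN rows, and the names sums, all_months and filled are illustrative.

```python
import pandas as pd

# Same data as above, with Date already parsed to datetime64.
df = pd.DataFrame({
    "Date": pd.to_datetime(
        ["2013-06-01", "2013-06-03", "2013-08-15", "2014-01-20", "2014-02-21"]
    ),
    "abc": [100, -20, 40, 25, 60],
    "xyz": [200, 50, -5, 15, 80],
})

# Group on monthly periods; only the numeric columns are summed.
per = df["Date"].dt.to_period("M")
sums = df.groupby(per)[["abc", "xyz"]].sum()

# Build every month between the first and last observed period,
# then reindex so that the missing months appear as NaN rows.
all_months = pd.period_range(sums.index.min(), sums.index.max(), freq="M")
filled = sums.reindex(all_months)

print(filled)
```

Passing fill_value=0 to reindex would fill the gaps with zeros instead.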
Q-m*_*man 40
Why not keep it simple?!
GB=DF.groupby([(DF.index.year),(DF.index.month)]).sum()
gives you,
print(GB)
abc xyz
2013 6 80 250
8 40 -5
2014 1 25 15
2 60 80
Then you can plot it as requested using,
GB.plot('abc','xyz',kind='scatter')
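For a self-contained version of the grouping above, here is a small sketch that assumes the dates are already a DatetimeIndex. The level names "year" and "month" and the label column are illustrative additions, not part of the original answer.

```python
import pandas as pd

# The question's data with a DatetimeIndex.
DF = pd.DataFrame(
    {"abc": [100, -20, 40, 25, 60], "xyz": [200, 50, -5, 15, 80]},
    index=pd.to_datetime(
        ["2013-06-01", "2013-06-03", "2013-08-15", "2014-01-20", "2014-02-21"]
    ),
)

# Group on two integer keys derived from the index.
GB = DF.groupby([DF.index.year, DF.index.month]).sum()

# Name the two index levels so they can be referred to later.
GB = GB.rename_axis(["year", "month"])

# Flatten to ordinary columns and build a "YYYY-MM" label, e.g. for an axis.
flat = GB.reset_index()
flat["label"] = (
    flat["year"].astype(str) + "-" + flat["month"].astype(str).str.zfill(2)
)

print(flat)
```

Because the keys are integers, the groups come out in chronological order without any extra sorting.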
小智 8
There are different ways to do this.
Create the DataFrame first:
df = pd.DataFrame({'Date':['01-Jun-13','03-Jun-13', '15-Aug-13', '20-Jan-14', '21-Feb-14'],'abc':[100,-20,40,25,60],'xyz':[200,50,-5,15,80]})
Define helper functions that pull the parts out of the date string:
def getMonth(s):
    return s.split("-")[1]
def getDay(s):
    return s.split("-")[0]
def getYear(s):
    return s.split("-")[2]
def getYearMonth(s):
    return s.split("-")[1]+"-"+s.split("-")[2]
Create the columns year, month, day and YearMonth. In your case you need one of the two: you can group by the two columns 'year' and 'month', or by the single column YearMonth.
df['year']= df['Date'].apply(lambda x: getYear(x))
df['month']= df['Date'].apply(lambda x: getMonth(x))
df['day']= df['Date'].apply(lambda x: getDay(x))
df['YearMonth']= df['Date'].apply(lambda x: getYearMonth(x))
Output:
Date abc xyz year month day YearMonth
0 01-Jun-13 100 200 13 Jun 01 Jun-13
1 03-Jun-13 -20 50 13 Jun 03 Jun-13
2 15-Aug-13 40 -5 13 Aug 15 Aug-13
3 20-Jan-14 25 15 14 Jan 20 Jan-14
4 21-Feb-14 60 80 14 Feb 21 Feb-14
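The same columns can also be produced without per-row helper functions. The sketch below swaps in pandas' vectorized string methods as an alternative to the helpers above; the names parts and totals are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Date": ["01-Jun-13", "03-Jun-13", "15-Aug-13", "20-Jan-14", "21-Feb-14"],
    "abc": [100, -20, 40, 25, 60],
    "xyz": [200, 50, -5, 15, 80],
})

# Split "01-Jun-13" into day / month / year in one vectorized call.
parts = df["Date"].str.split("-", expand=True)
parts.columns = ["day", "month", "year"]
df = df.join(parts)
df["YearMonth"] = df["month"] + "-" + df["year"]

# sort=False keeps the groups in order of first appearance; the default
# would sort the string keys alphabetically ("Aug-13" before "Jun-13").
totals = df.groupby("YearMonth", sort=False)[["abc", "xyz"]].sum()

print(totals)
```

Since the keys here are plain strings, any chronological ordering relies on the rows already being in date order.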
In this case we group by two columns:
for key, g in df.groupby(['year','month']):
    print(key, g)
Output:
('13', 'Jun') Date abc xyz year month day YearMonth
0 01-Jun-13 100 200 13 Jun 01 Jun-13
1 03-Jun-13 -20 50 13 Jun 03 Jun-13
('13', 'Aug') Date abc xyz year month day YearMonth
2 15-Aug-13 40 -5 13 Aug 15 Aug-13
('14', 'Jan') Date abc xyz year month day YearMonth
3 20-Jan-14 25 15 14 Jan 20 Jan-14
('14', 'Feb') Date abc xyz year month day YearMonth
4 21-Feb-14 60 80 14 Feb 21 Feb-14
In this case we group by one column:
for key, g in df.groupby('YearMonth'):
    print(key, g)
Output:
Jun-13 Date abc xyz year month day YearMonth
0 01-Jun-13 100 200 13 Jun 01 Jun-13
1 03-Jun-13 -20 50 13 Jun 03 Jun-13
Aug-13 Date abc xyz year month day YearMonth
2 15-Aug-13 40 -5 13 Aug 15 Aug-13
Jan-14 Date abc xyz year month day YearMonth
3 20-Jan-14 25 15 14 Jan 20 Jan-14
Feb-14 Date abc xyz year month day YearMonth
4 21-Feb-14 60 80 14 Feb 21 Feb-14
To access a specific group, you can use get_group:
print(df.groupby('YearMonth').get_group('Jun-13'))
Output:
Date abc xyz year month day YearMonth
0 01-Jun-13 100 200 13 Jun 01 Jun-13
1 03-Jun-13 -20 50 13 Jun 03 Jun-13
Similar to get_group, this hack helps to filter the values and get the grouped values. It gives the same result.
print(df[df['YearMonth']=='Jun-13'])
Output:
Date abc xyz year month day YearMonth
0 01-Jun-13 100 200 13 Jun 01 Jun-13
1 03-Jun-13 -20 50 13 Jun 03 Jun-13
You can select the list of abc or xyz values for Jun-13:
print(df[df['YearMonth']=='Jun-13'].abc.values)
print(df[df['YearMonth']=='Jun-13'].xyz.values)
Output:
[100 -20] #abc values
[200 50] #xyz values
You can use this to go through the dates that you have classified as "year-month" and apply criteria to them to get the related data.
for x in set(df.YearMonth):
    print(df[df['YearMonth']==x].abc.values)
    print(df[df['YearMonth']==x].xyz.values)
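The filter-in-a-loop pattern above rescans the whole frame once per key. A possible alternative, sketched here under the assumption that a plain dict of arrays is an acceptable result, lets groupby do the splitting in a single pass. The name values is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Date": ["01-Jun-13", "03-Jun-13", "15-Aug-13", "20-Jan-14", "21-Feb-14"],
    "abc": [100, -20, 40, 25, 60],
    "xyz": [200, 50, -5, 15, 80],
})

# Everything after the first "-" is the "Mon-YY" part of the string.
df["YearMonth"] = df["Date"].str.split("-", n=1).str[1]

# One pass over the groups; each entry holds the abc and xyz arrays.
values = {
    key: {"abc": g["abc"].to_numpy(), "xyz": g["xyz"].to_numpy()}
    for key, g in df.groupby("YearMonth", sort=False)
}

for key, cols in values.items():
    print(key, cols["abc"], cols["xyz"])
```

Unlike iterating over set(df.YearMonth), this visits the groups in a predictable order (first appearance in the frame).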
I also suggest checking out this answer.
You can also do this by creating a string column that holds the year and month, as follows:
df['date'] = df.index
df['year-month'] = df['date'].apply(lambda x: str(x.year) + ' ' + str(x.month))
grouped = df.groupby('year-month')
However, this does not preserve chronological order when you loop over the groups, because the group names are sorted as strings (so a name such as '2014 10' sorts before '2014 2'), e.g.
for name, group in grouped:
    print(name)
So, if you want to preserve the order, you have to do as @Q-man suggested above:
grouped = df.groupby([df.index.year, df.index.month])
This preserves the order in the loop above:
for name, group in grouped:
    print(name)
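The ordering difference is easy to reproduce on a tiny frame. The sketch below uses made-up dates (not the question's data) chosen so that a two-digit month is present; by_string and by_ints are illustrative names.

```python
import pandas as pd

# Hypothetical dates picked so that month 10 and month 2 both occur.
idx = pd.to_datetime(["2014-02-10", "2014-10-05", "2014-01-20"])
df = pd.DataFrame({"abc": [1, 2, 3]}, index=idx)

# String key, built the same way as in the answer above.
df["date"] = df.index
df["year-month"] = df["date"].apply(lambda x: str(x.year) + " " + str(x.month))

# Group names sorted as strings: "2014 10" lands before "2014 2".
by_string = [name for name, _ in df.groupby("year-month")]

# Group names as (year, month) integer pairs: sorted numerically.
by_ints = [name for name, _ in df.groupby([df.index.year, df.index.month])]

print(by_string)
print(by_ints)
```

Zero-padding the month in the string key (for example with strftime("%Y-%m")) would be another way to make the string order match the calendar order.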