"No numeric types to aggregate" after groupby and mean

Har*_*rry 3 python pandas

I'm working with time series and trying to write a function that computes the monthly mean of my data. Here are some helper functions:

import datetime
import numpy
import pandas

def date_range_0(start, end):
    # Every day from start to end, inclusive, as an object array of datetimes
    dates = [start + datetime.timedelta(days=i)
             for i in range((end - start).days + 1)]
    return numpy.array(dates)

def date_range_1(start, days):
    # days should be an integer: the number of consecutive days to return
    return date_range_0(start, start + datetime.timedelta(days - 1))

x = date_range_1(datetime.datetime(2015, 5, 17), 4)

The output, x, is a simple array of datetimes:

array([datetime.datetime(2015, 5, 17, 0, 0),
   datetime.datetime(2015, 5, 18, 0, 0),
   datetime.datetime(2015, 5, 19, 0, 0),
   datetime.datetime(2015, 5, 20, 0, 0)], dtype=object)
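For comparison, here is a minimal sketch of the same four-day range built with pandas' own `pandas.date_range`. It is not part of the original question; the variable names `start` and `x_as_objects` are illustrative only.

```python
import datetime
import pandas as pd

start = datetime.datetime(2015, 5, 17)

# Four consecutive days starting at `start`, returned as a DatetimeIndex
x = pd.date_range(start, periods=4, freq='D')

# to_pydatetime() gives an object array of datetime.datetime values,
# the same kind of array that date_range_1 returns
x_as_objects = x.to_pydatetime()
print(x_as_objects)
```

Keeping the dates as a DatetimeIndex rather than an object array usually makes later grouping by month or year simpler, since the `.month` and `.year` attributes are available on the whole index at once.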

Then I learned about the groupby function from http://blog.csdn.net/youngbit007/article/details/54288603. I tried an example from that site and it works fine:

df = pandas.DataFrame({'key1':date_range_1(datetime.datetime(2015, 1, 17),5),
              'key2': [2015001,2015001,2015001,2015001,2015001],
              'data1': 1+0.1*numpy.arange(1,6)
        })
df

   data1    key1    key2
0   1.1 2015-01-17  2015001
1   1.2 2015-01-18  2015001
2   1.3 2015-01-19  2015001
3   1.4 2015-01-20  2015001
4   1.5 2015-01-21  2015001

grouped=df['data1'].groupby(df['key2'])
grouped.mean()

key2
2015001    1.3
Name: data1, dtype: float64

Then I tried my own example:

datedat = numpy.array([date_range_1(datetime.datetime(2015, 1, 17), 5),
                       1 + 0.1 * numpy.arange(1, 6)]).T
months = [day.month for day in datedat[:, 0]]
years = [day.year for day in datedat[:, 0]]
datedatF = pandas.DataFrame({'key1': datedat[:, 0],
                             'key2': list(numpy.array(years) * 1000 + numpy.array(months)),
                             'data1': datedat[:, 1]})
datedatF

which produced:

   data1    key1    key2
0   1.1 2015-01-17  2015001
1   1.2 2015-01-18  2015001
2   1.3 2015-01-19  2015001
3   1.4 2015-01-20  2015001
4   1.5 2015-01-21  2015001

Note that this looks exactly the same as the table above! So far, so good. Then I ran:

grouped2=datedatF['data1'].groupby(datedatF['key2'])
grouped2.mean()

It threw this:

   ---------------------------------------------------------------------------
DataError                                 Traceback (most recent call last)
<ipython-input-170-f0d2bc225b88> in <module>()
  1 grouped2=datedatF['data1'].groupby(datedatF['key2'])
----> 2 grouped2.mean()

/root/anaconda3/lib/python3.6/site-packages/pandas/core/groupby.py in     mean(self, *args, **kwargs)
   1017         nv.validate_groupby_func('mean', args, kwargs)
   1018         try:
-> 1019             return self._cython_agg_general('mean')
   1020         except GroupByError:
   1021             raise

/root/anaconda3/lib/python3.6/site-packages/pandas/core/groupby.py in     _cython_agg_general(self, how, numeric_only)
    806 
    807         if len(output) == 0:
--> 808             raise DataError('No numeric types to aggregate')
    809 
    810         return self._wrap_aggregated_output(output, names)

DataError: No numeric types to aggregate

Oh... what did I do wrong? Why can't I take the mean on the second pandas.DataFrame? It looks exactly the same as the example that worked!

WeN*_*Ben 8

The dtype of data1 in your df is object, so we need to add pd.to_numeric (with pandas imported as pd):

datedatF.dtypes
Out[39]: 
data1            object
key1     datetime64[ns]
key2              int64
dtype: object
grouped2=pd.to_numeric(datedatF['data1']).groupby(datedatF['key2'])
grouped2.mean()
Out[41]: 
key2
2015001    1.3
Name: data1, dtype: float64
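As a rough illustration of where the object dtype likely comes from: the question stacks datetimes and floats into a single NumPy array, and NumPy can only hold both in an array of dtype object. The sketch below reproduces that on its own; the names `dates`, `values`, `mixed`, `from_mixed` and `direct` are illustrative and not from the original post.

```python
import datetime
import numpy as np
import pandas as pd

dates = np.array([datetime.datetime(2015, 1, 17) + datetime.timedelta(days=i)
                  for i in range(5)])
values = 1 + 0.1 * np.arange(1, 6)

# Datetimes and floats in one array: NumPy falls back to dtype=object
mixed = np.array([dates, values]).T
print(mixed.dtype)

# A column sliced out of the mixed array stays object ...
from_mixed = pd.DataFrame({'data1': mixed[:, 1]})
# ... while the float array passed directly keeps its numeric dtype
direct = pd.DataFrame({'data1': values})

print(from_mixed['data1'].dtype)
print(direct['data1'].dtype)
```

Both frames print identically, which is why the two tables in the question look the same; only `dtypes` reveals the difference. Passing the float array to the DataFrame constructor directly, instead of routing it through the mixed array, would avoid the conversion step altogether.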


Max*_*axU 5

data1 is of object (string) dtype:

In [396]: datedatF.dtypes
Out[396]:
data1            object   # <--- NOTE!
key1     datetime64[ns]
key2              int64
dtype: object

So try this:

In [397]: datedatF.assign(data1=pd.to_numeric(datedatF['data1'], errors='coerce')) \
                  .groupby('key2')['data1'].mean()
Out[397]:
key2
2015001    1.3
Name: data1, dtype: float64
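To show what `errors='coerce'` does in practice, here is a small sketch on made-up data (the frame and its values are hypothetical, not taken from the question). One entry cannot be parsed as a number; it becomes NaN instead of raising, and `mean()` then skips it. The grouping by calendar month via `dt.to_period('M')` is one possible way to get the monthly averages the question is after, not the only one.

```python
import pandas as pd

# Hypothetical data: 'data1' is stored as objects and one entry is not a number
frame = pd.DataFrame({
    'key1': pd.to_datetime(['2015-01-30', '2015-01-31', '2015-02-01', '2015-02-02']),
    'data1': pd.Series([1.1, 1.3, 'n/a', 2.0], dtype=object),
})

# errors='coerce' replaces anything that cannot be parsed with NaN
frame['data1'] = pd.to_numeric(frame['data1'], errors='coerce')

# Group by calendar month; mean() ignores NaN by default
monthly = frame.groupby(frame['key1'].dt.to_period('M'))['data1'].mean()
print(monthly)
```

Without `errors='coerce'`, `pd.to_numeric` raises on the unparseable entry, which can be preferable when a bad value should stop the computation rather than be silently dropped.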