I am working with time series and trying to write a function that computes the monthly average of my data. Here are some helper functions:
import datetime
import numpy
import pandas

def date_range_0(start, end):
    # Inclusive range of daily datetimes from start to end
    dates = [start + datetime.timedelta(days=i)
             for i in range((end - start).days + 1)]
    return numpy.array(dates)

def date_range_1(start, days):
    # days should be an integer
    return date_range_0(start, start + datetime.timedelta(days - 1))

x = date_range_1(datetime.datetime(2015, 5, 17), 4)
Evaluating x gives a simple array of datetimes:
array([datetime.datetime(2015, 5, 17, 0, 0),
datetime.datetime(2015, 5, 18, 0, 0),
datetime.datetime(2015, 5, 19, 0, 0),
datetime.datetime(2015, 5, 20, 0, 0)], dtype=object)
Then I learned about the groupby function from http://blog.csdn.net/youngbit007/article/details/54288603. I tried an example from that site and it works fine:
df = pandas.DataFrame({'key1': date_range_1(datetime.datetime(2015, 1, 17), 5),
                       'key2': [2015001, 2015001, 2015001, 2015001, 2015001],
                       'data1': 1 + 0.1 * numpy.arange(1, 6)})
df
This gives:
   data1       key1     key2
0    1.1 2015-01-17  2015001
1    1.2 2015-01-18  2015001
2    1.3 2015-01-19  2015001
3    1.4 2015-01-20  2015001
4    1.5 2015-01-21  2015001
and
grouped=df['data1'].groupby(df['key2'])
grouped.mean()
gives:
key2
2015001    1.3
Name: data1, dtype: float64
Then I tried my own example:
datedat = numpy.array([date_range_1(datetime.datetime(2015, 1, 17), 5),
                       1 + 0.1 * numpy.arange(1, 6)]).T
months = [day.month for day in datedat[:, 0]]
years = [day.year for day in datedat[:, 0]]
datedatF = pandas.DataFrame({'key1': datedat[:, 0],
                             'key2': list(numpy.array(years) * 1000 + numpy.array(months)),
                             'data1': datedat[:, 1]})
datedatF
which produced:
   data1       key1     key2
0    1.1 2015-01-17  2015001
1    1.2 2015-01-18  2015001
2    1.3 2015-01-19  2015001
3    1.4 2015-01-20  2015001
4    1.5 2015-01-21  2015001
Note that this looks exactly the same as the table above. So far, so good. Then I ran:
grouped2=datedatF['data1'].groupby(datedatF['key2'])
grouped2.mean()
It raised this:
---------------------------------------------------------------------------
DataError Traceback (most recent call last)
<ipython-input-170-f0d2bc225b88> in <module>()
1 grouped2=datedatF['data1'].groupby(datedatF['key2'])
----> 2 grouped2.mean()
/root/anaconda3/lib/python3.6/site-packages/pandas/core/groupby.py in mean(self, *args, **kwargs)
1017 nv.validate_groupby_func('mean', args, kwargs)
1018 try:
-> 1019 return self._cython_agg_general('mean')
1020 except GroupByError:
1021 raise
/root/anaconda3/lib/python3.6/site-packages/pandas/core/groupby.py in _cython_agg_general(self, how, numeric_only)
806
807 if len(output) == 0:
--> 808 raise DataError('No numeric types to aggregate')
809
810 return self._wrap_aggregated_output(output, names)
DataError: No numeric types to aggregate
What did I do wrong? Why can't I take the mean of the second pandas.DataFrame? It looks exactly the same as the example that worked!
The data1 column in your DataFrame has object dtype, so you need to convert it with pd.to_numeric first (pd here is pandas imported as pd):
datedatF.dtypes
Out[39]:
data1 object
key1 datetime64[ns]
key2 int64
dtype: object
grouped2=pd.to_numeric(datedatF['data1']).groupby(datedatF['key2'])
grouped2.mean()
Out[41]:
key2
2015001 1.3
Name: data1, dtype: float64
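To illustrate why data1 ends up with object dtype, here is a minimal sketch that rebuilds the question's data under the assumption that the cause is the mixed numpy.array call: datetimes and floats share no common numeric type, so NumPy falls back to object. The names start, dates, values, mixed, frame and result are made up for this illustration.

```python
import datetime

import numpy as np
import pandas as pd

# Five consecutive days and five float values, as in the question.
start = datetime.datetime(2015, 1, 17)
dates = np.array([start + datetime.timedelta(days=i) for i in range(5)])
values = 1 + 0.1 * np.arange(1, 6)

# Putting datetimes and floats into one array leaves NumPy no common
# numeric type, so the combined array is stored with dtype=object.
mixed = np.array([dates, values]).T
print(mixed.dtype)

# A column sliced from that array carries the object dtype into the frame.
frame = pd.DataFrame({"key2": [2015001] * 5, "data1": mixed[:, 1]})
print(frame["data1"].dtype)

# Converting the column back to a numeric dtype lets mean() aggregate it.
frame["data1"] = pd.to_numeric(frame["data1"])
result = frame.groupby("key2")["data1"].mean()
print(result)
```

Building the DataFrame directly from the separate dates and values arrays, instead of slicing them out of one combined array, would probably avoid the conversion step altogether.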
Your data1 column has object dtype (generic Python objects rather than a numeric dtype):
In [396]: datedatF.dtypes
Out[396]:
data1 object # <--- NOTE!
key1 datetime64[ns]
key2 int64
dtype: object
So try this:
In [397]: datedatF.assign(data1=pd.to_numeric(datedatF['data1'], errors='coerce')) \
.groupby('key2')['data1'].mean()
Out[397]:
key2
2015001 1.3
Name: data1, dtype: float64
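The errors='coerce' argument used above matters when an object column holds entries that cannot be parsed as numbers. The following small sketch uses a hypothetical Series, raw, that is not taken from the question. Unparseable entries become NaN, and mean() skips them by default.

```python
import pandas as pd

# Hypothetical object column: two floats, one numeric string, one bad entry.
raw = pd.Series([1.1, "1.2", "missing", 1.4], dtype=object)

# errors="coerce" turns anything that cannot be parsed into NaN
# instead of raising an exception.
converted = pd.to_numeric(raw, errors="coerce")
print(converted.dtype)
print(converted.isna().sum())

# mean() ignores NaN by default, so only the three valid values count.
print(converted.mean())
```

Without errors='coerce', pd.to_numeric would raise on the unparseable entry, which may be preferable when silent NaN values would hide a data problem.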