I want to apply padding to each group of my DataFrame.
Note that for a single group (one 'element_id'), padding works without any problem:
First group (group1):
{'date': {88: datetime.date(2017, 10, 3), 43: datetime.date(2017, 9, 26), 159: datetime.date(2017, 11, 8)}, u'element_id': {88: 122, 43: 122, 159: 122}, u'VALUE': {88: '8.0', 43: '2.0', 159: '5.0'}}
So I apply padding to it (this works fine):
print(group1.set_index('date').asfreq('D', method='pad').head())
I want to apply the same logic across several groups using groupby.
Another group (group2):
{'date': {88: datetime.date(2017, 10, 3), 43: datetime.date(2017, 9, 26), 159: datetime.date(2017, 11, 8)}, u'element_id': {88: 122, 43: 122, 159: 122}, u'VALUE': {88: '8.0', 43: '2.0', 159: '5.0'}}
group_data=pd.concat([group1,group2],axis=0)
group_data.groupby(['element_id']).set_index('date').resample('D').asfreq()
I get the following error:
AttributeError: Cannot access callable attribute 'set_index' of 'DataFrameGroupBy' objects, try using the 'apply' method
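The error can be reproduced in isolation, which shows that it comes from calling a DataFrame-only method on the groupby object rather than from the data itself. A minimal sketch, assuming a small made-up frame (the exact wording of the message may differ between pandas versions):

```python
import pandas as pd

# Hypothetical two-row frame, just enough to build a groupby object.
df = pd.DataFrame({
    "element_id": [122, 122],
    "date": pd.to_datetime(["2017-09-26", "2017-10-03"]),
    "VALUE": [2.0, 8.0],
})

grouped = df.groupby("element_id")

# set_index is a DataFrame method; the groupby object does not expose it.
try:
    grouped.set_index("date")
    raised = False
except AttributeError as exc:
    raised = True
    print(exc)
```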
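First, there is a problem with the dtype of your date column: it holds datetime.date objects (object dtype), not datetimes, so it needs to be converted with to_datetime first.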
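The dtype issue is easy to check before resampling. A small sketch, assuming a standalone Series built from the same kind of datetime.date values as in the question (the variable names here are illustrative only):

```python
import datetime
import pandas as pd

# datetime.date values are stored as generic Python objects.
dates = pd.Series([datetime.date(2017, 10, 3), datetime.date(2017, 9, 26)])
print(dates.dtype)

# After conversion the column has a datetime64 dtype, which resample needs.
converted = pd.to_datetime(dates)
print(pd.api.types.is_datetime64_any_dtype(converted))
```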
Then one option is to use GroupBy.apply:
group_data['date'] = pd.to_datetime(group_data['date'])
df = (group_data.groupby(['element_id'])
.apply(lambda x: x.set_index('date').resample('D').ffill()))
print(df.head())
                      VALUE  element_id
element_id date
122        2017-09-26   2.0         122
           2017-09-27   2.0         122
           2017-09-28   2.0         122
           2017-09-29   2.0         122
           2017-09-30   2.0         122
Alternatively, set the index first and resample directly on the groupby object:
df = group_data.set_index('date').groupby(['element_id']).resample('D').ffill()
print(df.head())
                      VALUE  element_id
element_id date
122        2017-09-26   2.0         122
           2017-09-27   2.0         122
           2017-09-28   2.0         122
           2017-09-29   2.0         122
           2017-09-30   2.0         122
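For comparison, the same per-group padding can also be written as an explicit loop over the groups, which makes each step visible. This is only a sketch of an alternative, not the approach used above; the second element_id (7) and all values are made up for illustration:

```python
import pandas as pd

# Hypothetical sample data: two element_ids with a few sparse dates each.
group_data = pd.DataFrame({
    "element_id": [122, 122, 7, 7],
    "date": pd.to_datetime(["2017-09-26", "2017-09-29",
                            "2017-10-01", "2017-10-03"]),
    "VALUE": [2.0, 8.0, 1.0, 3.0],
})

# Pad each group on its own daily grid, keyed by its element_id.
pieces = {
    element_id: (group.sort_values("date")
                      .set_index("date")[["VALUE"]]
                      .resample("D")
                      .ffill())
    for element_id, group in group_data.groupby("element_id")
}

# Stack the padded pieces; the dict keys become the outer index level.
padded = pd.concat(pieces, names=["element_id"])
print(padded)
```

Each group is padded only between its own first and last date, so groups with different date ranges do not influence each other.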
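EDIT:
If the problem is duplicate values, a solution is to add a new column that identifies subgroups with unique dates. If you build the data with concat, its keys parameter can be used for this: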
import datetime
import pandas as pd

group1 = pd.DataFrame({'date': {88: datetime.date(2017, 10, 3),
                                43: datetime.date(2017, 9, 26),
                                159: datetime.date(2017, 11, 8)},
                       u'element_id': {88: 122, 43: 122, 159: 122},
                       u'VALUE': {88: '8.0', 43: '2.0', 159: '5.0'}})
# reset_index(level=0) names the unnamed key level 'level_0'; rename it to 'g'
d = {'level_0':'g'}
# keys=('a','b') tags the rows of each concatenated frame
group_data=pd.concat([group1,group1], keys=('a','b')).reset_index(level=0).rename(columns=d)
print(group_data)
     g VALUE        date  element_id
43   a   2.0  2017-09-26         122
88   a   8.0  2017-10-03         122
159  a   5.0  2017-11-08         122
43   b   2.0  2017-09-26         122
88   b   8.0  2017-10-03         122
159  b   5.0  2017-11-08         122
group_data['date'] = pd.to_datetime(group_data['date'])
df = (group_data.groupby(['g','element_id'])
.apply(lambda x: x.set_index('date').resample('D').ffill()))
print(df.head())
                        g VALUE  element_id
g element_id date
a 122        2017-09-26 a   2.0         122
             2017-09-27 a   2.0         122
             2017-09-28 a   2.0         122
             2017-09-29 a   2.0         122
             2017-09-30 a   2.0         122
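Whether the extra key column is needed can be checked up front with DataFrame.duplicated. A rough sketch, assuming a shortened version of group1 (two rows instead of three); the variable names are illustrative only:

```python
import datetime
import pandas as pd

# Shortened stand-in for group1 from the question.
group1 = pd.DataFrame({
    "date": [datetime.date(2017, 9, 26), datetime.date(2017, 10, 3)],
    "element_id": [122, 122],
    "VALUE": ["2.0", "8.0"],
})

# Plain concat: every (element_id, date) pair now appears twice.
plain = pd.concat([group1, group1])
has_dupes_plain = plain.duplicated(["element_id", "date"]).any()

# Tagged concat: the key column 'g' makes each row's group unique again.
tagged = (pd.concat([group1, group1], keys=("a", "b"))
            .reset_index(level=0)
            .rename(columns={"level_0": "g"}))
has_dupes_tagged = tagged.duplicated(["g", "element_id", "date"]).any()

print(has_dupes_plain, has_dupes_tagged)
```

If the first check reports duplicates, grouping by 'element_id' alone would mix rows that share a date, which is why 'g' is added to the groupby keys.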