I get an error when reshaping the data in a DataFrame.
KeyError: 'Requested level (date) does not match index name (None)'
More details below:
# dataframe
# print(df.head(3))
...
account_id entity ae is_pc is_new_customer agency related_entity type medium our_side_entity settlement_title settlement_short_title settlement_type system_value account_status date sale
12323 entity1 ae1 PC yes MB EC TWITTER our_side_entity1 settlement_title settlement_short_title 1 0.2 active 2020-07-01 jimmy
12323 entity1 ae1 PC yes MB EC GOOGLE our_side_entity2 settlement_title settlement_short_title 1 0.5 active 2020-07-02 jimmy
1037093 Bentity1 ae1 PC yes MB APP Google our_side_entity3 settlement_title settlement_short_title 2 0 disable 2020-07-03 jimmy
1037093 Bentity1 ae1 PC yes MB APP Google our_side_entity3 settlement_title settlement_short_title 2 2020-07-04 jimmy
1037093 Bentity1 ae1 PC yes MB APP Google our_side_entity3 settlement_title settlement_short_title 2 2020-07-05 jimmy
...
Then I want to group by account and date and sum system_value for each account. I tried the following code, but it failed:
indices = OrderedDict([
    ('account_id', 'ID'),
    ('entity', 'entity'),
    ('ae', 'AE'),
    ('is_pc', 'PC'),
    ('is_new_customer', 'new_customer'),
    ('agency', 'agency'),
    ('related_entity', 'related_entity'),
    ('type', 'type'),
    ('medium', 'medium'),
    ('our_side_entity', 'our_side_entity'),
    ('settlement_title', 'settlement_title'),
    ('settlement_short_title', 'settlement_short_title'),
    ('settlement_type', 'settlement_type'),
    ('account_status', 'account_status'),
    ('sale', 'sale'),
    ('date', 'date'),
])
df = df.groupby(list(indices.keys())).system_value.sum() \
    .unstack('date', fill_value=None) \
    .assign(total=lambda x: x.sum(1)) \
    .reset_index()
print(df)
df = df.rename(columns=indices) \
    .set_index(indices['account_id'])
The error is:
KeyError: 'Requested level (date) does not match index name (None)'
Could you tell me what is wrong with my attempt?
Thanks.
Update: more details on my attempt
The code below reproduces the error every time:
import pandas as pd
from collections import OrderedDict
s = [
    {'account_id': '123123213',
     'entity': 'entity2',
     'ae': 'ae1',
     'is_pc': 'PC',
     'is_new_customer': 'yes',
     'agency': 'BV',
     'related_entity': None,
     'type': 'EC',
     'medium': 'Facebook',
     'our_side_entity': 'our_side_entity',
     'settlement_title': 'settlement_title',
     'settlement_short_title': 'SS',
     'settlement_type': 'unknown',
     'system_value': None,
     'account_status': None,
     'date': '2020-07-22',
     'sale': 'sale1'},
]
indices = OrderedDict([
    ('account_id', 'ID'),
    ('entity', 'Entity'),
    ('ae', 'AE'),
    ('is_pc', 'PC'),
    ('is_new_customer', 'NEW_CUSTOMER'),
    ('agency', 'agency'),
    ('related_entity', 'related_entity'),
    ('type', 'type'),
    ('medium', 'medium'),
    ('our_side_entity', 'our_side_entity'),
    ('settlement_title', 'settlement_title'),
    ('settlement_short_title', 'settlement_short_title'),
    ('settlement_type', 'settlement_type'),
    ('sale', 'sale'),
    ('date', 'date'),
])
df = pd.DataFrame.from_records(s)
# print df.to_dict()
{'account_id': {0: '123123213'}, 'entity': {0: 'entity2'}, 'ae': {0: 'ae1'}, 'is_pc': {0: 'PC'}, 'is_new_customer': {0: 'yes'}, 'agency': {0: 'BV'}, 'related_entity': {0: None}, 'type': {0: 'EC'}, 'medium': {0: 'Facebook'}, 'our_side_entity': {0: 'our_side_entity'}, 'settlement_title': {0: 'settlement_title'}, 'settlement_short_title': {0: 'SS'}, 'settlement_type': {0: 'unknown'}, 'system_value': {0: None}, 'account_status': {0: None}, 'date': {0: '2020-07-22'}, 'sale': {0: 'sale1'}}
df = df.groupby(list(indices.keys())).system_value.sum() \
    .unstack('date', fill_value=None) \
    .assign(total=lambda x: x.sum(1)) \
    .reset_index()
indices["account_status"] = "status"
df = df.rename(columns=indices) \
    .set_index(indices['account_id'])
print(df)
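You are grouping by a column that contains only None values. In your example, related_entity is None, and groupby drops rows whose key is missing, so the result is an empty DataFrame and unstack('date') finds no index level named date: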
In [7]: df.groupby(list(indices.keys())).sum()
Out[7]:
Empty DataFrame
Columns: []
Index: []
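As an aside that goes beyond the fix proposed here, the effect can be reproduced on a small made-up frame. The sketch below assumes pandas 1.1 or later, where groupby accepts a dropna argument; the column values are invented for illustration.

```python
import pandas as pd

# Made-up data: related_entity is missing in every row.
df = pd.DataFrame({
    "account_id": ["a1", "a1", "a2"],
    "related_entity": [None, None, None],
    "system_value": [0.2, 0.5, 1.0],
})

keys = ["account_id", "related_entity"]

# Default behaviour: rows with a missing key are dropped, so no groups remain.
dropped = df.groupby(keys)["system_value"].sum()

# dropna=False (pandas >= 1.1) keeps missing keys as their own group.
kept = df.groupby(keys, dropna=False)["system_value"].sum()

print(len(dropped), len(kept))
```

Whether keeping missing keys as a group is appropriate depends on what the final report should show.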
I suggest removing this column from the groupby clause.
[Edit]: To set related_entity to the value of entity, you can simply do:
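One way to apply this suggestion without hard-coding the column name is sketched below. The data is made up, and usable_keys is a hypothetical name introduced only for this example; it keeps the key columns that have at least one non-null value.

```python
import pandas as pd

# Made-up data; column names follow the question, values are invented.
df = pd.DataFrame({
    "account_id": ["a1", "a1", "a2"],
    "related_entity": [None, None, None],  # entirely missing
    "date": ["2020-07-01", "2020-07-02", "2020-07-01"],
    "system_value": [0.2, 0.5, 1.0],
})

keys = ["account_id", "related_entity", "date"]

# Grouping by a key that is always missing leaves no groups at all.
empty = df.groupby(keys)["system_value"].sum()

# Keep only the key columns that contain at least one non-null value.
usable_keys = [k for k in keys if df[k].notna().any()]

wide = (
    df.groupby(usable_keys)["system_value"].sum()
      .unstack("date")                        # one column per date
      .assign(total=lambda x: x.sum(axis=1))  # row total across dates
      .reset_index()
)
print(wide)
```

This only helps when a key column is missing everywhere; rows with partially missing keys are still dropped by groupby.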
df['related_entity'] = df['entity']
Or, if you want to keep the values that are already set and fill only the missing ones:
df['related_entity'] = df['related_entity'].fillna(df['entity'])
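To check that filling the key column is enough for the reshape to go through, here is a minimal sketch on made-up data. It uses fewer key columns than the question, and the values are invented.

```python
import pandas as pd

# Made-up data: related_entity is missing in some rows only.
df = pd.DataFrame({
    "account_id": ["a1", "a1", "a2"],
    "entity": ["e1", "e1", "e2"],
    "related_entity": [None, "r1", None],
    "date": ["2020-07-01", "2020-07-02", "2020-07-01"],
    "system_value": [0.2, 0.5, 1.0],
})

# Fill only the missing related_entity values from entity.
df["related_entity"] = df["related_entity"].fillna(df["entity"])

keys = ["account_id", "entity", "related_entity", "date"]

# No key is missing any more, so every row lands in a group
# and the grouped index has a level named "date" to unstack.
result = df.groupby(keys)["system_value"].sum().unstack("date")
print(result)
```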