KeyError: 'Requested level (date) does not match index name (None)'

jia*_*mmy 6 python pandas

I get an error when reshaping my DataFrame:

KeyError: 'Requested level (date) does not match index name (None)'

More details below:

# dataframe

# print(df.head(3))
       
    
...
account_id    entity     ae     is_pc   is_new_customer agency  related_entity  type    medium   our_side_entity      settlement_title  settlement_short_title  settlement_type  system_value   account_status  date    sale
12323         entity1   ae1     PC        yes            MB                     EC     TWITTER   our_side_entity1    settlement_title   settlement_short_title      1                0.2          active    2020-07-01     jimmy 
12323         entity1   ae1     PC        yes            MB                     EC     GOOGLE    our_side_entity2    settlement_title   settlement_short_title      1                0.5          active    2020-07-02    jimmy
1037093       Bentity1  ae1     PC        yes            MB                     APP    Google    our_side_entity3    settlement_title   settlement_short_title      2                0            disable   2020-07-03     jimmy
1037093       Bentity1  ae1     PC        yes            MB                     APP    Google    our_side_entity3    settlement_title   settlement_short_title      2                                      2020-07-04     jimmy
1037093       Bentity1  ae1     PC        yes            MB                     APP    Google    our_side_entity3    settlement_title   settlement_short_title      2                                      2020-07-05      jimmy
...  


I want to group by account and date and sum system_value for each account. I tried the following code, but it failed:


            indices = OrderedDict([
                ('account_id', 'ID'),
                ('entity', 'entity'),
                ('ae', 'AE'),
                ('is_pc', 'PC'),
                ('is_new_customer', 'new_customer'),
                ('agency', 'agency'),
                ('related_entity', 'related_entity'),
                ('type', 'type'),
                ('medium', 'medium'),
                ('our_side_entity', 'our_side_entity'),
                ('settlement_title', 'settlement_title'),
                ('settlement_short_title', 'settlement_short_title'),
                ('settlement_type', 'settlement_type'),
                ('account_status', 'account_status'),
                ('sale', 'sale'),
                ('date', 'date'),

            ])


            df = df.groupby(list(indices.keys())).system_value.sum() \
                .unstack('date', fill_value=None) \
                .assign(total=lambda x: x.sum(1)) \
                .reset_index()
            print(df)
            df = df.rename(columns=indices). \
                set_index(indices['account_id'])


The error is:

KeyError: 'Requested level (date) does not match index name (None)'


Could you tell me what is wrong with my attempt?

Thanks.

Update: more details on my attempt

The code below reproduces the error every time:

import pandas as pd
from collections import OrderedDict


s = [
    {'account_id': '123123213',
     'entity': 'entity2',
     'ae': 'ae1',
     'is_pc': 'PC',
     'is_new_customer': 'yes',
     'agency': 'BV',
     'related_entity': None,
     'type': 'EC',
     'medium': 'Facebook',
     'our_side_entity': 'our_side_entity',
     'settlement_title': 'settlement_title',
     'settlement_short_title': 'SS',
     'settlement_type': 'unknown',
     'system_value': None,
     'account_status': None,
     'date': '2020-07-22',
     'sale': 'sale1'},
]


indices = OrderedDict([
    ('account_id', 'ID'),
    ('entity', 'Entity'),
    ('ae', 'AE'),
    ('is_pc', 'PC'),
    ('is_new_customer', 'NEW_CUSTOMER'),
    ('agency', 'agency'),
    ('related_entity', 'related_entity'),
    ('type', 'type'),
    ('medium', 'medium'),
    ('our_side_entity', 'our_side_entity'),
    ('settlement_title', 'settlement_title'),
    ('settlement_short_title', 'settlement_short_title'),
    ('settlement_type', 'settlement_type'),
    ('sale', 'sale'),
    ('date', 'date'),
])

df = pd.DataFrame.from_records(s)
# print(df.to_dict()) outputs:

{'account_id': {0: '123123213'}, 'entity': {0: 'entity2'}, 'ae': {0: 'ae1'}, 'is_pc': {0: 'PC'}, 'is_new_customer': {0: 'yes'}, 'agency': {0: 'BV'}, 'related_entity': {0: None}, 'type': {0: 'EC'}, 'medium': {0: 'Facebook'}, 'our_side_entity': {0: 'our_side_entity'}, 'settlement_title': {0: 'settlement_title'}, 'settlement_short_title': {0: 'SS'}, 'settlement_type': {0: 'unknown'}, 'system_value': {0: None}, 'account_status': {0: None}, 'date': {0: '2020-07-22'}, 'sale': {0: 'sale1'}}

df = df.groupby(list(indices.keys())).system_value.sum() \
    .unstack('date', fill_value=None) \
    .assign(total=lambda x: x.sum(1)) \
    .reset_index()
indices["account_status"] = "status"
df = df.rename(columns=indices). \
    set_index(indices['account_id'])
print(df)




qme*_*eus 5

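You are grouping by a column whose values are all None. In your example, related_entity is None, and groupby drops rows with a missing group key by default, so the result is an empty DataFrame. The empty result has an unnamed index, which is why unstack('date') cannot find a level called date: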

In [7]: df.groupby(list(indices.keys())).sum()                                                                                                                                                                       
Out[7]: 
Empty DataFrame
Columns: []
Index: []

I suggest you remove this column from the groupby clause.
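As a minimal sketch of that suggestion, the snippet below uses a hypothetical three-key reduction of the question's frame (the 0.5 value is made up for illustration). It shows the default behaviour that produces the empty result, the fix of leaving the all-None column out of the keys, and, as an alternative that assumes pandas 1.1 or later, passing dropna=False so that missing keys form their own group.

```python
import pandas as pd

# Hypothetical, reduced version of the question's data (values are illustrative).
df = pd.DataFrame({
    "account_id": ["123123213"],
    "related_entity": [None],
    "date": ["2020-07-22"],
    "system_value": [0.5],
})

keys = ["account_id", "related_entity", "date"]

# Default behaviour: rows with a missing group key are dropped,
# so nothing is left to aggregate.
empty = df.groupby(keys).system_value.sum()

# Fix 1: leave the all-None column out of the group keys.
keys_without_none = [k for k in keys if k != "related_entity"]
wide = df.groupby(keys_without_none).system_value.sum().unstack("date")

# Fix 2 (assumes pandas >= 1.1): keep missing keys as their own group.
kept = df.groupby(keys, dropna=False).system_value.sum()

print(len(empty))
print(wide)
print(kept)
```

With the first fix, the grouped Series has a named date level, so unstack('date') has something to pivot on.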

[Edit]: To set related_entity to the value of entity, you can simply do:

df['related_entity'] = df['entity']

Or, if you want to keep the values that are already set and fill in only the missing ones:

df['related_entity'] = df['related_entity'].fillna(df['entity'])
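To illustrate the difference between the two assignments, here is a small sketch on a hypothetical two-row frame (the values "partner1", "entity1" and "entity2" are made up). Plain assignment overwrites every row, while fillna with another Series only fills the rows where related_entity is missing, matching on the index.

```python
import pandas as pd

# Hypothetical data: one row already has a related_entity, one does not.
df = pd.DataFrame({
    "entity": ["entity1", "entity2"],
    "related_entity": ["partner1", None],
})

# Overwrite: every row takes the value of entity.
overwritten = df["entity"].copy()

# Fill only the gaps: existing values are kept, missing ones come from entity.
filled = df["related_entity"].fillna(df["entity"])

print(overwritten.tolist())
print(filled.tolist())
```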