How to count only specific values in a pandas DataFrame

Question

I have the following pandas DataFrame:

import pandas as pd

a = [['01', '12345', 'null'], ['02', '78910', '9870'], ['01', '23456', 'null'], ['01', '98765', '8760']]

df_a = pd.DataFrame(a, columns=['id', 'order', 'location'])

I need to count how many NULL values (here NULL is the string 'null') occur for each id. The result should look like:

id   null_count
01    02
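
If the 'null' strings were stored as real missing values instead, pandas' own missing-data handling would do the work. A minimal sketch, assuming it is acceptable to convert 'null' to NaN with Series.mask: per id, size() counts every row while count() skips NaN, so their difference is the null count (ids with no nulls get 0).

```python
import pandas as pd

a = [['01', '12345', 'null'], ['02', '78910', '9870'],
     ['01', '23456', 'null'], ['01', '98765', '8760']]
df_a = pd.DataFrame(a, columns=['id', 'order', 'location'])

# Assumption: replace the literal string 'null' with a real missing value (NaN)
df_a['location'] = df_a['location'].mask(df_a['location'].eq('null'))

grouped = df_a.groupby('id')['location']
# size() counts all rows per id; count() skips NaN, so the difference is the NaN count
null_counts = (grouped.size() - grouped.count()).reset_index(name='null_count')
print(null_counts)
```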

I can get the basic counts with groupby:

new_df = df_a.groupby(['id', 'location'])['id'].count()

but the result contains more than just the null values:

id  location
01  8760        1
    null        2
02  9870        1
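
The two-level count above can also be laid out as a table with one column per location value, from which only the 'null' column is kept. A sketch using pd.crosstab (a swapped-in alternative, not taken from the question or answers); ids with no 'null' rows appear with a count of 0.

```python
import pandas as pd

a = [['01', '12345', 'null'], ['02', '78910', '9870'],
     ['01', '23456', 'null'], ['01', '98765', '8760']]
df_a = pd.DataFrame(a, columns=['id', 'order', 'location'])

# crosstab counts every (id, location) pair: rows are ids, columns are location values
table = pd.crosstab(df_a['id'], df_a['location'])

# Keep only the 'null' column; missing combinations are already filled with 0
null_counts = table['null'].rename('null_count').reset_index()
print(null_counts)
```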

Answer 1

Sco*_*ton 6

Because the NULLs in your source DataFrame are the string 'null', use:

df_a.groupby('id')['location'].apply(lambda x: (x=='null').sum())\
    .reset_index(name='null_count')

Output:

   id  null_count
0  01          2
1  02          0
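
The apply with a lambda runs Python code once per group. A roughly equivalent vectorized sketch builds the boolean mask once for the whole column and groups it by id; summing a boolean Series counts its True values, so ids without 'null' still appear with 0.

```python
import pandas as pd

a = [['01', '12345', 'null'], ['02', '78910', '9870'],
     ['01', '23456', 'null'], ['01', '98765', '8760']]
df_a = pd.DataFrame(a, columns=['id', 'order', 'location'])

# Boolean mask: True where location holds the literal string 'null'
is_null = df_a['location'].eq('null')

# Group the mask by id and sum the True values, without a per-group lambda
null_counts = is_null.groupby(df_a['id']).sum().reset_index(name='null_count')
print(null_counts)
```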

Or

df_a.query('location == "null"').groupby('id')['location'].size()\
    .reset_index(name='null_count')

Output:

   id  null_count
0  01           2
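
The query version drops id 02 entirely, because no rows remain for it after filtering. If zero counts are wanted (an assumption about the desired output), one option is to reindex the counts against every id in the original frame:

```python
import pandas as pd

a = [['01', '12345', 'null'], ['02', '78910', '9870'],
     ['01', '23456', 'null'], ['01', '98765', '8760']]
df_a = pd.DataFrame(a, columns=['id', 'order', 'location'])

# query() keeps only the 'null' rows, so ids without any 'null' disappear here
counts = df_a.query('location == "null"').groupby('id').size()

# Reindex against all ids in the original frame, filling the missing ones with 0
all_ids = df_a['id'].unique()
null_counts = (counts.reindex(all_ids, fill_value=0)
               .rename_axis('id')
               .reset_index(name='null_count'))
print(null_counts)
```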

Answer 2

WeN*_*Ben 5

Building on your own code, add .loc; note that this is MultiIndex slicing:

df_a.groupby(['id', 'location'])['id'].count().loc[:,'null']
Out[932]: 
id
01    2
Name: id, dtype: int64
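
The same selection can be written with Series.xs, which takes a cross-section at one level of a MultiIndex and drops that level. This sketch uses groupby(...).size() in place of ['id'].count(); the counts are identical here because id has no missing values. Like .loc[:, 'null'], xs raises a KeyError if the value 'null' does not occur at all.

```python
import pandas as pd

a = [['01', '12345', 'null'], ['02', '78910', '9870'],
     ['01', '23456', 'null'], ['01', '98765', '8760']]
df_a = pd.DataFrame(a, columns=['id', 'order', 'location'])

# Count rows for every (id, location) pair, giving a Series with a two-level index
counts = df_a.groupby(['id', 'location']).size()

# Select the rows whose 'location' level equals 'null'; the result is indexed by id
null_counts = counts.xs('null', level='location')
print(null_counts)
```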

Archived:	8 years, 2 months ago
Views:	76
Last activity:	8 years, 2 months ago