Ib *_*b D 5 python counter dataframe pandas categorical-data
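In my DataFrame I have some categorical columns with more than 100 distinct categories. I want to rank the categories by frequency, keep the 9 most frequent ones, and automatically rename the less frequent ones to OTHER.
Example:
Here is my df: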
print(df)
Employee_number Jobrol
0 1 Sales Executive
1 2 Research Scientist
2 3 Laboratory Technician
3 4 Sales Executive
4 5 Research Scientist
5 6 Laboratory Technician
6 7 Sales Executive
7 8 Research Scientist
8 9 Laboratory Technician
9 10 Sales Executive
10 11 Research Scientist
11 12 Laboratory Technician
12 13 Sales Executive
13 14 Research Scientist
14 15 Laboratory Technician
15 16 Sales Executive
16 17 Research Scientist
17 18 Research Scientist
18 19 Manager
19 20 Human Resources
20 21 Sales Executive
valCount = df['Jobrol'].value_counts()
valCount
Sales Executive 7
Research Scientist 7
Laboratory Technician 5
Manager 1
Human Resources 1
In this example I want to keep the top 3 categories and rename all the others to "OTHER". How can I do this?
Thanks.
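Convert your series to categorical, extract the categories whose counts are not in the top 3, add a new category (e.g. 'Other'), and then replace the categories you extracted with it: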
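# Convert to categorical so that the set of categories can be edited explicitly.
df['Jobrol'] = df['Jobrol'].astype('category')
# value_counts() sorts by frequency, so everything after the first 3 is a rare category.
others = df['Jobrol'].value_counts().index[3:]
label = 'Other'
# The new label has to exist as a category before it can be assigned.
df['Jobrol'] = df['Jobrol'].cat.add_categories([label])
df['Jobrol'] = df['Jobrol'].replace(others, label)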
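For readers who want to run these steps end to end, here is a minimal sketch on the sample data from the question. It is a variant rather than the answer's exact code: it uses `Series.where` instead of `replace`, because, as far as I know, recent pandas releases deprecate `replace` on categorical data. The `jobs` list is reconstructed from the printed frame, and `remove_unused_categories` is an extra tidy-up step not in the original.

```python
import pandas as pd

# Sample data rebuilt from the question's printout (21 rows).
jobs = (["Sales Executive", "Research Scientist", "Laboratory Technician"] * 5
        + ["Sales Executive", "Research Scientist", "Research Scientist",
           "Manager", "Human Resources", "Sales Executive"])
df = pd.DataFrame({"Employee_number": range(1, len(jobs) + 1), "Jobrol": jobs})

label = "Other"
col = df["Jobrol"].astype("category")
# The three most frequent categories, as a plain list.
top = col.value_counts().index[:3].tolist()
# The new label must be a known category before it can be assigned.
col = col.cat.add_categories([label])
# Keep values that are in the top 3, overwrite everything else with the label.
col = col.where(col.isin(top), label)
# Drop the now-empty categories (Manager, Human Resources).
df["Jobrol"] = col.cat.remove_unused_categories()

print(df["Jobrol"].value_counts())
```

The column stays categorical throughout, which is the point of this approach when the frame is large.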
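Note: it is tempting to merge the categories by renaming them with df['Jobrol'].cat.rename_categories(dict.fromkeys(others, label)), but this will not work, because it would give several categories the same label, which is not allowed.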
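A small sketch of that failure, using a toy series (the values `a`, `b`, `c` are made up for illustration). In the pandas versions I am aware of, the call raises a `ValueError` because the resulting categories would not be unique; the exact message may vary.

```python
import pandas as pd

# Toy categorical series; "b" and "c" play the role of the rare categories.
s = pd.Series(["a", "a", "b", "c"], dtype="category")
others = ["b", "c"]

try:
    # Maps both "b" and "c" to "Other", which would create duplicate categories.
    s.cat.rename_categories(dict.fromkeys(others, "Other"))
    raised = False
except ValueError as exc:
    raised = True
    print(f"rename_categories failed: {exc}")
```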
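The solution above can be adapted to filter by count instead. For example, to group only the categories that occur exactly once, define others like this: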
counts = df['Jobrol'].value_counts()
others = counts[counts == 1].index
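The count-based filter generalizes to a minimum-frequency threshold. The helper below is a sketch under that assumption: `lump_below`, `min_count` and the default label are hypothetical names, and it works on a plain (object dtype) column rather than a categorical one.

```python
import pandas as pd

def lump_below(series: pd.Series, min_count: int, label: str = "Other") -> pd.Series:
    """Replace every value that occurs fewer than min_count times with label."""
    counts = series.value_counts()
    rare = counts[counts < min_count].index
    # Keep values that are not rare, overwrite the rest.
    return series.where(~series.isin(rare), label)

# Same job titles as in the question's sample data.
jobs = pd.Series(
    ["Sales Executive", "Research Scientist", "Laboratory Technician"] * 5
    + ["Sales Executive", "Research Scientist", "Research Scientist",
       "Manager", "Human Resources", "Sales Executive"]
)
lumped = lump_below(jobs, min_count=2)
print(lumped.value_counts())
```

With `min_count=2` this groups only the categories that appear once, matching the `counts == 1` example above.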
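Alternatively, use value_counts together with numpy.where:
import numpy as np
# Keep the 3 most frequent categories; every other value becomes 'OTHER'.
need = df['Jobrol'].value_counts().index[:3]
df['Jobrol'] = np.where(df['Jobrol'].isin(need), df['Jobrol'], 'OTHER')
valCount = df['Jobrol'].value_counts()
print(valCount)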
Research Scientist 7
Sales Executive 7
Laboratory Technician 5
OTHER 2
Name: Jobrol, dtype: int64
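The question mentions several columns and a top-9 cutoff, so the same idea can be wrapped in a loop. This is only a sketch: `keep_top_n`, the `demo` frame and its column names are invented for illustration, and it uses `Series.where` in place of `np.where` to keep the index aligned.

```python
import pandas as pd

def keep_top_n(df: pd.DataFrame, columns, n: int, label: str = "OTHER") -> pd.DataFrame:
    """Return a copy in which each listed column keeps only its n most frequent values."""
    out = df.copy()
    for col in columns:
        top = out[col].value_counts().index[:n]
        out[col] = out[col].where(out[col].isin(top), label)
    return out

# Hypothetical two-column frame; n=2 keeps the example small.
demo = pd.DataFrame({
    "dept": ["A", "A", "A", "B", "B", "C", "D"],
    "site": ["x", "x", "y", "y", "y", "z", "x"],
})
result = keep_top_n(demo, ["dept", "site"], n=2)
print(result)
```

For the case in the question, the call would use `n=9`. When several categories tie at the cutoff, which of them is kept depends on the order `value_counts` returns.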
Another solution, which only aggregates the counts and leaves df unchanged:
N = 3
s = df['Jobrol'].value_counts()
# Series.append was removed in pandas 2.0; pd.concat gives the same result.
valCount = pd.concat([s.iloc[:N], pd.Series(s.iloc[N:].sum(), index=['OTHER'])])
print(valCount)
Research Scientist 7
Sales Executive 7
Laboratory Technician 5
OTHER 2
dtype: int64