Pandas - Change the order of levels of a factor-type object

Squ*_*627 11 python pandas

I have a pandas DataFrame df whose school column is a factor:

Name    school
A       An
B       Bn
C       Bn

In Python, how can I change the levels of the school column from ('An','Bn') to ('Bn','An')?

The R equivalent:

levels(df$school) = c('Bn','An')

And*_*den 12

You can use reorder_categories (passing in the categories in the desired order):

In [11]: df
Out[11]:
  Name school
0    A     An
1    B     Bn
2    C     Bn

In [12]: df['school'] = df['school'].astype('category')

In [13]: df['school']
Out[13]:
0    An
1    Bn
2    Bn
Name: school, dtype: category
Categories (2, object): [An, Bn]

In [14]: df['school'].cat.reorder_categories(['Bn', 'An'])
Out[14]:
0    An
1    Bn
2    Bn
dtype: category
Categories (2, object): [Bn, An]

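You can do this in place (note that the inplace argument was deprecated in pandas 1.3 and removed in 2.0; on newer versions, assign the result back to df['school'] instead):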

In [21]: df['school'].cat.reorder_categories(['Bn', 'An'], inplace=True)

In [22]: df['school']
Out[22]:
0    An
1    Bn
2    Bn
Name: school, dtype: category
Categories (2, object): [Bn, An]

See the "Reordering categories" section of the docs.
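For readers on a recent pandas release, the same reordering can be sketched without inplace by assigning the result back. This is a minimal, self-contained sketch that rebuilds the question's frame; the helper variables categories, values and codes are only there for inspection and are not part of the original answer.

```python
import pandas as pd

# Rebuild the example frame from the question
df = pd.DataFrame({"Name": ["A", "B", "C"], "school": ["An", "Bn", "Bn"]})
df["school"] = df["school"].astype("category")

# reorder_categories returns a new Series, so assign it back to keep the change
df["school"] = df["school"].cat.reorder_categories(["Bn", "An"])

# Inspect the result: the values are untouched, only the level order
# (and therefore the underlying integer codes) has changed
categories = list(df["school"].cat.categories)
values = list(df["school"])
codes = list(df["school"].cat.codes)
```

The values stay An, Bn, Bn; only the category order, and with it the integer codes, is different afterwards.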


HYR*_*YRY 5

You can set cat.categories:

import pandas as pd

school = pd.Series(["An", "Bn", "Bn"])
school = school.astype("category")

school.cat.categories = ["Bn", "An"]

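  • I don't think this is what the OP wants. Assigning to `pandas.Series.cat.categories` replaces the values in the data themselves with the input list (so the level "An" becomes "Bn" and vice versa), whereas the question is only about changing the order of the levels. (2 upvotes)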
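The difference the comment describes can be seen side by side. The sketch below uses rename_categories, which as far as I know relabels the categories positionally in the same way as assigning to cat.categories, and compares it with reorder_categories. The names renamed and reordered are just illustrative.

```python
import pandas as pd

school = pd.Series(["An", "Bn", "Bn"]).astype("category")

# Relabel: the first category (An) is renamed to Bn and the second (Bn) to An,
# so the values seen in the data change
renamed = school.cat.rename_categories(["Bn", "An"])

# Reorder: the values stay the same, only the order of the levels changes
reordered = school.cat.reorder_categories(["Bn", "An"])

print(list(renamed))
print(list(reordered))
print(list(reordered.cat.categories))
```

Both results report the categories as [Bn, An], which is why the two operations are easy to confuse; only the data values tell them apart.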

Ale*_*der 1

As a general solution, you can remap the values with a dictionary:

import pandas as pd

df = pd.DataFrame({'Name': ['A', 'B', 'C'],
                   'school': ['An', 'Bn', 'Bn']})
# Map each existing value to its replacement
d = {'An': 'Bn', 'Bn': 'An'}
df['school'] = df.school.map(d)
>>> df
  Name school
0    A     Bn
1    B     An
2    C     An
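Note that the remapping above swaps the stored values, as the output shows. If only the level order should change, one option that none of the answers use is to fix the order at conversion time with pd.CategoricalDtype. This is a sketch under that assumption, with level_order as a purely illustrative name.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["A", "B", "C"], "school": ["An", "Bn", "Bn"]})

# Declare the desired level order up front, then convert the column to it
level_order = pd.CategoricalDtype(categories=["Bn", "An"])
df["school"] = df["school"].astype(level_order)

print(list(df["school"]))
print(list(df["school"].cat.categories))
```

This skips the separate reordering step because the column never takes on the default alphabetical level order.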