Pandas DataFrame: sort by a categorical column, but in a specific category order

elz*_*rdo 6 sorting dataframe python-2.7 pandas categorical-data

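I want to select the top entries of a Pandas DataFrame, ordered by a specific column, using df_selected = df_targets.head(N).

Each entry has a target value (listed in order of importance):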

Likely Supporter, GOTV, Persuasion, Persuasion+GOTV  

Unfortunately, if I do

df_targets = df_targets.sort("target")

the ordering is alphabetical (GOTV, Likely Supporter, ...).

I was hoping for a keyword such as list_ordering:

my_list = ["Likely Supporter", "GOTV", "Persuasion", "Persuasion+GOTV"] 
df_targets = df_targets.sort("target", list_ordering=my_list)

To work around this, I created a dictionary:

from collections import OrderedDict

dict_targets = OrderedDict()
dict_targets["Likely Supporter"] = "0 Likely Supporter"
dict_targets["GOTV"] = "1 GOTV"
dict_targets["Persuasion"] = "2 Persuasion"
dict_targets["Persuasion+GOTV"] = "3 Persuasion+GOTV"

However, this seems like an un-Pythonic approach.

Any suggestions would be much appreciated!

jez*_*ael 12

I think you need Categorical with the parameter ordered=True; sorting with sort_values then works very nicely:

import pandas as pd

df = pd.DataFrame({'a': ['GOTV', 'Persuasion', 'Likely Supporter', 
                         'GOTV', 'Persuasion', 'Persuasion+GOTV']})

df.a = pd.Categorical(df.a, 
                      categories=["Likely Supporter","GOTV","Persuasion","Persuasion+GOTV"],
                      ordered=True)

print (df)
                  a
0              GOTV
1        Persuasion
2  Likely Supporter
3              GOTV
4        Persuasion
5   Persuasion+GOTV

print (df.a)
0                GOTV
1          Persuasion
2    Likely Supporter
3                GOTV
4          Persuasion
5     Persuasion+GOTV
Name: a, dtype: category
Categories (4, object): [Likely Supporter < GOTV < Persuasion < Persuasion+GOTV]
Sorting with sort_values then follows the category order:

df.sort_values('a', inplace=True)
print (df)
                  a
2  Likely Supporter
0              GOTV
3              GOTV
1        Persuasion
4        Persuasion
5   Persuasion+GOTV
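
To connect this back to the original goal of taking the top N rows, here is a minimal sketch under a few assumptions: the voter_id column and its values are made up for illustration, and N = 3 is arbitrary. It uses CategoricalDtype with astype instead of assigning pd.Categorical in place, which should be equivalent for this purpose. It also shows an alternative that leaves the column dtype untouched by passing a key function to sort_values, which requires pandas 1.1 or newer.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

# Hypothetical sample data; the voter_id column is only for illustration.
df_targets = pd.DataFrame({
    "target": ["GOTV", "Persuasion", "Likely Supporter",
               "GOTV", "Persuasion", "Persuasion+GOTV"],
    "voter_id": [101, 102, 103, 104, 105, 106],
})

# Desired order of importance, most important first.
my_list = ["Likely Supporter", "GOTV", "Persuasion", "Persuasion+GOTV"]

# Option 1: convert the column to an ordered categorical, then sort.
# Values that are not listed in my_list would become NaN after astype.
target_dtype = CategoricalDtype(categories=my_list, ordered=True)
df_sorted = (
    df_targets
    .assign(target=df_targets["target"].astype(target_dtype))
    .sort_values("target", kind="mergesort")  # mergesort keeps ties in original order
)

N = 3  # arbitrary choice for this sketch
df_selected = df_sorted.head(N)

# Option 2 (pandas >= 1.1): keep the string column and sort by a rank lookup.
rank = {name: position for position, name in enumerate(my_list)}
df_by_key = df_targets.sort_values(
    "target", key=lambda s: s.map(rank), kind="mergesort"
)

print(df_selected)
print(df_by_key)
```

Option 1 is probably the better fit when the order is a lasting property of the column, since comparisons and groupby output would also respect it. Option 2 may be handier for a one-off sort.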