elz*_*rdo 6 sorting dataframe python-2.7 pandas categorical-data
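I would like to select the top entries in a Pandas DataFrame, based on the values in a specific column, by using df_selected = df_targets.head(N).
Each entry has a target value (listed here in order of importance):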
Likely Supporter, GOTV, Persuasion, Persuasion+GOTV
Unfortunately, if I do
df_targets = df_targets.sort("target")
the ordering is alphabetical (GOTV, Likely Supporter, ...).
I was hoping for a keyword such as list_ordering:
my_list = ["Likely Supporter", "GOTV", "Persuasion", "Persuasion+GOTV"]
df_targets = df_targets.sort("target", list_ordering=my_list)
To work around this, I created a dictionary:
from collections import OrderedDict

dict_targets = OrderedDict()
dict_targets["Likely Supporter"] = "0 Likely Supporter"
dict_targets["GOTV"] = "1 GOTV"
dict_targets["Persuasion"] = "2 Persuasion"
dict_targets["Persuasion+GOTV"] = "3 Persuasion+GOTV"
but this seems like an un-Pythonic approach.
Any suggestions would be much appreciated!
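For context, a workaround like the dictionary above would presumably be applied by mapping each label to its prefixed version and sorting on the result. This is only a sketch under that assumption: the sample rows and the helper column name target_sortable are hypothetical, and sort_values is used instead of the older sort method.

```python
from collections import OrderedDict

import pandas as pd

# Hypothetical sample data standing in for df_targets.
df_targets = pd.DataFrame(
    {"target": ["GOTV", "Persuasion", "Likely Supporter", "Persuasion+GOTV", "GOTV"]}
)

dict_targets = OrderedDict()
dict_targets["Likely Supporter"] = "0 Likely Supporter"
dict_targets["GOTV"] = "1 GOTV"
dict_targets["Persuasion"] = "2 Persuasion"
dict_targets["Persuasion+GOTV"] = "3 Persuasion+GOTV"

# Prefix each label with its rank so that alphabetical order matches importance.
# "target_sortable" is a made-up helper column name.
df_targets["target_sortable"] = df_targets["target"].map(dict_targets)

# A stable sort keeps rows with equal targets in their original relative order.
df_sorted = df_targets.sort_values("target_sortable", kind="mergesort")
print(df_sorted["target"].tolist())
```

This works, but it needs an extra column and only sorts correctly while the number of labels stays below ten, since the prefixes are compared as strings.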
jez*_*ael 12
I think you need Categorical with the parameter ordered=True; sorting with sort_values then works very nicely:
import pandas as pd

df = pd.DataFrame({'a': ['GOTV', 'Persuasion', 'Likely Supporter',
                         'GOTV', 'Persuasion', 'Persuasion+GOTV']})

df.a = pd.Categorical(df.a,
                      categories=["Likely Supporter","GOTV","Persuasion","Persuasion+GOTV"],
                      ordered=True)

print(df)
                  a
0              GOTV
1        Persuasion
2  Likely Supporter
3              GOTV
4        Persuasion
5   Persuasion+GOTV

print(df.a)
0                GOTV
1          Persuasion
2    Likely Supporter
3                GOTV
4          Persuasion
5     Persuasion+GOTV
Name: a, dtype: category
Categories (4, object): [Likely Supporter < GOTV < Persuasion < Persuasion+GOTV]
df.sort_values('a', inplace=True)
print(df)
                  a
2  Likely Supporter
0              GOTV
3              GOTV
1        Persuasion
4        Persuasion
5   Persuasion+GOTV
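Applied to the original question, the same idea might look like the sketch below. The sample rows, the name column, and N = 3 are made up for illustration; only the target column name and the category order come from the question.

```python
import pandas as pd

# Hypothetical sample data; only the "target" column and its labels are from the question.
df_targets = pd.DataFrame({
    "name": ["A", "B", "C", "D", "E"],
    "target": ["GOTV", "Persuasion+GOTV", "Likely Supporter", "Persuasion", "GOTV"],
})

# Desired order of importance, most important first.
my_list = ["Likely Supporter", "GOTV", "Persuasion", "Persuasion+GOTV"]

# Convert the column to an ordered categorical so sorting follows my_list.
df_targets["target"] = pd.Categorical(df_targets["target"],
                                      categories=my_list,
                                      ordered=True)

# Sort by importance (stable sort keeps ties in their original order), then take the top N.
N = 3
df_selected = df_targets.sort_values("target", kind="mergesort").head(N)
print(df_selected)
```

Note that any value in the column that is not listed in categories is converted to NaN, so my_list should cover every label that occurs in the data.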