How to sort a Pandas DataFrame on a JSON field

Question

I have data like this in a pandas DataFrame:

   id     import_id              investor_id     loan_id      meta
   35736  unremit_loss_100312         Q05         0051765139  {u'total_paid': u'75', u'total_expense': u'75'}
   35737  unremit_loss_100313         Q06         0051765140  {u'total_paid': u'77', u'total_expense': u'78'}
   35739  unremit_loss_100314         Q06         0051765141  {u'total_paid': u'80', u'total_expense': u'65'}


How can I sort the rows by total_expense, which is a value inside the JSON field,
i.e. the total_expense key in the meta column?

The output should be:

id     import_id              investor_id     loan_id      meta
35739  unremit_loss_100314         Q06         0051765141  {u'total_paid': u'80', u'total_expense': u'65'}
35736  unremit_loss_100312         Q05         0051765139  {u'total_paid': u'75', u'total_expense': u'75'}
35737  unremit_loss_100313         Q06         0051765140  {u'total_paid': u'77', u'total_expense': u'78'}


Answer 1

jez*_*ael 2

Use:

print (df)
      id            import_id investor_id   loan_id  \
0  35736  unremit_loss_100312         Q05  51765139   
1  35736  unremit_loss_100312         Q05  51765139   
2  35736  unremit_loss_100312         Q05  51765139   

                                               meta  
0   {u'total_paid': u'75', u'total_expense': u'75'}  
1   {u'total_paid': u'75', u'total_expense': u'20'}  
2  {u'total_paid': u'75', u'total_expense': u'100'}  

import ast

df['meta'] = df['meta'].apply(ast.literal_eval)

df = df.iloc[df['meta'].str['total_expense'].astype(int).argsort()]

print (df)
      id            import_id investor_id   loan_id  \
1  35736  unremit_loss_100312         Q05  51765139   
0  35736  unremit_loss_100312         Q05  51765139   
2  35736  unremit_loss_100312         Q05  51765139   

                                           meta  
1   {'total_paid': '75', 'total_expense': '20'}  
0   {'total_paid': '75', 'total_expense': '75'}  
2  {'total_paid': '75', 'total_expense': '100'}
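
The astype(int) step above matters because the values inside meta are strings. As a minimal sketch (the sample frame below is made up to mirror the answer's data, and it uses Series.map rather than the .str accessor), sorting the raw strings compares them character by character, while casting to int first gives the numeric order:

```python
import pandas as pd

# Hypothetical sample mirroring the answer's data; 'meta' already holds dicts.
df = pd.DataFrame({
    "meta": [
        {"total_paid": "75", "total_expense": "75"},
        {"total_paid": "75", "total_expense": "20"},
        {"total_paid": "75", "total_expense": "100"},
    ],
})

# Pull the string value out of each dict.
expense_str = df["meta"].map(lambda d: d["total_expense"])

# Sorting the strings is lexicographic, so '100' sorts before '20'.
as_text = expense_str.sort_values().tolist()

# Casting to int first gives the intended numeric order.
as_number = expense_str.astype(int).sort_values().tolist()

print(as_text)    # ['100', '20', '75']
print(as_number)  # [20, 75, 100]
```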


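If the total_expense key may be missing in some rows, replace the missing values with an integer lower than all the other values, such as -1, so that those rows are placed first: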

print (df)
      id            import_id investor_id   loan_id  \
0  35736  unremit_loss_100312         Q05  51765139   
1  35736  unremit_loss_100312         Q05  51765139   
2  35736  unremit_loss_100312         Q05  51765139   

                                              meta  
0  {u'total_paid': u'75', u'total_expense': u'75'}  
1  {u'total_paid': u'75', u'total_expense': u'20'}  
2                           {u'total_paid': u'75'} 

df['meta'] = df['meta'].apply(ast.literal_eval)


df = df.iloc[df['meta'].str.get('total_expense').fillna(-1).astype(int).argsort()]
print (df)
      id            import_id investor_id   loan_id  \
2  35736  unremit_loss_100312         Q05  51765139   
1  35736  unremit_loss_100312         Q05  51765139   
0  35736  unremit_loss_100312         Q05  51765139   

                                          meta  
2                         {'total_paid': '75'}  
1  {'total_paid': '75', 'total_expense': '20'}  
0  {'total_paid': '75', 'total_expense': '75'}
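
The -1 placeholder works only if every real value is non-negative. A possible variation, sketched here under the assumption of pandas 1.1 or later (for the key argument of sort_values) and with made-up sample ids, leaves missing values as NaN and lets na_position decide where those rows go:

```python
import pandas as pd

# Hypothetical sample; the last row has no 'total_expense' key.
df = pd.DataFrame({
    "id": [1, 2, 3],
    "meta": [
        {"total_paid": "75", "total_expense": "75"},
        {"total_paid": "75", "total_expense": "20"},
        {"total_paid": "75"},
    ],
})

def expense_key(meta_series):
    # dict.get returns None for a missing key; to_numeric turns None into NaN.
    return pd.to_numeric(meta_series.map(lambda d: d.get("total_expense")))

# na_position="first" puts rows with a missing key at the top;
# use "last" to push them to the bottom instead.
result = df.sort_values("meta", key=expense_key, na_position="first")

print(result["id"].tolist())  # [3, 2, 1]
```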


Another solution:

df['new'] = df['meta'].str.get('total_expense').astype(int)
df = df.sort_values('new').drop('new', axis=1)
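
If the temporary column is unwanted, a similar result can probably be had with the key argument of sort_values. This is a sketch rather than part of the original answer: it assumes pandas 1.1 or later and that meta already holds dicts, and the sample rows are trimmed from the question.

```python
import pandas as pd

# Sample rows from the question, with 'meta' already parsed into dicts.
df = pd.DataFrame({
    "id": [35736, 35737, 35739],
    "meta": [
        {"total_paid": "75", "total_expense": "75"},
        {"total_paid": "77", "total_expense": "78"},
        {"total_paid": "80", "total_expense": "65"},
    ],
})

# The key callable receives the whole 'meta' Series and must return
# a Series of the same length; here it maps each dict to an int.
sorted_df = df.sort_values(
    "meta",
    key=lambda s: s.map(lambda d: int(d["total_expense"])),
)

print(sorted_df["id"].tolist())  # [35739, 35736, 35737]
```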

