Kas*_*ash 4 python dataframe pandas
I am used to using SQL to solve hierarchical joins but I'm wondering if it can be done in Python, maybe using Pandas. And which one is more efficient to go for?
CSV Data:
emp_id,fn,ln,mgr_id
1,Matthew,Reichek,NULL
2,John,Cottone,3
3,Chris,Winter,1
4,Sergey,Bobkov,2
5,Andrey,Botelli,2
6,Karen,Goetz,7
7,Tri,Pham,3
8,Drew,Thompson,7
9,BD,Alabi,7
10,Sreedhar,Kavali,7
I want to find the Level of each employee (Boss is Level 1 and so on):
My Recursive Code in SQL would be:
with recursive cte as
    (
    select employee_id, first_name, last_name, manager_id, 1 as level
    from icqa.employee
    where manager_id is null
    union 
    select e.employee_id, e.first_name, e.last_name, e.manager_id, cte.level + 1
    from icqa.employee e
    inner join cte
        on e.manager_id = cte.employee_id
    where e.manager_id is not null
    ) 
select * from cte
您可以创建一个字典映射emp_id,mgr_id然后创建一个递归函数,例如
idmap = dict(zip(df['emp_id'], df['mgr_id']))
def depth(id_):
    if np.isnan(id_):
        return 1
    return depth(idmap[id_]) + 1
计算给定的深度id。为了提高效率(通过不重复计算相同的id),您可以使用记忆化(由下面的@functools.lru_cache装饰器处理):
import numpy as np
import pandas as pd
import functools
nan = np.nan
df = pd.DataFrame({'emp_id': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 'fn': ['Matthew', 'John', 'Chris', 'Sergey', 'Andrey', 'Karen', 'Tri', 'Drew', 'BD', 'Sreedhar'], 'ln': ['Reichek', 'Cottone', 'Winter', 'Bobkov', 'Botelli', 'Goetz', 'Pham', 'Thompson', 'Alabi', 'Kavali'], 'mgr_id': [nan, 3.0, 1.0, 2.0, 2.0, 7.0, 3.0, 7.0, 7.0, 7.0]})
def make_depth(df):
    idmap = dict(zip(df['emp_id'], df['mgr_id']))
    @functools.lru_cache()
    def depth(id_):
        if np.isnan(id_):
            return 1
        return depth(idmap[id_]) + 1
    return depth
df['depth'] = df['mgr_id'].apply(make_depth(df))
print(df.sort_values(by='depth'))
产量
   emp_id        fn        ln  mgr_id  depth
0       1   Matthew   Reichek     NaN      1
2       3     Chris    Winter     1.0      2
1       2      John   Cottone     3.0      3
6       7       Tri      Pham     3.0      3
3       4    Sergey    Bobkov     2.0      4
4       5    Andrey   Botelli     2.0      4
5       6     Karen     Goetz     7.0      4
7       8      Drew  Thompson     7.0      4
8       9        BD     Alabi     7.0      4
9      10  Sreedhar    Kavali     7.0      4
| 归档时间: | 
 | 
| 查看次数: | 3532 次 | 
| 最近记录: |