Pandas equivalent of SQL window functions

2da*_*aaa 4 python sql window-functions pandas

Is there an idiomatic equivalent to SQL's window functions in Pandas? For example, what is the most compact way to write the equivalent of this in Pandas?

SELECT state_name,  
       state_population,
       SUM(state_population)
        OVER() AS national_population
FROM population   
ORDER BY state_name 

Or this?

SELECT state_name,  
       state_population,
       region,
       SUM(state_population)
        OVER(PARTITION BY region) AS regional_population
FROM population    
ORDER BY state_name

Max*_*axU 12

For the first SQL query:

SELECT state_name,  
       state_population,
       SUM(state_population)
        OVER() AS national_population
FROM population   
ORDER BY state_name 

Pandas:

df.assign(national_population=df.state_population.sum()).sort_values('state_name')

For the second SQL query:

SELECT state_name,  
       state_population,
       region,
       SUM(state_population)
        OVER(PARTITION BY region) AS regional_population
FROM population    
ORDER BY state_name

Pandas:

df.assign(regional_population=df.groupby('region')['state_population'].transform('sum')) \
  .sort_values('state_name')
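
The reason `transform('sum')` is used here rather than a plain `.sum()` can be seen in a small sketch. A plain aggregation collapses to one row per group (like `GROUP BY`), while `transform` returns a result aligned with the original rows, which is what `OVER(PARTITION BY ...)` does. The DataFrame below is rebuilt from the sample values in the demo further down; the variable names `grouped`, `per_region` and `per_row` are only illustrative.

```python
import pandas as pd

# Sample values copied from the demo in this answer.
df = pd.DataFrame({
    "region": [1, 1, 2, 2, 2, 3],
    "state_name": ["aaa", "bbb", "ccc", "ddd", "eee", "xxx"],
    "state_population": [100, 110, 200, 100, 100, 55],
})

grouped = df.groupby("region")["state_population"]

# Plain aggregation: one value per region, indexed by region (like GROUP BY).
per_region = grouped.sum()

# transform: each group's total is broadcast back onto the original rows,
# so the result has the same length and index as df
# (like SUM(...) OVER (PARTITION BY region)).
per_row = grouped.transform("sum")

print(len(per_region), len(per_row))
print(per_row.index.equals(df.index))
```

Because `per_row` shares the index of `df`, it can be passed straight to `assign` as a new column without any merge.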

DEMO:

In [238]: df
Out[238]:
   region state_name  state_population
0       1        aaa               100
1       1        bbb               110
2       2        ccc               200
3       2        ddd               100
4       2        eee               100
5       3        xxx                55

national_population:

In [246]: df.assign(national_population=df.state_population.sum()).sort_values('state_name')
Out[246]:
   region state_name  state_population  national_population
0       1        aaa               100                  665
1       1        bbb               110                  665
2       2        ccc               200                  665
3       2        ddd               100                  665
4       2        eee               100                  665
5       3        xxx                55                  665

regional_population:

In [239]: df.assign(regional_population=df.groupby('region')['state_population'].transform('sum')) \
     ...:   .sort_values('state_name')
Out[239]:
   region state_name  state_population  regional_population
0       1        aaa               100                  210
1       1        bbb               110                  210
2       2        ccc               200                  400
3       2        ddd               100                  400
4       2        eee               100                  400
5       3        xxx                55                   55
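
To reproduce the demo outside IPython, here is a minimal self-contained sketch. It rebuilds the sample DataFrame from the values shown above and computes both columns in a single `assign` call; combining them into one call and the name `result` are my own choices, not something the queries require.

```python
import pandas as pd

# Same sample data as in the demo above.
df = pd.DataFrame({
    "region": [1, 1, 2, 2, 2, 3],
    "state_name": ["aaa", "bbb", "ccc", "ddd", "eee", "xxx"],
    "state_population": [100, 110, 200, 100, 100, 55],
})

result = (
    df.assign(
        # SUM(state_population) OVER (): a scalar, broadcast to every row.
        national_population=df["state_population"].sum(),
        # SUM(state_population) OVER (PARTITION BY region): per-group total,
        # aligned with the original rows.
        regional_population=df.groupby("region")["state_population"].transform("sum"),
    )
    # ORDER BY state_name
    .sort_values("state_name")
)

print(result)
```

Note that `assign` returns a new DataFrame and leaves `df` itself unchanged.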