2da*_*aaa 4 python sql window-functions pandas
在Pandas中是否存在与SQL的窗口函数等效的惯用语?例如,在Pandas中写出相当于这个的最紧凑的方法是什么?:
SELECT state_name,
state_population,
SUM(state_population)
OVER() AS national_population
FROM population
ORDER BY state_name
Run Code Online (Sandbox Code Playgroud)
或这个?:
SELECT state_name,
state_population,
region,
SUM(state_population)
OVER(PARTITION BY region) AS regional_population
FROM population
ORDER BY state_name
Run Code Online (Sandbox Code Playgroud)
Max*_*axU 12
对于第一个SQL:
SELECT state_name,
state_population,
SUM(state_population)
OVER() AS national_population
FROM population
ORDER BY state_name
Run Code Online (Sandbox Code Playgroud)
熊猫:
df.assign(national_population=df.state_population.sum()).sort_values('state_name')
Run Code Online (Sandbox Code Playgroud)
对于第二个SQL:
SELECT state_name,
state_population,
region,
SUM(state_population)
OVER(PARTITION BY region) AS regional_population
FROM population
ORDER BY state_name
Run Code Online (Sandbox Code Playgroud)
熊猫:
df.assign(regional_population=df.groupby('region')['state_population'].transform('sum')) \
.sort_values('state_name')
Run Code Online (Sandbox Code Playgroud)
DEMO:
In [238]: df
Out[238]:
region state_name state_population
0 1 aaa 100
1 1 bbb 110
2 2 ccc 200
3 2 ddd 100
4 2 eee 100
5 3 xxx 55
Run Code Online (Sandbox Code Playgroud)
national_population:
In [246]: df.assign(national_population=df.state_population.sum()).sort_values('state_name')
Out[246]:
region state_name state_population national_population
0 1 aaa 100 665
1 1 bbb 110 665
2 2 ccc 200 665
3 2 ddd 100 665
4 2 eee 100 665
5 3 xxx 55 665
Run Code Online (Sandbox Code Playgroud)
regional_population:
In [239]: df.assign(regional_population=df.groupby('region')['state_population'].transform('sum')) \
...: .sort_values('state_name')
Out[239]:
region state_name state_population regional_population
0 1 aaa 100 210
1 1 bbb 110 210
2 2 ccc 200 400
3 2 ddd 100 400
4 2 eee 100 400
5 3 xxx 55 55
Run Code Online (Sandbox Code Playgroud)