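# Note: this import shadows Python's built-in sum within this module.
from pyspark.sql.functions import count, sum

# Aggregate suburb-level rows to one row per (city, income_bracket).
aggregrated_table = df_input.groupBy('city', 'income_bracket') \
    .agg(
        count('suburb').alias('suburb'),
        sum('population').alias('population'),
        sum('gross_income').alias('gross_income'),
        sum('no_households').alias('no_households'))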
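I want to group by city and income bracket, but within each city some suburbs have different income brackets. How can I group each city by its most frequently occurring income bracket?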
For example, a city whose suburbs fall mostly in income_bracket_10 would be grouped under income_bracket_10.