
How do I select the first row of each group?

I have a DataFrame generated as follows:

import org.apache.spark.sql.functions.sum  // $"..." needs spark.implicits._ in scope

df.groupBy($"Hour", $"Category")
  .agg(sum($"value") as "TotalValue")
  .sort($"Hour".asc, $"TotalValue".desc)

The result looks like this:

+----+--------+----------+
|Hour|Category|TotalValue|
+----+--------+----------+
|   0|   cat26|      30.9|
|   0|   cat13|      22.1|
|   0|   cat95|      19.6|
|   0|  cat105|       1.3|
|   1|   cat67|      28.5|
|   1|    cat4|      26.8|
|   1|   cat13|      12.6|
|   1|   cat23|       5.3|
|   2|   cat56|      39.6|
|   2|   cat40|      29.7|
|   2|  cat187|      27.9|
|   2|   cat68|       9.8|
|   3|    cat8|      35.6|
| ...|    ....|      ....|
+----+--------+----------+

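As you can see, the DataFrame is sorted by Hour in ascending order, then by TotalValue in descending order.

I want to select the top row of each group, i.e.

  • from the Hour == 0 group, select (0, cat26, 30.9)
  • from the Hour == 1 group, select (1, cat67, 28.5)
  • from the Hour == …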
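
The selection described above can be illustrated on the sample rows. This is not Spark code: it is a small pandas sketch of the same idea (sort, then keep the first row per Hour), using only the hour 0 and hour 1 rows from the table above. The names `totals` and `top_per_hour` are made up for the illustration.

```python
import pandas as pd

# Sample rows copied from the table above (hours 0 and 1 only).
rows = [
    (0, "cat26", 30.9), (0, "cat13", 22.1), (0, "cat95", 19.6), (0, "cat105", 1.3),
    (1, "cat67", 28.5), (1, "cat4", 26.8), (1, "cat13", 12.6), (1, "cat23", 5.3),
]
totals = pd.DataFrame(rows, columns=["Hour", "Category", "TotalValue"])

# Sort explicitly (Hour ascending, TotalValue descending),
# then keep the first row of each Hour group.
top_per_hour = (
    totals.sort_values(["Hour", "TotalValue"], ascending=[True, False])
          .groupby("Hour", sort=False)
          .head(1)
          .reset_index(drop=True)
)
print(top_per_hour)
```

In Spark itself, a commonly used way to express the same thing is a window partitioned by Hour and ordered by TotalValue descending, keeping the rows where `row_number()` equals 1.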

sql scala dataframe apache-spark apache-spark-sql

122 votes, 3 answers, 80k views

Pandas DataFrame: get the first row of each group

I have a pandas DataFrame like the following:

import pandas as pd

df = pd.DataFrame({'id': [1,1,1,2,2,3,3,3,3,4,4,5,6,6,6,7,7],
                   'value': ["first","second","second","first",
                             "second","first","third","fourth",
                             "fifth","second","fifth","first",
                             "first","second","third","fourth","fifth"]})

I want to group this by ["id","value"] and get the first row of each group.

        id   value
0        1   first
1        1  second
2        1  second
3        2   first
4        2  second
5        3   first
6        3   third
7        3  fourth
8        3   fifth
9        4  second
10       4   fifth
11       5   first
12       6   first
13       6  second
14       6   third
15       7  fourth
16       7   fifth

Expected result:

    id   value
     1   first
     2   first
     3   first
     4  second
     5 …
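
A minimal sketch of one way to get this result. It assumes the grouping key is `id` alone: the question text mentions ["id","value"], but the expected result has one row per id. The names `first_rows` and `first_rows_alt` are introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 4, 5, 6, 6, 6, 7, 7],
    "value": ["first", "second", "second", "first",
              "second", "first", "third", "fourth",
              "fifth", "second", "fifth", "first",
              "first", "second", "third", "fourth", "fifth"],
})

# Option 1: first entry of each id group, with id kept as a regular column.
# Note: GroupBy.first() takes the first non-null value per column, which
# equals the first row here because the data has no missing values.
first_rows = df.groupby("id", as_index=False).first()

# Option 2: keep the first occurrence of each id; original index labels survive.
first_rows_alt = df.drop_duplicates(subset="id", keep="first")

print(first_rows)
```

If the data may contain missing values and the literal first row is wanted, `df.groupby("id").head(1)` or the `drop_duplicates` variant is probably the safer choice.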

python dataframe pandas

110 votes, 5 answers, 110k views
