Tib*_*rzz 2 dataframe apache-spark pyspark
I have a DataFrame:
# +---+--------+---------+
# | id| rank | value |
# +---+--------+---------+
# | 1| A | 10 |
# | 2| B | 46 |
# | 3| D | 8 |
# | 4| C | 8 |
# +---+--------+---------+
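I want to sort by value, then by rank. This seems like it should be simple, but I can't find how to do it in the documentation or on SO for PySpark, only for R and Scala.
This is what it should look like after sorting; .show() should print: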
# +---+--------+---------+
# | id| rank | value |
# +---+--------+---------+
# | 4| C | 8 |
# | 3| D | 8 |
# | 1| A | 10 |
# | 2| B | 46 |
# +---+--------+---------+
df.orderBy(["value", "rank"], ascending=[1, 1])
Reference: http://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.DataFrame.orderBy