How to round in Scala Spark

Question

Iva*_*van 4 concurrency scala dataframe apache-spark

I have a large (~1 million rows) Scala Spark DataFrame containing the following data:

id,score
1,0.956
2,0.977
3,0.855
4,0.866
...

How can I discretize/round the scores to the nearest 0.05?

Expected result:

id,score
1,0.95
2,1.00
3,0.85
4,0.85
...

I would like to avoid UDFs to maximize performance.

Answer 1

iri*_*rrr 10

The answer can be simpler:

import org.apache.spark.sql.functions.{col, round}
dataframe.withColumn("rounded_score", round(col("score"), 2))

There is a function in org.apache.spark.sql.functions:

def round(e: Column, scale: Int): Column

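Rounds the value of e to scale decimal places using the HALF_UP rounding mode.
Note that this rounds to steps of 0.01, not to the nearest 0.05 that the question asks for: 0.956 becomes 0.96 rather than the expected 0.95.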
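To see what scale-2 rounding does to the sample scores without a cluster, here is a plain-Scala sketch. It assumes that Spark's round on a Double column behaves like converting the value to its decimal BigDecimal form and applying HALF_UP; the object ScaleRounding and its helper roundHalfUp are hypothetical names for illustration, not part of Spark.

```scala
import scala.math.BigDecimal.RoundingMode

// Hypothetical stand-in for round(e, scale) on a Double, ASSUMING it matches
// Spark's behavior: convert to a decimal BigDecimal, round HALF_UP, convert back.
object ScaleRounding {
  def roundHalfUp(x: Double, scale: Int): Double =
    BigDecimal(x).setScale(scale, RoundingMode.HALF_UP).toDouble

  // The sample scores from the question
  val scores: Seq[Double] = Seq(0.956, 0.977, 0.855, 0.866)

  // scale = 2 means two decimal places, i.e. steps of 0.01, not 0.05
  val roundedTo2: Seq[Double] = scores.map(roundHalfUp(_, 2))
}
```

Under that assumption the sample scores become 0.96, 0.98, 0.86 and 0.87, which are multiples of 0.01 rather than of 0.05.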

Answer 2

soo*_*ote 8

You can do this with Spark's built-in functions:

import org.apache.spark.sql.functions.{col, round}
dataframe.withColumn("rounded_score", round(col("score") * 100 / 5) * 5 / 100)

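First multiply by 100 so that the precision you want becomes an integer.
Then divide that number by 5 and round it.
The rounded value is a whole number of 5-unit steps, so multiplying it by 5 gives an integer that is divisible by 5.
Finally, divide by 100 to get back to the original precision.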
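The four steps can be checked in plain Scala without a Spark cluster. This sketch mirrors round(col("score") * 100 / 5) * 5 / 100 on single Double values, assuming that Spark's single-argument round applies HALF_UP to the decimal form of the double; NearestStep and roundToNearest (with its scale and step parameters) are hypothetical names for illustration.

```scala
import scala.math.BigDecimal.RoundingMode

// Plain-Scala sketch of the arithmetic above (no Spark needed).
// ASSUMPTION: Spark's round(x) behaves like BigDecimal(x) rounded HALF_UP to scale 0.
object NearestStep {
  // Hypothetical helper; the defaults reproduce round(score * 100 / 5) * 5 / 100
  def roundToNearest(x: Double, scale: Int = 100, step: Int = 5): Double = {
    val units = x * scale / step // step 1: scale up and divide by the step, 0.956 -> ~19.12
    val rounded = BigDecimal(units).setScale(0, RoundingMode.HALF_UP).toDouble // step 2: 19.0
    rounded * step / scale // steps 3 and 4: 19 * 5 = 95, then 95 / 100 = 0.95
  }

  // The sample scores from the question
  val scores: Seq[Double] = Seq(0.956, 0.977, 0.855, 0.866)
  val rounded: Seq[Double] = scores.map(roundToNearest(_))
}
```

With the defaults it reproduces the expected column: 0.956, 0.977, 0.855 and 0.866 map to 0.95, 1.0, 0.85 and 0.85, and a tie such as 0.925 rounds up to 0.95. Scaling back down by dividing by an integer (* 5 / 100) rather than multiplying by 0.05 matters here: 0.05 has no exact binary representation, so an expression like 19 * 0.05 evaluates to 0.9500000000000001 instead of 0.95.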

Result:

+---+-----+-------------+
| id|score|rounded_score|
+---+-----+-------------+
|  1|0.956|         0.95|
|  2|0.977|          1.0|
|  3|0.855|         0.85|
|  4|0.866|         0.85|
+---+-----+-------------+

Archived: 6 years, 11 months ago
Views: 23020
Last activity: 5 years, 7 months ago