Unsupported literal type class in Apache Spark in Scala

Question

I have the following data:

    +---------------+-----------+-------------+-----+------+
    |   time_stamp_0|sender_ip_1|receiver_ip_2|count|attack|
    +---------------+-----------+-------------+-----+------+
    |06:10:55.881073|   10.0.0.3|     10.0.0.1|    1|     0|
    |06:10:55.881095|   10.0.0.3|     10.0.0.1|    2|     0|
    |06:10:55.881114|   10.0.0.3|     10.0.0.1|    3|     0|
    |06:10:55.881133|   10.0.0.3|     10.0.0.1|    4|     0|
    |06:10:55.881152|   10.0.0.3|     10.0.0.1|    5|     0|
    |06:10:55.881172|   10.0.0.3|     10.0.0.1|    6|     0|
    |06:10:55.881191|   10.0.0.3|     10.0.0.1|    7|     0|
    |06:10:55.881210|   10.0.0.3|     10.0.0.1|    8|     0|

I need to compare the overall standard deviation of the count column against the count column itself in my DataFrame. Here is my code:

    val std_dev=Dataframe_addcount.agg(stddev_pop($"count"))

    val final_add_count_attack = Dataframe_addcount.withColumn("attack", when($"count" > std_dev , 0).otherwise(1))

My problem is that I get the following error:

    Unsupported literal type class org.apache.spark.sql.Dataset [stddev_pop(count): double]

Can you help me? Thank you very much.

Answer 1

T. *_*ęda 3

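That's because when and otherwise expect values (literals or Column expressions), and std_dev is not a value but a DataFrame. The comparison $"count" > std_dev tries to wrap std_dev as a literal, and a Dataset is not a supported literal type.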

You can extract the value from the DataFrame and use that instead:

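    // agg returns a one-row DataFrame; take the Double out of its first row
    val stdDevValue = std_dev.head().getDouble(0)

    // compare against the extracted value, not the std_dev DataFrame
    val final_add_count_attack = Dataframe_addcount.withColumn("attack", when($"count" > lit(stdDevValue), lit(0)).otherwise(lit(1)))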
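
For reference, here is a minimal self-contained sketch of the same idea: pull the scalar out of the aggregate with head().getDouble(0), then pass that Double to lit. It assumes a local SparkSession and a small stand-in DataFrame that has only a count column. The object name StdDevFlagSketch, the helper flagByStdDev, and the sample values are illustrative and do not come from the question.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, lit, stddev_pop, when}

object StdDevFlagSketch {

  // Hypothetical helper: adds an "attack" column using the rule from the question.
  def flagByStdDev(df: DataFrame): DataFrame = {
    // agg returns a one-row DataFrame, so read the Double from its first row.
    val stdDevValue: Double = df.agg(stddev_pop(col("count"))).head().getDouble(0)
    // 0 when count exceeds the population standard deviation, 1 otherwise.
    df.withColumn("attack", when(col("count") > lit(stdDevValue), lit(0)).otherwise(lit(1)))
  }

  def main(args: Array[String]): Unit = {
    // Local session for illustration only; master and app name are assumptions.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("stddev-flag-sketch")
      .getOrCreate()
    import spark.implicits._

    // Stand-in for Dataframe_addcount, reduced to the count column.
    val df = Seq(1, 2, 3, 4, 5, 6, 7, 8).toDF("count")

    flagByStdDev(df).show()
    spark.stop()
  }
}
```

With counts 1 through 8 the population standard deviation is about 2.29, so the rows with count 1 and 2 get attack = 1 and the remaining rows get 0. Note that head() runs the aggregation and brings one row to the driver, which is fine for a single scalar like this.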

Archived:	8 years, 4 months ago
Views:	17361
Last recorded:	5 years, 2 months ago