Scala/Spark - Multiply an integer with each value in a DataFrame column

Amb*_*ber 4 scala apache-spark

I have a sample DataFrame:

df_that_I_have
+---------+---------+-------+
| country | members | some  |
+---------+---------+-------+
| India   | 50      | 1     |
+---------+---------+-------+
| Japan   | 20      | 3     |
+---------+---------+-------+
| India   | 20      | 1     |
+---------+---------+-------+
| Japan   | 10      | 3     |
+---------+---------+-------+

I want a DataFrame that looks like this:

df_that_I_want
+---------+---------+-------+
| country | members | some  |
+---------+---------+-------+
| India   | 70      | 10    | // 5 * Sum of "some" for India, i.e. (1 + 1)
+---------+---------+-------+
| Japan   | 30      | 30    | // 5 * Sum of "some" for Japan, i.e. (3 + 3)
+---------+---------+-------+

The second DataFrame holds, for each country, the sum of "members" and the sum of "some" multiplied by 5.

This is what I am doing to achieve this:

val df_that_I_want = df_that_I_have
                        .select(df_that_I_have("country"),
                                df_that_I_have.groupBy("country").sum("members"),
                                5 * df_that_I_have.groupBy("country").sum("some")) //Problem here

But the compiler does not allow this, because apparently I cannot multiply 5 by a column.

How can I multiply an integer value with the sum of "some" for each country?

Raj*_*hra 5

You can try the lit function.

scala> val df_that_I_have = Seq(("India",50,1),("India",20,1),("Japan",20,3),("Japan",10,3)).toDF("Country","Members","Some")
df_that_I_have: org.apache.spark.sql.DataFrame = [Country: string, Members: int, Some: int]

scala> val df1 = df_that_I_have.groupBy("country").agg(sum("members"), sum("some") * lit(5))
df1: org.apache.spark.sql.DataFrame = [country: string, sum(members): bigint, ((sum(some),mode=Complete,isDistinct=false) * 5): bigint]

scala> val df_that_I_want= df1.select($"Country",$"sum(Members)".alias("Members"), $"((sum(Some),mode=Complete,isDistinct=false) * 5)".alias("Some"))
df_that_I_want: org.apache.spark.sql.DataFrame = [Country: string, Members: bigint, Some: bigint]

scala> df_that_I_want.show

+-------+-------+----+
|Country|Members|Some|
+-------+-------+----+
|  India|     70|  10|
|  Japan|     30|  30|
+-------+-------+----+
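
As a possible variation on the transcript above, the aggregates can be aliased inside agg, so the second select on the auto-generated column name is not needed. That generated name (for example the "mode=Complete,isDistinct=false" part) appears to depend on the Spark version, so relying on it is fragile. Two points on the original attempt: groupBy("country").sum("some") returns a DataFrame rather than a Column, so it cannot be passed to select, and 5 * column does not compile because Int has no * method that accepts a Column. Putting the Column on the left works, as in sum("some") * lit(5) or sum("some") * 5. This is only a minimal sketch: the names dfThatIHave and dfThatIWant and the local SparkSession settings are assumptions for illustration, and in spark-shell the session already exists as spark.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{lit, sum}

// Assumption: a local session for illustration; spark-shell already provides `spark`.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("multiply-sum-sketch")
  .getOrCreate()
import spark.implicits._

// Same sample data as in the question.
val dfThatIHave = Seq(
  ("India", 50, 1), ("India", 20, 1), ("Japan", 20, 3), ("Japan", 10, 3)
).toDF("country", "members", "some")

// Aggregate once per country and name the result columns directly.
val dfThatIWant = dfThatIHave
  .groupBy("country")
  .agg(
    sum("members").alias("members"),
    (sum("some") * lit(5)).alias("some") // Column on the left, literal on the right
  )

dfThatIWant.orderBy("country").show()
```

With the sample data this should give the same two rows as the output above: India with 70 and 10, Japan with 30 and 30.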