DataFrame error: "overloaded method value select with alternatives"

Jas*_*Liu 1 scala dataframe apache-spark

I'm trying to create a new DataFrame by selecting Hour + Minute / 60 along with the other columns of an existing DataFrame, like this:

val logon11 = logon1.select("User","PC","Year","Month","Day","Hour","Minute",$"Hour"+$"Minute"/60)

I get the following error:

<console>:38: error: overloaded method value select with alternatives:
  (col: String,cols: String*)org.apache.spark.sql.DataFrame <and>
  (cols: org.apache.spark.sql.Column*)org.apache.spark.sql.DataFrame
cannot be applied to (String, String, String, String, String, String, String,org.apache.spark.sql.Colum)

...

I suspect the reason is that `select` can't take both kinds of arguments at the same time. So how can I get a DataFrame like this?

avr*_*avr 12

The DataFrame `select` method accepts either all `String` arguments or all `org.apache.spark.sql.Column` arguments, but not a mix of the two.
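A minimal sketch of both overloads, assuming Spark 2.x or 3.x with a local `SparkSession`. The object name `SelectOverloads`, the helper names, and the sample row are hypothetical. When the column list is built programmatically, each name can be converted with `functions.col` so that every argument is a `Column` and a computed expression can be appended:

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object SelectOverloads {
  // Column names taken from the question
  val baseNames: Seq[String] = Seq("User", "PC", "Year", "Month", "Day", "Hour", "Minute")

  // All-String overload: select(col: String, cols: String*)
  // Only plain column names are allowed here, not expressions
  def selectByName(df: DataFrame): DataFrame =
    df.select(baseNames.head, baseNames.tail: _*)

  // All-Column overload: select(cols: Column*)
  // Every name is converted to a Column, so a computed Column can be added to the same list
  def selectWithTotalHours(df: DataFrame): DataFrame = {
    val cols: Seq[Column] =
      baseNames.map(name => col(name)) :+ (col("Hour") + col("Minute") / 60).as("total_hours")
    df.select(cols: _*)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("select-overloads").getOrCreate()
    import spark.implicits._

    // Hypothetical sample row with the question's column names
    val logon1 = Seq(("User1", "PC1", 2017, 2, 12, 12, 10)).toDF(baseNames: _*)

    selectByName(logon1).show()
    selectWithTotalHours(logon1).show()
    spark.stop()
  }
}
```

Building a `Seq[Column]` and passing it with `: _*` avoids writing `$"..."` for every column when the list of names is long or comes from configuration.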

In your case, you are passing both `String` and `Column` arguments to `select`. Turn every column name into a `Column` with `$` instead:

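// spark-shell imports spark.implicits._ automatically; in an application, add it so $"..." works
import spark.implicits._
val logon11 = logon1.select($"User", $"PC", $"Year", $"Month", $"Day", $"Hour", $"Minute", ($"Hour" + $"Minute" / 60) as "total_hours")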

Hope this helps!


Pra*_*ode 5

You can use `withColumn` to create a new column from existing columns or based on a condition:

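// spark.implicits._ (preloaded in spark-shell) provides toDF and the $"col" syntax
import spark.implicits._

val logon1 = Seq(("User1","PC1",2017,2,12,12,10)).toDF("User","PC","Year","Month","Day","Hour","Minute")
// withColumn keeps all existing columns and appends the computed one
val logon11 = logon1.withColumn("new_col", $"Hour" + $"Minute" / 60)
logon11.printSchema()
logon11.show()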

Output:

root
 |-- User: string (nullable = true)
 |-- PC: string (nullable = true)
 |-- Year: integer (nullable = false)
 |-- Month: integer (nullable = false)
 |-- Day: integer (nullable = false)
 |-- Hour: integer (nullable = false)
 |-- Minute: integer (nullable = false)
 |-- new_col: double (nullable = true)


+-----+---+----+-----+---+----+------+------------------+
| User| PC|Year|Month|Day|Hour|Minute|           new_col|
+-----+---+----+-----+---+----+------+------------------+
|User1|PC1|2017|    2| 12|  12|    10|12.166666666666666|
+-----+---+----+-----+---+----+------+------------------+
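The answer mentions creating a column "based on a condition" but does not show one. A minimal sketch using `when`/`otherwise` from `org.apache.spark.sql.functions`, assuming a local `SparkSession`. The `shift` column, the 18:00 cutoff, the second sample row, and the `ConditionalColumn` object are all hypothetical:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, lit, when}

object ConditionalColumn {
  // Adds the computed hour column, then a label chosen by a condition.
  // Hypothetical rule: logons at or after 18:00 are "after-hours".
  def withShift(df: DataFrame): DataFrame =
    df.withColumn("new_col", col("Hour") + col("Minute") / 60)
      .withColumn(
        "shift",
        when(col("Hour") >= 18, lit("after-hours")).otherwise(lit("office-hours"))
      )

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("with-column").getOrCreate()
    import spark.implicits._

    val logon1 = Seq(
      ("User1", "PC1", 2017, 2, 12, 12, 10),
      ("User2", "PC2", 2017, 2, 12, 19, 45) // hypothetical second row
    ).toDF("User", "PC", "Year", "Month", "Day", "Hour", "Minute")

    withShift(logon1).show()
    spark.stop()
  }
}
```

Chained `withColumn` calls each return a new DataFrame, so existing columns are carried along without listing them in a `select`.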