Convert DataFrame columns to a list of tuples

Question

Gsq*_*are 3 scala tuples dataframe apache-spark

I have:

val DF1 = sparkSession.sql("select col1,col2,col3 from table");
val tupleList = DF1.select("col1","col2").rdd.map(r => (r(0),r(1))).collect()

tupleList.foreach(x=> x.productIterator.foreach(println))

But the output does not contain all the tuples. What is wrong?

col1    col2
AA      CCC
AA      BBB
DD      CCC
AB      BBB
Others  BBB
GG      ALL
EE      ALL
Others  ALL
ALL     BBB
NU      FFF
NU      Others
Others  Others
C       FFF

The output I get is: CCC AA BBB AA Others AA Others DD ALL Others ALL GG ALL ALL

Answer 1

Sai*_*tam 11

scala> val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
scala> val df1 = hiveContext.sql("select id, name from class_db.students")
scala> df1.show()
+----+-------+
|  id|   name|
+----+-------+
|1001|   John|
|1002|Michael|
+----+-------+

scala> df1.select("id", "name").rdd.map(x => (x.get(0), x.get(1))).collect()
res3: Array[(Any, Any)] = Array((1001,John), (1002,Michael))

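The session above yields `Array[(Any, Any)]`. If typed tuples are preferred, one possible variant is sketched below. It assumes both columns are strings, and it uses a local `SparkSession` with a few sample rows copied from the question's table purely for illustration. The names `TupleListSketch` and `toTupleList` are hypothetical. Note also that `x.productIterator.foreach(println)` in the question prints every field on its own line, so the pairs are not visibly grouped; printing each tuple directly keeps them together.

```scala
import org.apache.spark.sql.{DataFrame, Encoders, SparkSession}

object TupleListSketch {
  // Hypothetical helper: assumes col1 and col2 both hold strings.
  // For tuple targets, Dataset.as maps the selected columns by position.
  def toTupleList(df: DataFrame): Array[(String, String)] =
    df.select("col1", "col2")
      .as(Encoders.tuple(Encoders.STRING, Encoders.STRING))
      .collect()

  def main(args: Array[String]): Unit = {
    // Local session for illustration only; the question already has a session.
    val spark = SparkSession.builder()
      .appName("tuple-list-sketch")
      .master("local[1]")
      .getOrCreate()
    import spark.implicits._

    // Sample rows taken from the table in the question.
    val df = Seq(("AA", "CCC"), ("AA", "BBB"), ("DD", "CCC")).toDF("col1", "col2")

    // Print one tuple per line instead of one field per line.
    toTupleList(df).foreach(println)

    spark.stop()
  }
}
```

If the column types are not known in advance, the `(Any, Any)` form from the answer remains the more general option.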
