Gsq*_*are 3 scala tuples dataframe apache-spark
I have:
val DF1 = sparkSession.sql("select col1,col2,col3 from table");
val tupleList = DF1.select("col1","col2").rdd.map(r => (r(0),r(1))).collect()
tupleList.foreach(x=> x.productIterator.foreach(println))
But I am not getting all of the tuples in the output. Where is the problem?
col1 col2
AA CCC
AA BBB
DD CCC
AB BBB
Others BBB
GG ALL
EE ALL
Others ALL
ALL BBB
NU FFF
NU Others
Others Others
C FFF
The output I get is:
CCC AA BBB AA Others AA Others DD ALL Others ALL GG ALL ALL
Sai*_*tam 11
scala> val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
scala> val df1 = hiveContext.sql("select id, name from class_db.students")
scala> df1.show()
+----+-------+
| id| name|
+----+-------+
|1001| John|
|1002|Michael|
+----+-------+
scala> df1.select("id", "name").rdd.map(x => (x.get(0), x.get(1))).collect()
res3: Array[(Any, Any)] = Array((1001,John), (1002,Michael))
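A side note on printing and typing the collected result: `productIterator.foreach(println)` in the question prints every field on its own line, so the pairs are hard to tell apart in the output, and `collect()` on the mapped RDD yields `Array[(Any, Any)]`. The sketch below is only an illustration and not part of the original answer. It uses a literal array as a stand-in for the collected result so that it runs without a Spark session, and it assumes `id` comes back as an `Int` and `name` as a `String`. The names `TupleDemo`, `render` and `typed` are hypothetical.

```scala
object TupleDemo {
  // Stand-in for the collected result shown above (assumption: id is Int, name is String).
  val rows: Array[(Any, Any)] = Array((1001, "John"), (1002, "Michael"))

  // Render one line per tuple instead of one line per field.
  def render(rows: Array[(Any, Any)]): Seq[String] =
    rows.toSeq.map { case (a, b) => s"$a\t$b" }

  // Narrow (Any, Any) to (Int, String); pairs whose runtime types differ are dropped.
  def typed(rows: Array[(Any, Any)]): Seq[(Int, String)] =
    rows.toSeq.collect { case (id: Int, name: String) => (id, name) }

  def main(args: Array[String]): Unit = {
    render(rows).foreach(println)
    println(typed(rows))
  }
}
```

Printing each tuple as a unit makes it easier to check whether rows are actually missing or only displayed in a confusing way.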
Viewed: 5248 times