How to pretty-print a DataFrame in Zeppelin/Spark/Scala?

sch*_*oon 15 scala apache-spark apache-zeppelin

I am using Spark 2 and Scala 2.11 in a Zeppelin 0.7 notebook. I have a DataFrame that I can print like this:

dfLemma.select("text", "lemma").show(20,false)

The output looks like:

+---------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|text                                                                                                                       |lemma                                                                                                                                                                  |
+---------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|RT @Dope_Promo: When you and your crew beat your high scores on FUGLY FROG  https://time.com/Sxp3Onz1w8                    |[rt, @dope_promo, :, when, you, and, you, crew, beat, you, high, score, on, FUGLY, FROG, https://time.com/sxp3onz1w8]                                                      |
|RT @axolROSE: Did yall just call Kermit the frog a lizard?  https://time.com/wDAEAEr1Ay                                        |[rt, @axolrose, :, do, yall, just, call, Kermit, the, frog, a, lizard, ?, https://time.com/wdaeaer1ay]                                                                     |

I tried to make the output look nicer in Zeppelin with:

val printcols= dfLemma.select("text", "lemma")
println("%table " + printcols)

This gives the following output:

printcols: org.apache.spark.sql.DataFrame = [text: string, lemma: array<string>]

and a new, blank Zeppelin paragraph headed with:

[text: string, lemma: array]

Is there a way to get the DataFrame displayed as a nicely formatted table? TIA!

Dan*_*ula 53

In Zeppelin you can use z.show(df) to display a nicely formatted table. Here is an example:

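// Brings .toDF into scope; Zeppelin's Spark interpreter usually imports this already
import spark.implicits._

val df = Seq(
  (1,1,1), (2,2,2), (3,3,3)
).toDF("first_column", "second_column", "third_column")

// z is the ZeppelinContext; show renders df with Zeppelin's table display
z.show(df)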


  • @schoon You're welcome! You can limit the number of rows displayed with a second argument: z.show(df, 10) (2 upvotes)
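The `println("%table " + printcols)` attempt in the question prints only the schema because string concatenation calls the DataFrame's `toString`, which renders `[text: string, lemma: array<string>]` rather than the rows. Zeppelin's `%table` display system expects the text after `%table ` to contain tab-separated columns, with one row per line. If you would rather build that string yourself than call `z.show`, here is a minimal sketch. The object `ZeppelinTable` and its method `toZeppelinTable` are hypothetical helpers, not part of Zeppelin or Spark.

```scala
object ZeppelinTable {
  // Render one cell as text; Spark array columns usually arrive as a Seq (e.g. WrappedArray)
  private def cell(v: Any): String = {
    val text = v match {
      case null                       => "null"
      case xs: scala.collection.Seq[_] => xs.mkString("[", ", ", "]")
      case arr: Array[_]              => arr.mkString("[", ", ", "]")
      case other                      => other.toString
    }
    // Tabs and newlines are the %table delimiters, so replace them inside cell values
    text.replaceAll("[\t\n\r]", " ")
  }

  // Build "%table h1<TAB>h2<NL>v11<TAB>v12<NL>...": tab-separated columns, newline-separated rows
  def toZeppelinTable(header: Seq[String], rows: Seq[Seq[Any]]): String = {
    val headerLine = header.map(cell).mkString("\t")
    val body = rows.map(row => row.map(cell).mkString("\t"))
    (s"%table $headerLine" +: body).mkString("\n")
  }
}
```

In a Zeppelin paragraph you might call it as `println(ZeppelinTable.toZeppelinTable(printcols.columns.toSeq, printcols.limit(20).collect().map(_.toSeq).toSeq))`. `collect()` pulls rows to the driver, so keep the `limit`. `z.show` remains the simpler option.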