How do I convert SQL query output to a DataFrame?

Asked by Alc*_*des · apache-spark, apache-spark-sql, pyspark, databricks, azure-databricks

I have a DataFrame from which I create a temporary view so I can run SQL queries against it. After a few SQL queries, I want to convert the output of the last query into a new DataFrame. The reason I want the data back in a DataFrame is so I can save it to Blob storage.

So the question is: what is the correct way to convert SQL query output into a DataFrame?

Here is the code I have so far:

%scala
// Read data from Azure Blob storage
...
var df = spark.read.parquet(some_path)

// create temp view
df.createOrReplaceTempView("data_sample")

%sql
-- One of several SQL queries; the one below is just an example
SELECT
   date,
   count(*) as cnt
FROM
   data_sample
GROUP BY
   date

Now I want a DataFrame that holds the output of the SQL query above. How do I do that? Preferably the code would be in Python or Scala.



Answered by Lui*_*ola

Scala:

val df = spark.sql("""
SELECT
   date,
   count(*) as cnt
FROM
   data_sample
GROUP BY
   date
""")

PySpark:

df = spark.sql('''
SELECT
   date,
   count(*) as cnt
FROM
   data_sample
GROUP BY
   date
''')