Spark SQL: How to consume JSON data from a REST service as a DataFrame

Question

Kir*_*ran 11 hdinsight apache-spark-sql spark-dataframe

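I need to read some JSON data from a web service that exposes a REST interface, so that I can query it from my Spark SQL code for analysis. I am already able to read JSON stored in blob storage and work with it.

I would like to know the best way to read data from a REST service and use it like any other DataFrame.

BTW, I am using Spark 1.6 on a Linux cluster on HDInsight, if that helps. I would also appreciate it if someone could share code snippets, as I am still very new to the Spark environment.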

Answer 1

agg*_*FTW 7

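On Spark 1.6:

If you are using Python, use the requests library to fetch the data, then create an RDD from it. There must be a similar library for Scala (related thread). Then just do: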

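# Assumes `sc` and `sqlContext` already exist, as in the pyspark shell or a notebook.
json_str = '{"executorCores": 2, "kind": "pyspark", "driverMemory": 1000}'
rdd = sc.parallelize([json_str])  # one JSON object per RDD element
# jsonRDD works on 1.6 but is deprecated since 1.4 in favor of sqlContext.read.json(rdd).
json_df = sqlContext.jsonRDD(rdd)
json_df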
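
The snippet above starts from a hard-coded string. A minimal sketch of the fetch step it skips might look like the following. The names `to_json_lines` and `rest_to_dataframe` and the example URL are hypothetical, and the sketch assumes the endpoint returns either a single JSON object or an array of objects. The request runs on the driver, so this only suits payloads that fit comfortably in driver memory.

```python
import json


def to_json_lines(payload):
    """Turn a parsed JSON payload into a list of JSON strings, one object each.

    A REST endpoint may return a single object or an array of objects, while
    Spark's JSON reader expects one complete JSON object per RDD element.
    """
    records = payload if isinstance(payload, list) else [payload]
    return [json.dumps(record) for record in records]


def rest_to_dataframe(sc, sqlContext, url):
    """Fetch JSON from `url` on the driver and load it as a DataFrame."""
    # Third-party library; imported here so to_json_lines works without it.
    import requests

    response = requests.get(url, timeout=30)
    response.raise_for_status()  # fail early on HTTP errors
    rdd = sc.parallelize(to_json_lines(response.json()))
    # DataFrameReader.json accepts an RDD of JSON strings as well as a path.
    return sqlContext.read.json(rdd)


# Hypothetical usage in a pyspark shell, where sc and sqlContext already exist:
# df = rest_to_dataframe(sc, sqlContext, "https://example.com/api/sessions")
# df.registerTempTable("sessions")
# sqlContext.sql("SELECT kind, executorCores FROM sessions").show()
```

Splitting an array response into one string per object keeps each record as its own row, instead of relying on how the JSON reader treats a top-level array.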

Scala code:

val anotherPeopleRDD = sc.parallelize(
  """{"name":"Yin","address":{"city":"Columbus","state":"Ohio"}}""" :: Nil)
val anotherPeople = sqlContext.read.json(anotherPeopleRDD)

This comes from: http://spark.apache.org/docs/latest/sql-programming-guide.html#json-datasets
