Related Q&A solutions (0)

reading json file in pyspark

I'm new to PySpark. Below is the format of my JSON file from Kafka.

{
    "header": {
        "platform": "atm",
        "version": "2.0"
    },
    "details": [
        {
            "abc": "3",
            "def": "4"
        },
        {
            "abc": "5",
            "def": "6"
        },
        {
            "abc": "7",
            "def": "8"
        }
    ]
}

How can I read the values of all "abc" and "def" pairs in details and add them to a new list like this: [(1,2),(3,4),(5,6),(7,8)]? The new list will be used to create a Spark DataFrame. How can I do this in PySpark? I tried the below …
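One possible sketch (not the asker's code): parse the payload with Python's json module, note that the JSON above needs a comma after the "header" object to be valid, then build the list of tuples from the "details" array. The variable names here are illustrative; the final spark.createDataFrame call assumes an active SparkSession.

```python
import json

# Sample Kafka payload (a comma added after the "header" object so it parses as valid JSON)
raw = '''{
    "header": {"platform": "atm", "version": "2.0"},
    "details": [
        {"abc": "3", "def": "4"},
        {"abc": "5", "def": "6"},
        {"abc": "7", "def": "8"}
    ]
}'''

doc = json.loads(raw)

# Collect one (abc, def) tuple per entry in the "details" array
pairs = [(int(d["abc"]), int(d["def"])) for d in doc["details"]]
print(pairs)  # [(3, 4), (5, 6), (7, 8)]

# With a SparkSession available, the list can then become a DataFrame:
# df = spark.createDataFrame(pairs, ["abc", "def"])
```

For a streaming Kafka source you would instead apply the same json.loads parsing inside the consumer, but the tuple-building step is the same.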

apache-spark spark-streaming pyspark

10
Score
3
Answers
60k
Views

Converting rows to a dictionary in pyspark

I have a DataFrame (df) in pyspark, read from a Hive table:

df=spark.sql('select * from <table_name>')


+++++++++++++++++++++++++++++++++++++++++++
|  Name    |    URL visited               |
+++++++++++++++++++++++++++++++++++++++++++
|  person1 | [google,msn,yahoo]           |
|  person2 | [fb.com,airbnb,wired.com]    |
|  person3 | [fb.com,google.com]          |
+++++++++++++++++++++++++++++++++++++++++++

When I try the following, I get an error:

df_dict = dict(zip(df['name'],df['url']))
"TypeError: zip argument #1 must support iteration."

type(df.name) is 'pyspark.sql.column.Column'

How can I create a dictionary like the following, which I can iterate over later?

{'person1':'google','msn','yahoo'}
{'person2':'fb.com','airbnb','wired.com'}
{'person3':'fb.com','google.com'}

Thanks for your ideas and help.
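A sketch of one common approach: a Spark Column is not iterable on the driver, which is why the zip call above fails; collecting the rows first makes the data ordinary Python objects. Below, plain tuples stand in for the result of df.collect(); with a real DataFrame the dict comprehension shown in the comment applies directly.

```python
# Simulated result of df.collect(); with a real DataFrame you would write:
#   rows = df.collect()
#   url_map = {row["Name"]: row["URL visited"] for row in rows}
rows = [
    ("person1", ["google", "msn", "yahoo"]),
    ("person2", ["fb.com", "airbnb", "wired.com"]),
    ("person3", ["fb.com", "google.com"]),
]

# Each name maps to its full list of visited URLs
url_map = {name: urls for name, urls in rows}
print(url_map["person1"])  # ['google', 'msn', 'yahoo']
```

Note that collect() pulls the whole table to the driver, so this only suits DataFrames small enough to fit in driver memory.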

pyspark

5
Score
3
Answers
9455
Views

Tag statistics

pyspark ×2

apache-spark ×1

spark-streaming ×1