I'm new to PySpark. Below is the format of a JSON message I receive from Kafka:
{
"header": {
"platform":"atm",
"version":"2.0"
},
"details":[
{
"abc":"3",
"def":"4"
},
{
"abc":"5",
"def":"6"
},
{
"abc":"7",
"def":"8"
}
]
}
How can I read the values of all the "abc" and "def" keys in details and add them to a new list like [(1,2),(3,4),(5,6),(7,8)]? The new list will be used to create a Spark DataFrame. How can I do this in PySpark? I tried the below …
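A minimal sketch of one way to do this: parse the JSON on the driver with the standard-library `json` module, build the list of `(abc, def)` tuples from `details`, and then hand that list to Spark. The `spark.createDataFrame` call is commented out since it assumes a live `SparkSession` named `spark`; the sample payload is the one from the question (with the comma the original snippet was missing after the "header" object).

```python
import json

# Sample Kafka payload, as in the question.
raw = '''
{
  "header": {"platform": "atm", "version": "2.0"},
  "details": [
    {"abc": "3", "def": "4"},
    {"abc": "5", "def": "6"},
    {"abc": "7", "def": "8"}
  ]
}
'''

parsed = json.loads(raw)

# Pull the (abc, def) pair out of every element of the "details" array.
pairs = [(d["abc"], d["def"]) for d in parsed["details"]]
print(pairs)  # [('3', '4'), ('5', '6'), ('7', '8')]

# With a SparkSession available, the list can then back a DataFrame:
# df = spark.createDataFrame(pairs, ["abc", "def"])
```

Note the values come out as strings because that is how they appear in the JSON; cast them with `int(...)` inside the comprehension if numeric columns are needed.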
I have a DataFrame (df) in PySpark, read from a Hive table:
df=spark.sql('select * from <table_name>')
+++++++++++++++++++++++++++++++++++++++++++
| Name | URL visited |
+++++++++++++++++++++++++++++++++++++++++++
| person1 | [google,msn,yahoo] |
| person2 | [fb.com,airbnb,wired.com] |
| person3 | [fb.com,google.com] |
+++++++++++++++++++++++++++++++++++++++++++
When I try the following, I get an error:
df_dict = dict(zip(df['name'],df['url']))
"TypeError: zip argument #1 must support iteration."
type(df.name) is 'pyspark.sql.column.Column'
How can I create a dictionary like the following that I can iterate over later?
{'person1':'google','msn','yahoo'}
{'person2':'fb.com','airbnb','wired.com'}
{'person3':'fb.com','google.com'}
Thanks for any ideas and help.
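For what it's worth, the `TypeError` happens because `df['name']` is a `Column` expression, not driver-side data, so `zip` cannot iterate it. A common pattern is to collect the rows to the driver first and build the dict from them. A minimal sketch, using a plain list of tuples to stand in for the collected rows (the `df.collect()` form in the comment assumes the column names `Name` and `URL visited` from the table above):

```python
# Stand-in for what df.collect() would return: one (name, urls) pair per row.
rows = [
    ("person1", ["google", "msn", "yahoo"]),
    ("person2", ["fb.com", "airbnb", "wired.com"]),
    ("person3", ["fb.com", "google.com"]),
]

# dict(zip(df['name'], df['url'])) fails because Column objects are lazy
# expressions, not iterables; build the dict from materialized rows instead.
df_dict = {name: urls for name, urls in rows}
print(df_dict["person1"])  # ['google', 'msn', 'yahoo']

# With a live DataFrame the same idea is typically written as:
# df_dict = {row["Name"]: row["URL visited"] for row in df.collect()}
```

Keep in mind that `collect()` pulls the whole DataFrame to the driver, so this only makes sense when the table is small enough to fit in driver memory.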