I have a DataFrame with the following structure:
|-- data: struct (nullable = true)
| |-- id: long (nullable = true)
| |-- keyNote: struct (nullable = true)
| | |-- key: string (nullable = true)
| | |-- note: string (nullable = true)
| |-- details: map (nullable = true)
| | |-- key: string
| | |-- value: string (valueContainsNull = true)
How can I flatten the struct and create a new DataFrame:
|-- id: long (nullable = true)
|-- keyNote: struct (nullable = true)
| |-- key: string (nullable = true)
| | |-- note: …
Error encountered in PySpark:
pyspark.sql.utils.AnalysisException: "cannot resolve '`result_set`.`dates`.`trackers`['token']' due to data type mismatch: argument 2 requires integral type, however, ''token'' is of string type.;;\n'Project [result_parameters#517, result_set#518, <lambda>(result_set#518.dates.trackers[token]) AS result_set.dates.trackers.token#705]\n+- Relation[result_parameters#517,result_set#518] json\n"
Data structure:
|-- result_set: struct (nullable = true)
| |-- currency: string (nullable = true)
| |-- dates: array (nullable = true)
| | |-- element: struct (containsNull = true)
| | | |-- date: string (nullable = true)
| | | |-- trackers: array (nullable = true)
| | | | …