Par*_*ari 3 schema struct dataframe apache-spark pyspark
I have a DataFrame with the following schema:
root
 |-- _id: long (nullable = true)
 |-- student_info: struct (nullable = true)
 |    |-- firstname: string (nullable = true)
 |    |-- lastname: string (nullable = true)
 |    |-- major: string (nullable = true)
 |    |-- hounour_roll: boolean (nullable = true)
 |-- school_name: string (nullable = true)
How can I get only the list of columns under "student_info"? I.e. ["firstname","lastname","major","honour_roll"]
Each of the following returns the list of the struct's field names. The .columns approach looks the cleanest.
df.select("student_info.*").columns
df.schema["student_info"].dataType.names
df.schema["student_info"].dataType.fieldNames()
df.select("student_info.*").schema.names
df.select("student_info.*").schema.fieldNames()