Posts by Kev*_*ger

Is there a way to collect the names of all fields in a nested schema in PySpark?

I want to collect the names of all fields in a nested schema. The data was imported from a JSON file.

The schema looks like this:

root
 |-- column_a: string (nullable = true)
 |-- column_b: string (nullable = true)
 |-- column_c: struct (nullable = true)
 |    |-- nested_a: struct (nullable = true)
 |    |    |-- double_nested_a: string (nullable = true)
 |    |    |-- double_nested_b: string (nullable = true)
 |    |    |-- double_nested_c: string (nullable = true)
 |    |-- nested_b: string (nullable = true)
 |-- column_d: string (nullable = true)


If I use df.schema.fields or df.schema.names, I only get the names of the top-level columns, not the nested ones.

The output I want is a Python list containing all column names, for example:

['column_a', 'column_b', 'column_c.nested_a.double_nested_a', …

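One possible approach, offered as a minimal sketch rather than a definitive answer, is to walk the schema recursively and build dotted names for the leaf fields. The sketch below works on the plain dict that `df.schema.jsonValue()` returns, so it runs without a Spark session. The function name `flatten_field_names` and the `example_schema` dict are hypothetical and only illustrate the idea. The sketch assumes that only struct fields should be descended into; arrays and maps are treated as leaves.

```python
def flatten_field_names(schema_json, prefix=""):
    """Return dotted names of all leaf fields in a struct schema dict.

    `schema_json` is assumed to have the shape produced by
    `df.schema.jsonValue()`: {"type": "struct", "fields": [...]}.
    """
    names = []
    for field in schema_json["fields"]:
        full_name = prefix + field["name"]
        field_type = field["type"]
        # Nested structs appear as dicts with type == "struct";
        # simple types appear as plain strings such as "string".
        if isinstance(field_type, dict) and field_type.get("type") == "struct":
            names.extend(flatten_field_names(field_type, full_name + "."))
        else:
            names.append(full_name)
    return names


def _string_field(name):
    # Helper for the illustration only: a nullable string field entry.
    return {"name": name, "type": "string", "nullable": True, "metadata": {}}


# Hypothetical dict mirroring the schema shown in the question.
example_schema = {
    "type": "struct",
    "fields": [
        _string_field("column_a"),
        _string_field("column_b"),
        {
            "name": "column_c",
            "nullable": True,
            "metadata": {},
            "type": {
                "type": "struct",
                "fields": [
                    {
                        "name": "nested_a",
                        "nullable": True,
                        "metadata": {},
                        "type": {
                            "type": "struct",
                            "fields": [
                                _string_field("double_nested_a"),
                                _string_field("double_nested_b"),
                                _string_field("double_nested_c"),
                            ],
                        },
                    },
                    _string_field("nested_b"),
                ],
            },
        },
        _string_field("column_d"),
    ],
}

print(flatten_field_names(example_schema))
```

With a real DataFrame, the call would presumably be `flatten_field_names(df.schema.jsonValue())`. Intermediate struct names such as `column_c` are not listed on their own, which matches the leaf-only list shown in the question.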

apache-spark apache-spark-sql pyspark

Kev*_*ger

Votes: 2
Answers: 1
Views: 3006
