Eda*_*ame 4 python apache-spark apache-spark-sql pyspark
My PySpark DataFrame has the following schema:
spark_df.printSchema()  # prints the schema and returns None, so there is nothing to assign
root
|-- field_1: double (nullable = true)
|-- field_2: double (nullable = true)
|-- field_3 (nullable = true)
|-- field_4: double (nullable = true)
|-- field_5: double (nullable = true)
|-- field_6: double (nullable = true)
I want to add one more StructField to the schema, so that the new schema looks like:
root
|-- field_0: string (nullable = true)
|-- field_1: double (nullable = true)
|-- field_2: double (nullable = true)
|-- field_3 (nullable = true)
|-- field_4: double (nullable = true)
|-- field_5: double (nullable = true)
|-- field_6: double (nullable = true)
I know I can create a new_schema manually, like this:
new_schema = StructType([StructField("field_0", StringType(), True),
:
StructField("field_6", IntegerType(), True)])
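This works for a handful of fields, but it is not practical when I have hundreds of them. Is there a more elegant way to add a new field to the beginning of the schema? Thanks!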
You can copy the existing fields and prepend the new one:
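from pyspark.sql.types import StructField, StructType, StringType
# df.schema.fields is a plain Python list, so list concatenation puts the new field first
to_prepend = [StructField("field_0", StringType(), True)]
StructType(to_prepend + df.schema.fields)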