Pra*_*nan 7 nullable apache-spark apache-spark-sql pyspark
When a PySpark DataFrame with a new column added via the `withColumn` function is saved, that column's nullability changes from false to true.
Version info: Python 3.7.3 / Spark 2.4.0-cdh6.1.1
>>> l = [('Alice', 1)]
>>> df = spark.createDataFrame(l)
>>> df.printSchema()
root
|-- _1: string (nullable = true)
|-- _2: long (nullable = true)
>>> from pyspark.sql.functions import lit
>>> df = df.withColumn('newCol', lit('newVal'))
>>> df.printSchema()
root
|-- _1: string (nullable = true)
|-- _2: long (nullable = true)
|-- newCol: string (nullable = false)
>>> df.write.saveAsTable('default.withcolTest', mode='overwrite')
>>> spark.sql("select * from default.withcolTest").printSchema()
root
|-- _1: string (nullable = true)
|-- _2: long (nullable = true)
|-- newCol: string (nullable = true)
Why does the nullable flag of `newCol`, the column added with the `withColumn` function, change when the DataFrame is persisted?