cha*_*a k 10 apache-spark apache-spark-sql delta-lake
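Is there a SQL command that I can easily use to change the data type of an existing column in a Delta table? I need to change the column data type from BIGINT to STRING. Below is the SQL command I tried, without success.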
%sql ALTER TABLE [TABLE_NAME] ALTER COLUMN [COLUMN_NAME] STRING
The error I received:
org.apache.spark.sql.AnalysisException
ALTER TABLE CHANGE COLUMN is not supported for changing column 'bam_user' with type
'IntegerType' to 'bam_user' with type 'StringType'
SQL does not support this type change, but it can be done in Python by rewriting the table with the new schema:
from pyspark.sql import functions as F

# set dataset location and columns with new types
table_path = '/mnt/dataset_location...'
types_to_change = {
    'column_1': 'int',
    'column_2': 'string',
    'column_3': 'double'
}

# load to dataframe, change types (DataFrame.withColumns needs Spark 3.3+)
df = (
    spark.read
    .format('delta')
    .load(table_path)
    .withColumns(
        {
            col: F.col(col).cast(typ)
            for col, typ in types_to_change.items()
        }
    )
)

# save df with new types, overwriting the schema
# (on Databricks, the dbfs: path below should be the same location read above)
(
    df.write
    .format("delta")
    .mode("overwrite")
    .option("overwriteSchema", True)
    .save(f"dbfs:{table_path}")
)
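If `DataFrame.withColumns` is not available (it was added in Spark 3.3), the same casts can probably be expressed as SQL expressions and passed to `selectExpr`. The sketch below is only an illustration: `build_cast_exprs` is a hypothetical helper, not part of PySpark or Delta Lake, and the column names in the example are made up. It assumes column names contain no backticks.

```python
def build_cast_exprs(columns, types_to_change):
    """Build one SQL expression per column, in the original column order.

    Columns listed in types_to_change are wrapped in CAST(...); all others
    are passed through unchanged. Hypothetical helper, for illustration only.
    """
    unknown = set(types_to_change) - set(columns)
    if unknown:
        # Fail early instead of silently ignoring a misspelled column name.
        raise ValueError(f"columns not in table: {sorted(unknown)}")

    exprs = []
    for name in columns:
        if name in types_to_change:
            exprs.append(f"CAST(`{name}` AS {types_to_change[name]}) AS `{name}`")
        else:
            exprs.append(f"`{name}`")
    return exprs


# Example with made-up column names; only bam_user is cast.
exprs = build_cast_exprs(["id", "bam_user", "amount"], {"bam_user": "string"})
print(exprs)
```

With a DataFrame loaded as in the answer, this would be used as `df.selectExpr(*build_cast_exprs(df.columns, types_to_change))`, followed by the same overwrite with `overwriteSchema`. Note that a cast which cannot convert a value typically yields NULL rather than an error, so it is worth checking the result before overwriting the table.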