I have an AWS Glue table with a tlc column whose data type is bigint, and I am trying to run the following with PySpark.
My code looks like this:
# Select the tlc column from each monthly Glue table and register temp views
df = spark.sql('select tlc from monthly_table')
df.createOrReplaceTempView('sdc')

df_a = spark.sql('select tlc from monthly_table_2')
df_a.createOrReplaceTempView('abc')

# Left-join the two views on tlc and write the result to S3 as Parquet
df_moves = spark.sql('select * from abc a left join sdc s on a.tlc = s.tlc')
df_moves.write.parquet('<s3_path>', mode='overwrite')
When I run this, it fails with the error below:
Parquet column cannot be converted in file s3://<s3_path>. Column: [tlc], Expected: bigint, Found: INT32
Full stack trace:
py4j.protocol.Py4JJavaError: An error occurred while calling o419.parquet.
: org.apache.spark.SparkException: Job aborted.
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:198)
at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:159)
at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:104)
at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:102) …
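For context, this error typically appears when some of the Parquet files behind one of the tables physically store the column as INT32 while the Glue catalog declares it bigint, so Spark's vectorized Parquet reader refuses the conversion during the scan. Below is a minimal sketch of one common workaround, assuming that kind of schema drift is indeed the cause: disabling the vectorized reader makes Spark fall back to a row-based reader that can widen INT32 to bigint. The table names and S3 placeholder are taken from the question.

from pyspark.sql import SparkSession

# In a Glue job a SparkSession usually already exists; this is for a
# self-contained run.
spark = SparkSession.builder.appName('tlc_join').getOrCreate()

# Fall back to the non-vectorized Parquet reader, which tolerates the
# INT32 -> bigint widening that the vectorized reader rejects.
spark.conf.set('spark.sql.parquet.enableVectorizedReader', 'false')

# Same pipeline as in the question.
spark.sql('select tlc from monthly_table').createOrReplaceTempView('sdc')
spark.sql('select tlc from monthly_table_2').createOrReplaceTempView('abc')

df_moves = spark.sql('select * from abc a left join sdc s on a.tlc = s.tlc')
df_moves.write.parquet('<s3_path>', mode='overwrite')

The more permanent fix would be to rewrite the offending files (or re-crawl the table) so the physical Parquet type matches the catalog's bigint, but the config fallback above is usually enough to unblock the job.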