How to use posexplode in Spark's withColumn statement?
Asked by Geo*_*ler · Tags: apache-spark, apache-spark-sql
Seq(Array(1,2,3)).toDF.select(col("*"), posexplode(col("value")) as Seq("position", "value")).show
works fine, however:
Seq(Array(1,2,3)).toDF.withColumn("foo", posexplode(col("value"))).show
fails with:
org.apache.spark.sql.AnalysisException: The number of aliases supplied in the AS clause does not match the number of columns output by the UDTF expected 2 aliases but got foo;
Answered by 小智 (score 7):
Instead of using withColumn(), you can select all the columns in the DataFrame and append the result of posexplode(), supplying aliases for its pos and col output fields. withColumn() fails because it can only add a single column, while posexplode() is a UDTF that outputs two. Here is an example using PySpark:
from pyspark.sql import functions as F
from pyspark.sql import SparkSession
spark = SparkSession.builder.master("local[*]").getOrCreate()
df = spark.createDataFrame(
    [(["a"], ), (["b", "c"], ), (["d", "e", "f"], )],
    ["A"],
)
df.show()
# +---------+
# |        A|
# +---------+
# |      [a]|
# |   [b, c]|
# |[d, e, f]|
# +---------+
df = df.select("*", F.posexplode("A").alias("B", "C"))
df.show()
# +---------+---+---+
# |        A|  B|  C|
# +---------+---+---+
# |      [a]|  0|  a|
# |   [b, c]|  0|  b|
# |   [b, c]|  1|  c|
# |[d, e, f]|  0|  d|
# |[d, e, f]|  1|  e|
# |[d, e, f]|  2|  f|
# +---------+---+---+
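For intuition, posexplode simply pairs each array element with its zero-based position, producing one output row per element. The same semantics can be sketched in plain Python without Spark; the helper name posexplode_rows below is hypothetical, used only for illustration:

```python
# Plain-Python sketch of posexplode semantics (hypothetical helper,
# not part of PySpark). Each input row holds one array column; the
# output has one row per element, carrying the original array, the
# element's position (pos), and the element itself (col).
def posexplode_rows(rows):
    out = []
    for (arr,) in rows:
        for pos, val in enumerate(arr):
            out.append((arr, pos, val))
    return out

rows = [(["a"],), (["b", "c"],), (["d", "e", "f"],)]
for row in posexplode_rows(rows):
    print(row)
# First row printed: (['a'], 0, 'a')
```

Note how the row count grows from 3 to 6, matching the PySpark output above.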