Multiplying members of two arrays

Zyg*_*ygD 5 arrays multiplication apache-spark apache-spark-sql pyspark

I have the following table:

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

cols = [  'a1',   'a2']
data = [([2, 3], [4, 5]),
        ([1, 3], [2, 4])]

df = spark.createDataFrame(data, cols)
df.show()
#  +------+------+
#  |    a1|    a2|
#  +------+------+
#  |[2, 3]|[4, 5]|
#  |[1, 3]|[2, 4]|
#  +------+------+

I know how to multiply an array by a scalar. But how do I multiply the members of one array by the corresponding members of another array?

Desired result:

#  +------+------+-------+
#  |    a1|    a2|    res|
#  +------+------+-------+
#  |[2, 3]|[4, 5]|[8, 15]|
#  |[1, 3]|[2, 4]|[2, 12]|
#  +------+------+-------+

abi*_*sis 4

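Similar to your example, you can access the second array from inside the transform function. This assumes that both arrays have the same length: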

from pyspark.sql.functions import expr

cols = [  'a1',   'a2']
data = [([2, 3], [4, 5]),
        ([1, 3], [2, 4])]

df = spark.createDataFrame(data, cols)

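# transform passes each element x of a1 together with its index i;
# a2[i] picks the element of a2 at the same position
df = df.withColumn("res", expr("transform(a1, (x, i) -> a2[i] * x)"))
df.show()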

# +------+------+-------+
# |    a1|    a2|    res|
# +------+------+-------+
# |[2, 3]|[4, 5]|[8, 15]|
# |[1, 3]|[2, 4]|[2, 12]|
# +------+------+-------+
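To make the equal-length assumption concrete, here is a plain-Python sketch of what the expression `transform(a1, (x, i) -> a2[i] * x)` computes for a single row. It is only a model, not Spark code: the helper name `multiply_by_index` is hypothetical, and representing an out-of-range `a2[i]` as `None` is an assumption meant to mirror Spark returning NULL for out-of-bounds array access when ANSI mode is off (with ANSI mode on, Spark is expected to raise an error instead).

```python
from typing import List, Optional


def multiply_by_index(a1: List[int], a2: List[int]) -> List[Optional[int]]:
    """Model of transform(a1, (x, i) -> a2[i] * x) for one row (hypothetical helper)."""
    result: List[Optional[int]] = []
    for i, x in enumerate(a1):
        # Assumption: an out-of-range a2[i] is modelled as None,
        # standing in for Spark's NULL when ANSI mode is off.
        y = a2[i] if i < len(a2) else None
        # Multiplying by NULL yields NULL, so None propagates.
        result.append(None if y is None else y * x)
    return result


# Same data as in the question
rows = [([2, 3], [4, 5]), ([1, 3], [2, 4])]
res = [multiply_by_index(a1, a2) for a1, a2 in rows]
print(res)  # [[8, 15], [2, 12]]

# Unequal lengths: the result always has the length of a1
print(multiply_by_index([1, 2, 3], [10, 20]))  # [10, 40, None]
print(multiply_by_index([1], [10, 20]))        # [10]
```

The result always follows the length of `a1`, so extra elements in `a2` are silently ignored and missing ones produce nulls. If the arrays can differ in length, Spark SQL's built-in `zip_with(a1, a2, (x, y) -> x * y)` may be worth considering, since it pairs elements directly and pads the shorter array with nulls.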