Zyg*_*ygD 5 arrays multiplication apache-spark apache-spark-sql pyspark
I have the following table:
from pyspark.sql import SparkSession, functions as F
spark = SparkSession.builder.getOrCreate()
cols = [ 'a1', 'a2']
data = [([2, 3], [4, 5]),
([1, 3], [2, 4])]
df = spark.createDataFrame(data, cols)
df.show()
# +------+------+
# | a1| a2|
# +------+------+
# |[2, 3]|[4, 5]|
# |[1, 3]|[2, 4]|
# +------+------+
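I know how to multiply an array by a scalar. But how do I multiply the members of one array by the corresponding members of another array?
Desired result: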
# +------+------+-------+
# | a1| a2| res|
# +------+------+-------+
# |[2, 3]|[4, 5]|[8, 15]|
# |[1, 3]|[2, 4]|[2, 12]|
# +------+------+-------+
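Similar to your example, you can access the second array from inside the transform function by using the element index. This assumes that both arrays have the same length: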
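from pyspark.sql import SparkSession
from pyspark.sql.functions import expr
spark = SparkSession.builder.getOrCreate()
cols = ['a1', 'a2']
data = [([2, 3], [4, 5]),
        ([1, 3], [2, 4])]
df = spark.createDataFrame(data, cols)
# x is the current element of a1, i is its index; a2[i] is the matching element of a2
df = df.withColumn("res", expr("transform(a1, (x, i) -> a2[i] * x)"))
df.show()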
# +------+------+-------+
# | a1| a2| res|
# +------+------+-------+
# |[2, 3]|[4, 5]|[8, 15]|
# |[1, 3]|[2, 4]|[2, 12]|
# +------+------+-------+