
What is the correct way of summing different dataframe columns in pyspark?

I want to sum different columns in a Spark dataframe.

from pyspark.sql import functions as F
cols = ["A.p1","B.p1"]
df = spark.createDataFrame([[1,2],[4,89],[12,60]],schema=cols)

# 1. Works: Python's built-in sum chains "+" over the Column objects
df = df.withColumn('sum1', sum([df[col] for col in ["`A.p1`","`B.p1`"]]))

# 2. Doesn't work: F.sum is an aggregate over rows and expects a single
#    column (or column name), not a Python list of columns
df = df.withColumn('sum1', F.sum([df[col] for col in ["`A.p1`","`B.p1`"]]))

# 3. Doesn't work: df.select(...) returns a DataFrame, which the
#    built-in sum cannot add up
df = df.withColumn('sum1', sum(df.select(["`A.p1`","`B.p1`"])))

Why doesn't approach #2 work? I'm on Spark 2.2.
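The short answer: `F.sum` is an aggregate function that sums a single column's values *across rows* (for use with `groupBy`/`agg`), so it does not accept a Python list of columns. The built-in `sum`, by contrast, simply applies `+` repeatedly, and pyspark's `Column` overloads `+` to build up a combined column expression. A minimal pure-Python sketch of that mechanism (`FakeColumn` is a hypothetical stand-in to illustrate the operator overloading, not part of pyspark):

```python
from functools import reduce
import operator

class FakeColumn:
    """Stand-in for pyspark's Column: "+" builds an expression string,
    much like Column.__add__ returns a new Column expression."""
    def __init__(self, expr):
        self.expr = expr

    def __add__(self, other):
        other_expr = other.expr if isinstance(other, FakeColumn) else str(other)
        return FakeColumn(f"({self.expr} + {other_expr})")

    def __radd__(self, other):
        # Built-in sum starts from 0, so 0 + column must also work;
        # here we simply absorb the zero seed.
        if other == 0:
            return self
        return FakeColumn(f"({other} + {self.expr})")

cols = [FakeColumn("A.p1"), FakeColumn("B.p1")]

# Built-in sum chains "+" across the columns into one expression --
# this is why approach #1 succeeds.
print(sum(cols).expr)                        # (A.p1 + B.p1)

# Equivalent, more explicit idiom that avoids the integer 0 seed:
print(reduce(operator.add, cols).expr)       # (A.p1 + B.p1)
```

In real pyspark code the same idea is commonly written as `reduce(operator.add, [df[c] for c in cols])`, which builds one `Column` expression and can be passed to `withColumn`.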

python apache-spark apache-spark-sql pyspark pyspark-sql
