AssertionError: all exprs should be Column

Din*_*ius 4 python apache-spark pyspark

I combined two PySpark DataFrames and aggregated the result as follows:

exprs = [max(x) for x in ["col1","col2"]]
df = df1.union(df2).groupBy(['campk', 'ppk']).agg(*exprs)

But I get this error:

AssertionError: all exprs should be Column

What is wrong?

phi*_*ert 7

exprs = [max(x) for x in ["col1","col2"]]

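Here max is Python's built-in max, not Spark's. Applied to a string, it returns the character with the highest ASCII value, so the result is ['o', 'o']: a list of plain strings rather than Column objects.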
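This can be checked without Spark at all. The snippet below is a minimal sketch in plain Python; it only reproduces the list comprehension from the question and inspects what it produces.

```python
# Built-in max() on a string compares its characters by code point,
# so it returns a single character, not an aggregate expression.
exprs = [max(x) for x in ["col1", "col2"]]
print(exprs)  # ['o', 'o']

# The elements are plain str objects, not pyspark Column objects,
# which is what the assertion inside agg() complains about.
print([type(e).__name__ for e in exprs])  # ['str', 'str']
```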

Referencing the correct max, the one from pyspark.sql.functions, works:

>>> from pyspark.sql import functions as F
>>> exprs = [F.max(x) for x in ["col1","col2"]]
>>> print(exprs)
[Column<max(col1)>, Column<max(col2)>]