PySpark SQL contains: TypeError: 'Column' object is not callable

Question

Jee*_*van 5 python apache-spark apache-spark-sql pyspark

I am using Spark 2.0.1.

 df.show()
+--------+------+---+-----+-----+----+
|Survived|Pclass|Sex|SibSp|Parch|Fare|
+--------+------+---+-----+-----+----+
|     0.0|   3.0|1.0|  1.0|  0.0| 7.3|
|     1.0|   1.0|0.0|  1.0|  0.0|71.3|
|     1.0|   3.0|0.0|  0.0|  0.0| 7.9|
|     1.0|   1.0|0.0|  1.0|  0.0|53.1|
|     0.0|   3.0|1.0|  0.0|  0.0| 8.1|
|     0.0|   3.0|1.0|  0.0|  0.0| 8.5|
|     0.0|   1.0|1.0|  0.0|  0.0|51.9|


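I have a DataFrame and want to add a new column to df with withColumn, where the new column's value depends on the values of other columns. I used something like this: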

>>> dfnew = df.withColumn('AddCol' , when(df.Pclass.contains('3.0'),'three').otherwise('notthree'))


It gives an error:

TypeError: 'Column' object is not callable


Can anyone help me get past this error?

Answer 1

Man*_*que 7

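This happens because you are calling contains on the column, and in Spark 2.0.1 the Python Column class has no contains method (as far as I know, it was only added in later PySpark releases). Use like instead, with % wildcards so that it matches a substring. Try this:

import pyspark.sql.functions as F

# "%3%" matches any value whose string form contains a 3, e.g. 3.0
df = df.withColumn("AddCol", F.when(F.col("Pclass").like("%3%"), "three").otherwise("notthree"))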

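One detail about like that is easy to miss: it follows SQL LIKE semantics, so a pattern without wildcards has to match the whole value, and only % (any run of characters) or _ (exactly one character) turn it into a partial match. The snippet below is a rough pure-Python model of that rule, not Spark's implementation. The helper name sql_like is made up for illustration, escape characters are ignored, and it assumes Pclass is compared in its text form, such as "3.0" in the df.show() output.

```python
import re


def sql_like(value, pattern):
    """Simplified model of SQL LIKE (hypothetical helper, not a Spark API).

    % stands for any run of characters, _ for exactly one character;
    everything else must match literally. Escape characters are ignored.
    """
    regex = "".join(
        ".*" if ch == "%" else "." if ch == "_" else re.escape(ch)
        for ch in pattern
    )
    # LIKE must match the whole value, hence fullmatch rather than search.
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None


print(sql_like("3.0", "3"))    # no wildcards: the whole value would have to be "3"
print(sql_like("3.0", "%3%"))  # wildcards on both sides: substring match
print(sql_like("1.0", "%3%"))  # no 3 anywhere in the value
```

Under this model the pattern "3" does not match "3.0", while "%3%" does, which is why the wildcards matter when like stands in for a "contains" check.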

Or, if you just want the value to be exactly 3, you should do:

import pyspark.sql.functions as F

# If the column Pclass is numeric
df = df.withColumn("AddCol",F.when(F.col("Pclass") == F.lit(3),"three").otherwise("notthree"))

# If the column Pclass is string
df = df.withColumn("AddCol",F.when(F.col("Pclass") == F.lit("3"),"three").otherwise("notthree"))

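As for why the message is "'Column' object is not callable" rather than an AttributeError: as far as I can tell, in Spark 2.0.x the Python Column class treats an unknown attribute name as access to a nested field and returns another Column. So df.Pclass.contains evaluates to a Column, and the parentheses then try to call it. The toy class below only mimics that lookup behaviour. ToyColumn is a made-up name and this is not the PySpark source.

```python
class ToyColumn:
    """Simplified stand-in for a Spark 2.0-era Column (hypothetical, not the real class)."""

    def __init__(self, name):
        self.name = name

    def like(self, pattern):
        # A method that really is defined, so ordinary lookup finds it.
        return ToyColumn("{} LIKE '{}'".format(self.name, pattern))

    def __getattr__(self, item):
        # Only reached when ordinary lookup fails: the unknown name is
        # treated as field access and yields another column object.
        if item.startswith("__"):
            raise AttributeError(item)
        return ToyColumn("{}.{}".format(self.name, item))


pclass = ToyColumn("Pclass")
print(pclass.like("%3%").name)  # defined method: works as expected

error_message = None
try:
    # "contains" is not defined, so this first builds ToyColumn("Pclass.contains")
    # and then tries to call that object.
    pclass.contains("3.0")
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```

The same reasoning would apply to any misspelled or missing Column method on such a version, so this error is usually worth reading as "that method does not exist here".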

Archived:	7 years, 2 months ago
Views:	27,927
Last recorded:	4 years, 8 months ago