st1*_*led 1 python lambda user-defined-functions dataframe pyspark
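I'm trying to replace some values in a Spark DataFrame using a UDF, but I keep getting the same error.
While debugging I found that it depends neither on the DataFrame I'm using nor on the function I wrote. Here is an MWE with a simple lambda function that I can't get to execute properly. It should simply modify every value in the first column by concatenating the value with itself.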
from pyspark.sql.functions import udf, lit
from pyspark.sql.types import StringType

l = [('Alice', 1)]
df = sqlContext.createDataFrame(l)
df.show()
#+-----+---+
#| _1| _2|
#+-----+---+
#|Alice| 1|
#+-----+---+
df = df.withColumn("_1", udf(lambda x : lit(x+x), StringType())(df["_1"]))
df.show()
#Alice should now become AliceAlice
This is the error I get, which mentions a rather cryptic "AttributeError: 'NoneType' object has no attribute '_jvm'".
File "/cdh/opt/cloudera/parcels/CDH-5.11.1-1.cdh5.11.1.p0.4/lib/spark/python/pyspark/worker.py", line 111, in main
process()
File "/cdh/opt/cloudera/parcels/CDH-5.11.1-1.cdh5.11.1.p0.4/lib/spark/python/pyspark/worker.py", line 106, in process
serializer.dump_stream(func(split_index, iterator), outfile)
File "/cdh/opt/cloudera/parcels/CDH-5.11.1-1.cdh5.11.1.p0.4/lib/spark/python/pyspark/serializers.py", line 263, in dump_stream
vs = list(itertools.islice(iterator, batch))
File "/cdh/opt/cloudera/parcels/CDH-5.11.1-1.cdh5.11.1.p0.4/lib/spark/python/pyspark/sql/functions.py", line 1566, in <lambda>
func = lambda _, it: map(lambda x: returnType.toInternal(f(*x)), it)
File "<stdin>", line 1, in <lambda>
File "/cdh/opt/cloudera/parcels/CDH-5.11.1-1.cdh5.11.1.p0.4/lib/spark/python/pyspark/sql/functions.py", line 39, in _
jc = getattr(sc._jvm.functions, name)(col._jc if isinstance(col, Column) else col)
AttributeError: 'NoneType' object has no attribute '_jvm'
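I'm sure I'm getting confused by the syntax and failing to get the types right (thanks, duck typing!), but every example of withColumn and lambda functions I have found looks similar to this one.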
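You are very close. It is complaining because you can't use lit inside a udf :) lit works at the column level, not at the row level. It builds a Column expression through the JVM, which is only reachable from the driver. The lambda inside the udf runs in a Python worker, where there is no active SparkContext, hence the 'NoneType' object has no attribute '_jvm' error. Inside the udf, just return the plain Python value: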
from pyspark.sql.functions import udf, lit
from pyspark.sql.types import StringType

l = [('Alice', 1)]
df = spark.createDataFrame(l)
df.show()
+-----+---+
| _1| _2|
+-----+---+
|Alice| 1|
+-----+---+
df = df.withColumn("_1", udf(lambda x: x+x, StringType())("_1"))
# this would produce the same result, but lit is not necessary here
# df = df.withColumn("_1", udf(lambda x: x+x, StringType())(lit(df["_1"])))
df.show()
+----------+---+
| _1| _2|
+----------+---+
|AliceAlice| 1|
+----------+---+
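Since the function inside a udf only ever sees plain Python values, it can help to write it as an ordinary named function and wrap it afterwards. The following is a rough sketch of that idea, not part of the original answer: the names double_value, double_with_udf and double_with_builtin are made up for illustration, and the None check is an added assumption to cover null cells (None + None would raise a TypeError inside the worker). The last helper shows a column-level alternative using the built-in concat, which avoids the Python UDF entirely.

```python
def double_value(x):
    """Row-level logic: receives a plain Python value, never a Column."""
    # Spark hands SQL NULL to a Python UDF as None, so guard against it.
    return None if x is None else x + x


def double_with_udf(df, column="_1"):
    """Wrap the plain function in a UDF and apply it to one column."""
    # Imported lazily so double_value can be used and tested without pyspark.
    from pyspark.sql.functions import udf
    from pyspark.sql.types import StringType

    double_udf = udf(double_value, StringType())
    return df.withColumn(column, double_udf(df[column]))


def double_with_builtin(df, column="_1"):
    """Column-level alternative: built-in concat, no Python UDF involved."""
    from pyspark.sql.functions import col, concat

    # concat yields NULL if any input is NULL, matching double_value above.
    return df.withColumn(column, concat(col(column), col(column)))
```

With the DataFrame from above, either double_with_udf(df).show() or double_with_builtin(df).show() should turn Alice into AliceAlice. The row-level function itself can be checked in a plain Python shell without any cluster.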