Fey*_*n27 4 scala user-defined-functions apache-spark spark-dataframe
I am trying to replace every instance of ":" with "_" in a single column of a Spark DataFrame. This is what I am trying:
import org.apache.spark.sql.functions.udf

val url_cleaner = (s: String) => {
  s.replaceAll(":", "_")
}
val url_cleaner_udf = udf(url_cleaner)
val df = old_df.withColumn("newCol", url_cleaner_udf(old_df("oldCol")))
But I keep getting this error:
SparkException: Job aborted due to stage failure: Task 0 in stage 25.0 failed 4 times, most recent failure: Lost task 0.3 in stage 25.0 (TID 692, ip-10-81-194-29.ec2.internal): java.lang.NullPointerException
Where am I going wrong in the UDF?
T. *_*ęda 13
You probably have some null values in this column.
Try:
val urlCleaner = (s: String) => {
  // Pass nulls through instead of calling replaceAll on them
  if (s == null) null else s.replaceAll(":", "_")
}
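The same null check can also be sketched with `Option`, which some find easier to read. This is only an illustration: the names `UrlCleaner` and `cleanUrl` are hypothetical, and it uses `String.replace` (literal, non-regex), which should behave the same as `replaceAll` for a plain ":".

```scala
// Hypothetical helper, not part of the original answer.
object UrlCleaner {
  // Option(s) is None when s is null, so the replacement is skipped
  // and orNull hands the null back unchanged.
  def cleanUrl(s: String): String =
    Option(s).map(_.replace(":", "_")).orNull
}
```

It could then be wrapped with `udf(UrlCleaner.cleanUrl _)` in place of `url_cleaner` from the question.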
You can also use regexp_replace(col("newCol"), ":", "_") instead of your own function.
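A minimal sketch of the `regexp_replace` route, assuming the input column is `oldCol` and the result goes into `newCol`, as in the question. The object name `RegexpReplaceSketch`, the helper `withCleanedColumn`, and the sample rows are made up for illustration. Built-in column functions return null for null input, so no explicit null check should be needed here.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, regexp_replace}

// Hypothetical wrapper object; only regexp_replace comes from the answer.
object RegexpReplaceSketch {
  // Adds "newCol" with every ":" in "oldCol" replaced by "_".
  def withCleanedColumn(df: DataFrame): DataFrame =
    df.withColumn("newCol", regexp_replace(col("oldCol"), ":", "_"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("regexp-replace-sketch")
      .getOrCreate()
    import spark.implicits._

    // Sample data (assumed), including a null like the one behind the NPE
    val oldDf = Seq(("a", "http://host:8080"), ("b", null)).toDF("id", "oldCol")

    withCleanedColumn(oldDf).show(truncate = false)
    spark.stop()
  }
}
```

Compared with a UDF, the built-in function stays visible to Spark's optimizer, which is usually a reason to prefer it when one exists.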