Using the length function inside substring in Spark

sat*_*ish 8 scala substring string-length dataframe apache-spark

I am trying to use the `length` function inside a `substring` call on a DataFrame, but it gives an error:

val substrDF = testDF.withColumn("newcol", substring($"col", 1, length($"col")-1))

Here is the error:

 error: type mismatch;
 found   : org.apache.spark.sql.Column
 required: Int

I am using Spark 2.1.
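The error comes from the Scala signature of `substring` in `org.apache.spark.sql.functions`, which in Spark 2.1 is `substring(str: Column, pos: Int, len: Int)`: `pos` and `len` must be plain `Int`s, while `length($"col") - 1` is a `Column`. A minimal sketch, assuming a local `SparkSession` and a stand-in `testDF` built from three sample strings (both are illustrative, not from the question), showing that constant arguments compile fine:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.substring

// Local session for illustration only; in spark-shell `spark` already exists
val spark = SparkSession.builder().master("local[*]").appName("substring-demo").getOrCreate()
import spark.implicits._

// Assumed sample data standing in for the question's testDF
val testDF = Seq("first", "second", "third").toDF("col")

// Compiles: pos and len are Int literals, so every row keeps its first 3 characters
val fixedDF = testDF.withColumn("newcol", substring($"col", 1, 3))
fixedDF.show(false)

// Does not compile: length($"col") - 1 is a Column, not an Int
// testDF.withColumn("newcol", substring($"col", 1, length($"col") - 1))
```

A per-row length therefore needs an API that takes `Column` arguments, which is what both answers below provide.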

pas*_*701 18

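You can use the `expr` function, which parses a SQL expression string, so `length(value)-1` can be passed as the length argument: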

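import org.apache.spark.sql.functions.expr
import spark.implicits._ // `spark` is the active SparkSession (predefined in spark-shell)
val data = List("first", "second", "third")
val df = spark.sparkContext.parallelize(data).toDF("value")
// In SQL syntax, substring accepts arbitrary expressions for position and length
val result = df.withColumn("cutted", expr("substring(value, 1, length(value)-1)"))
result.show(false)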

Output:

+------+------+
|value |cutted|
+------+------+
|first |firs  |
|second|secon |
|third |thir  |
+------+------+
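Because the string passed to `expr` is ordinary Spark SQL, the same expression can also be written with `selectExpr` or as a query against a temporary view. A sketch under the same assumptions as the answer (a local session and the three sample strings); the view name `words` is an illustrative choice:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("substring-sql").getOrCreate()
import spark.implicits._

val df = Seq("first", "second", "third").toDF("value")

// selectExpr takes SQL expression strings, just like expr()
val viaSelectExpr = df.selectExpr("value", "substring(value, 1, length(value) - 1) AS cutted")

// Equivalent plain SQL over a temporary view (illustrative name "words")
df.createOrReplaceTempView("words")
val viaSql = spark.sql("SELECT value, substring(value, 1, length(value) - 1) AS cutted FROM words")

viaSelectExpr.show(false)
viaSql.show(false)
```

Which form to use is mostly a matter of taste; all three go through the same SQL `substring` function.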


Anonymous 10

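You can also use `$"COLUMN".substr`. Unlike `functions.substring`, `Column.substr` accepts `Column` arguments for both the start position and the length (the second argument is a length, not an end index):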

import org.apache.spark.sql.functions.{length, lit}
val substrDF = testDF.withColumn("newcol", $"col".substr(lit(1), length($"col") - 1))

Full example and output:

import org.apache.spark.sql.functions.{length, lit}
import spark.implicits._
val testDF = sc.parallelize(List("first", "second", "third")).toDF("col")
val result = testDF.withColumn("newcol", $"col".substr(lit(1), length($"col") - 1))
result.show(false)

+------+------+
|col   |newcol|
+------+------+
|first |firs  |
|second|secon |
|third |thir  |
+------+------+
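Since `Column.substr(startPos: Column, len: Column)` takes columns for both arguments, the pattern generalizes to a small reusable helper. A sketch assuming a local session and the same sample data; `dropLast` is a hypothetical name, not part of Spark:

```scala
import org.apache.spark.sql.{Column, SparkSession}
import org.apache.spark.sql.functions.{length, lit}

val spark = SparkSession.builder().master("local[*]").appName("substr-helper").getOrCreate()
import spark.implicits._

// Hypothetical helper: drop the last n characters of a string column.
// substr's second argument is a length, so keep length(c) - n characters starting at position 1.
def dropLast(c: Column, n: Int): Column = c.substr(lit(1), length(c) - n)

val testDF = Seq("first", "second", "third").toDF("col")
val trimmed = testDF
  .withColumn("minus1", dropLast($"col", 1))
  .withColumn("minus2", dropLast($"col", 2))
trimmed.show(false)
```

Keeping the arithmetic inside a `Column` expression lets Spark evaluate it per row, which is exactly what the `Int`-only `substring` overload cannot do.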