How to use the LEFT and RIGHT keywords in Spark SQL

Mir*_*han 8 scala apache-spark apache-spark-sql

I am new to Spark SQL.

In MS SQL we have the LEFT keyword, for example: LEFT(Columnname,1) in('D','A') then 1 else 0.

How can I implement the same thing in Spark SQL? Please guide me.

use*_*411 11

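You can use the substring function with a positive pos to take characters from the left (positions are 1-based, and a pos of 0 returns the same result as 1):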

import org.apache.spark.sql.functions.substring

substring(column, 0, 1)

and with a negative pos to take characters from the right:

substring(column, -1, 1)
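
To make the position rule concrete, here is a minimal pure-Python sketch of how I understand substring to interpret pos. The names `spark_substring` and `left_flag` are hypothetical helpers written for illustration, not Spark APIs, and the clamping of out-of-range positions is an assumption rather than something shown in this answer. It only models the semantics on a single Python string; it does not run on a DataFrame.

```python
def spark_substring(s: str, pos: int, length: int) -> str:
    """Pure-Python model of substring(str, pos, len); hypothetical, not a Spark API."""
    n = len(s)
    if pos > 0:
        start = pos - 1   # positions are 1-based
    elif pos < 0:
        start = n + pos   # a negative pos counts back from the end
    else:
        start = 0         # pos 0 gives the same result as pos 1
    end = start + length
    if end <= start or start >= n:
        return ""
    # Assumption: positions outside the string are clamped to its bounds.
    return s[max(start, 0):max(end, 0)]


def left_flag(value: str) -> int:
    """Model of the question's check: LEFT(Columnname, 1) IN ('D', 'A') THEN 1 ELSE 0."""
    return 1 if spark_substring(value, 1, 1) in ("D", "A") else 0


# Print the modelled result for a few (pos, len) pairs on "foobar".
for pos, length in [(0, 3), (1, 3), (-3, 3)]:
    print(pos, length, spark_substring("foobar", pos, length))

# Print the flag for a value starting with 'D' and one that does not.
print(left_flag("Delta"), left_flag("Beta"))
```

Under this model, pos values of 0 and 1 select the same leading characters, which is why substring(column, 0, n) and substring(column, 1, n) can both stand in for LEFT.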

So in Scala you can define:

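import org.apache.spark.sql.Column
import org.apache.spark.sql.functions.substring
// Needed for toDF and $"..." outside spark-shell; assumes a SparkSession named `spark`.
import spark.implicits._

// First n characters of col.
def left(col: Column, n: Int): Column = {
  assert(n >= 0)
  substring(col, 0, n)
}

// Last n characters of col (a negative pos counts from the end).
def right(col: Column, n: Int): Column = {
  assert(n >= 0)
  substring(col, -n, n)
}

val df = Seq("foobar").toDF("str")

// Apply left and right for n = 1, 2, 3.
df.select(
  Seq(left _, right _).flatMap(f => (1 to 3).map(i => f($"str", i))): _*
).show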
+--------------------+--------------------+--------------------+---------------------+---------------------+---------------------+
|substring(str, 0, 1)|substring(str, 0, 2)|substring(str, 0, 3)|substring(str, -1, 1)|substring(str, -2, 2)|substring(str, -3, 3)|
+--------------------+--------------------+--------------------+---------------------+---------------------+---------------------+
|                   f|                  fo|                 foo|                    r|                   ar|                  bar|
+--------------------+--------------------+--------------------+---------------------+---------------------+---------------------+

Similarly, in Python:

from pyspark.sql.functions import substring
from pyspark.sql.column import Column

def left(col, n):
    assert isinstance(col, (Column, str))
    assert isinstance(n, int) and n >= 0
    return substring(col, 0, n)

def right(col, n):
    assert isinstance(col, (Column, str))
    assert isinstance(n, int) and n >= 0
    return substring(col, -n, n)


Nag*_*han 5

import org.apache.spark.sql.functions._  

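Use substring(column, 0, 1) in place of the LEFT function,

where

  • 0 : the starting position in the string (positions are 1-based; 0 gives the same result as 1)
  • 1 : the number of characters to select

Example: consider a LEFT function: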

LEFT(upper(SKU),2)

The corresponding Spark SQL statement would be:

substring(upper(SKU),1,2) 