Is this a bug, or does Snowflake not fully support correlated subqueries in a WHERE EXISTS clause?

Nic*_* L. 10 sql correlated-subquery snowflake-cloud-data-platform

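Snowflake raises an error for an EXISTS clause when the filter condition depends on a COALESCE of columns from both the outer table and the subquery table. If I remove the outer table's column from the COALESCE, or replace the COALESCE with the equivalent long-form logic, the query runs.

I'm seeing this error, specifically "SQL compilation error: Unsupported subquery type cannot be evaluated", on what I think is a fairly simple WHERE EXISTS clause. This works in every (recent) SQL dialect I've used (e.g. SQL Server, Postgres), so I'm a bit concerned that Snowflake doesn't support it. Am I missing something?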

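I found a similar-looking issue from 2019 in the Snowflake Community, where Snowflake fails when the EXISTS clause contains a WHERE filter condition that references a column from the outer query for something other than joining the tables. No clear solution was given there.

Snowflake's documentation on its limited support for subqueries says it supports both correlated and uncorrelated subqueries for "EXISTS, ANY / ALL, and IN subqueries in WHERE clauses".

So why does this EXISTS clause fail? Is what I'm seeing a bug, or is this a Snowflake limitation that isn't clearly documented?

Code to reproduce the issue: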

CREATE OR REPLACE TEMPORARY TABLE Employee (
   Emp_SK INT NOT NULL
);

CREATE OR REPLACE TEMPORARY TABLE Employee_X_Pay_Rate (
   Emp_SK INT NOT NULL, Pay_Rate_SK INT NOT NULL, Start_Date TIMESTAMP_NTZ NOT NULL, End_Date TIMESTAMP_NTZ NOT NULL
);

CREATE OR REPLACE TEMPORARY TABLE Employee_X_Location (
   Emp_SK INT NOT NULL, Location_SK INT NOT NULL, Start_Date TIMESTAMP_NTZ NOT NULL, End_Date TIMESTAMP_NTZ NULL
);
INSERT INTO Employee
VALUES (1);

INSERT INTO Employee_X_Pay_Rate 
VALUES 
    (1, 1, '2018-01-01', '2019-03-31')
   ,(1, 2, '2019-04-01', '2021-03-31')
   ,(1, 3, '2021-04-01', '2099-12-31')
;

INSERT INTO Employee_X_Location
VALUES
    (1, 101, '2018-01-01', '2019-12-31')
   ,(1, 102, '2020-01-01', '2020-12-31')
   ,(1, 103, '2021-01-01', NULL)
;
SET Asof_Date = TO_DATE('2021-05-31', 'yyyy-mm-dd'); -- changing this to TO_TIMESTAMP makes no difference
SELECT 
   emp.Emp_SK
   ,empPay.Pay_Rate_SK
   ,$Asof_Date AS Report_Date
   ,empPay.Start_Date AS Pay_Start_Date
   ,empPay.End_Date AS Pay_End_Date
FROM Employee emp
   INNER JOIN Employee_X_Pay_Rate empPay
      ON emp.Emp_SK = empPay.Emp_SK
      AND $Asof_Date BETWEEN empPay.Start_Date AND empPay.End_Date
WHERE EXISTS (
   SELECT 1 FROM Employee_X_Location empLoc
   WHERE emp.Emp_SK = empLoc.Emp_SK
      -- Issue: Next line fails. empLoc.End_Date can be null
      AND $Asof_Date BETWEEN empLoc.Start_Date AND COALESCE(empLoc.End_Date, empPay.End_Date)
);

If I replace the problem line with either of the following, the query runs.

-- Workaround 1
AND (
   $Asof_Date >= empLoc.Start_Date
   AND ($Asof_Date <= empLoc.End_Date OR (empLoc.End_Date IS NULL AND $Asof_Date <= empPay.End_Date))
)

-- Workaround 2
AND $Asof_Date BETWEEN empLoc.Start_Date AND COALESCE(empLoc.End_Date, CURRENT_DATE)
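
For reference, here is a sketch of the full query with Workaround 1 substituted for the failing line. It is assembled only from the snippets above and assumes the temporary tables and the $Asof_Date session variable from the repro script already exist. The long-form condition is meant to be logically equivalent to the COALESCE version: when empLoc.End_Date is NULL, the upper bound falls back to empPay.End_Date.

```sql
-- Sketch: original query with Workaround 1 in place of the COALESCE.
-- Assumes the repro tables and $Asof_Date defined earlier in this post.
SELECT
   emp.Emp_SK
   ,empPay.Pay_Rate_SK
   ,$Asof_Date AS Report_Date
   ,empPay.Start_Date AS Pay_Start_Date
   ,empPay.End_Date AS Pay_End_Date
FROM Employee emp
   INNER JOIN Employee_X_Pay_Rate empPay
      ON emp.Emp_SK = empPay.Emp_SK
      AND $Asof_Date BETWEEN empPay.Start_Date AND empPay.End_Date
WHERE EXISTS (
   SELECT 1 FROM Employee_X_Location empLoc
   WHERE emp.Emp_SK = empLoc.Emp_SK
      AND $Asof_Date >= empLoc.Start_Date
      -- Long form of: $Asof_Date <= COALESCE(empLoc.End_Date, empPay.End_Date)
      AND (
         $Asof_Date <= empLoc.End_Date
         OR (empLoc.End_Date IS NULL AND $Asof_Date <= empPay.End_Date)
      )
);
```

This keeps the subquery correlated with both emp and empPay, so whatever triggers the error appears to be tied to wrapping the outer column in COALESCE rather than to referencing it at all.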

Sim*_*rim 2

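I can see this still happens. I just noticed you already know about swapping empPay.End_Date for CURRENT_DATE, which is how I would have written it.

Referencing empPay.End_Date does make the correlated subquery more complex, because you are now mixing in two outer tables instead of one.

With CURRENT_DATE, the SQL is the same as the following: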

SELECT 
    s.emp_sk
    ,ep.pay_rate_sk
    ,TO_DATE('2021-05-31') AS report_date
    ,ep.start_date AS pay_start_date
    ,ep.end_date AS pay_end_date
FROM (
    SELECT 
        e.emp_sk
    FROM employee e
    WHERE EXISTS (
        SELECT 1 
        FROM employee_x_location AS el
        WHERE e.emp_sk = el.emp_sk
            AND TO_DATE('2021-05-31') BETWEEN el.start_date AND COALESCE(el.end_date, CURRENT_DATE)
    )
) AS s
JOIN employee_x_pay_rate AS ep
    ON s.emp_sk = ep.emp_sk
        AND TO_DATE('2021-05-31') BETWEEN ep.start_date AND ep.end_date;

So the complex versus simple correlation can be shown by swapping the employee table in the sub-select for employee_x_pay_rate, like so:

SELECT 
    e.emp_sk
FROM Employee_X_Pay_Rate e
WHERE EXISTS (
    SELECT 1 
    FROM employee_x_location AS el
    WHERE e.emp_sk = el.emp_sk
        AND TO_DATE('2021-05-31') BETWEEN el.start_date AND COALESCE(el.end_date, CURRENT_DATE)
)

This works, but using a value from that table inside the COALESCE does not:

SELECT 
    e.emp_sk
FROM Employee_X_Pay_Rate e
WHERE EXISTS (
    SELECT 1 
    FROM employee_x_location AS el
    WHERE e.emp_sk = el.emp_sk
        AND TO_DATE('2021-05-31') BETWEEN el.start_date AND COALESCE(el.end_date, e.End_Date)
)

The two-argument equivalents IFNULL(el.end_date, e.End_Date) and NVL(el.end_date, e.End_Date) both fail as well.

However, you can refactor the code to move the COALESCE into a CTE and then use WHERE EXISTS, like this:

WITH r_emp_pay AS (
    SELECT 
       empPay.Emp_SK
       ,empPay.Pay_Rate_SK
       ,empPay.Start_Date 
       ,empPay.End_Date
    FROM Employee_X_Pay_Rate AS empPay
    WHERE TO_DATE('2021-05-31', 'yyyy-mm-dd') BETWEEN empPay.Start_Date AND empPay.End_Date
), r_emp_loc AS (
    SELECT 
        empLoc.Emp_SK
        ,empLoc.Start_Date
        ,empLoc.End_Date
        ,COALESCE(empLoc.End_Date, empPay.End_Date) as col_end_date
    FROM Employee_X_Location empLoc
    JOIN r_emp_pay empPay
        ON empPay.Emp_SK = empLoc.Emp_SK
    WHERE TO_DATE('2021-05-31', 'yyyy-mm-dd') BETWEEN empLoc.Start_Date AND COALESCE(empLoc.End_Date, CURRENT_DATE)
)
SELECT 
   emp.Emp_SK
   ,empPay.Pay_Rate_SK
   ,TO_DATE('2021-05-31', 'yyyy-mm-dd') AS Report_Date
   ,empPay.Start_Date AS Pay_Start_Date
   ,empPay.End_Date AS Pay_End_Date
FROM Employee emp
JOIN r_emp_pay empPay
    ON emp.Emp_SK = empPay.Emp_SK
WHERE EXISTS (
   SELECT 1 FROM r_emp_loc empLoc
   WHERE emp.Emp_SK = empLoc.Emp_SK
      AND TO_DATE('2021-05-31', 'yyyy-mm-dd') BETWEEN empLoc.Start_Date AND empLoc.col_end_date
);

This gives:

EMP_SK  PAY_RATE_SK  REPORT_DATE  PAY_START_DATE           PAY_END_DATE
1       3            2021-05-31   2021-04-01 00:00:00.000  2099-12-31 00:00:00.000