Nic*_* L. 10 sql correlated-subquery snowflake-cloud-data-platform
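Snowflake raises an error on an EXISTS clause when the filter condition uses a COALESCE over columns from both the outer query's table and the subquery's table. If I remove the outer-table column from the COALESCE, or replace the COALESCE with equivalent long-form logic, the query runs.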
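Specifically, I get `SQL compilation error: Unsupported subquery type cannot be evaluated` for what I consider a fairly simple WHERE EXISTS clause. This works in every (recent) SQL dialect I have used (e.g. SQL Server, Postgres), so I'm a little concerned that Snowflake doesn't support it. Am I missing something?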
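I found a similar question from 2019 in the Snowflake Community: Snowflake fails when the EXISTS clause contains a WHERE condition that references an outer-query column for something other than joining the tables. No clear solution was given there.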
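Snowflake's documentation on its limited subquery support states that it supports both correlated and uncorrelated subqueries for "EXISTS, ANY / ALL, and IN subqueries in WHERE clauses".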
So why does this EXISTS clause fail? Is this a bug, or a Snowflake limitation that isn't clearly documented?
Code to reproduce the issue:
CREATE OR REPLACE TEMPORARY TABLE Employee (
Emp_SK INT NOT NULL
);
CREATE OR REPLACE TEMPORARY TABLE Employee_X_Pay_Rate (
Emp_SK INT NOT NULL, Pay_Rate_SK INT NOT NULL, Start_Date TIMESTAMP_NTZ NOT NULL, End_Date TIMESTAMP_NTZ NOT NULL
);
CREATE OR REPLACE TEMPORARY TABLE Employee_X_Location (
Emp_SK INT NOT NULL, Location_SK INT NOT NULL, Start_Date TIMESTAMP_NTZ NOT NULL, End_Date TIMESTAMP_NTZ NULL
);
INSERT INTO Employee
VALUES (1);
INSERT INTO Employee_X_Pay_Rate
VALUES
(1, 1, '2018-01-01', '2019-03-31')
,(1, 2, '2019-04-01', '2021-03-31')
,(1, 3, '2021-04-01', '2099-12-31')
;
INSERT INTO Employee_X_Location
VALUES
(1, 101, '2018-01-01', '2019-12-31')
,(1, 102, '2020-01-01', '2020-12-31')
,(1, 103, '2021-01-01', NULL)
;
SET Asof_Date = TO_DATE('2021-05-31', 'yyyy-mm-dd'); -- changing this to TO_TIMESTAMP makes no difference
SELECT
emp.Emp_SK
,empPay.Pay_Rate_SK
,$Asof_Date AS Report_Date
,empPay.Start_Date AS Pay_Start_Date
,empPay.End_Date AS Pay_End_Date
FROM Employee emp
INNER JOIN Employee_X_Pay_Rate empPay
ON emp.Emp_SK = empPay.Emp_SK
AND $Asof_Date BETWEEN empPay.Start_Date AND empPay.End_Date
WHERE EXISTS (
SELECT 1 FROM Employee_X_Location empLoc
WHERE emp.Emp_SK = empLoc.Emp_SK
-- Issue: Next line fails. empLoc.End_Date can be null
AND $Asof_Date BETWEEN empLoc.Start_Date AND COALESCE(empLoc.End_Date, empPay.End_Date)
);
The query runs if I replace the problem line with either of the following:
-- Workaround 1
AND (
$Asof_Date >= empLoc.Start_Date
AND ($Asof_Date <= empLoc.End_Date OR (empLoc.End_Date IS NULL AND $Asof_Date <= empPay.End_Date))
)
-- Workaround 2
AND $Asof_Date BETWEEN empLoc.Start_Date AND COALESCE(empLoc.End_Date, CURRENT_DATE)
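To check that Workaround 1 keeps the same rows as the original COALESCE predicate, both expressions can be evaluated side by side over a few hand-picked cases. This is a sketch only: the CTE `cases` and its columns are hypothetical, and the query has not been run against a Snowflake account here.

```sql
-- Hypothetical test cases: as-of date, location start/end, and the pay end date
-- used as the fallback. loc_end is NULL to mimic an open-ended location record.
WITH cases (asof_date, loc_start, loc_end, pay_end) AS (
    SELECT TO_DATE('2021-05-31'), TO_DATE('2021-01-01'), NULL::DATE, TO_DATE('2099-12-31')
    UNION ALL
    SELECT TO_DATE('2021-05-31'), TO_DATE('2021-01-01'), NULL::DATE, TO_DATE('2021-03-31')
    UNION ALL
    SELECT TO_DATE('2021-05-31'), TO_DATE('2020-01-01'), TO_DATE('2020-12-31'), TO_DATE('2099-12-31')
    UNION ALL
    SELECT TO_DATE('2021-05-31'), TO_DATE('2021-01-01'), TO_DATE('2021-12-31'), TO_DATE('2021-03-31')
)
SELECT
    asof_date, loc_start, loc_end, pay_end
    -- Predicate from the failing EXISTS
    ,asof_date BETWEEN loc_start AND COALESCE(loc_end, pay_end) AS coalesce_form
    -- Workaround 1 (long form)
    ,(asof_date >= loc_start
      AND (asof_date <= loc_end OR (loc_end IS NULL AND asof_date <= pay_end))) AS long_form
FROM cases;
```

Under standard three-valued logic the two columns should agree whenever the result is TRUE. In the second row, however, the long form should yield NULL rather than FALSE, because `asof_date <= loc_end` is NULL when `loc_end` is NULL. A WHERE clause treats NULL as "not satisfied", so both forms filter the same rows.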
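I can reproduce this. I only noticed afterwards that you already know about swapping empPay.End_Date for CURRENT_DATE, which is how I would have written it.
Referencing empPay.End_Date does make the correlated subquery more complex, because it now correlates against two outer tables (emp and empPay) instead of one.
With CURRENT_DATE, the SQL is equivalent to the following: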
SELECT
s.emp_sk
,ep.pay_rate_sk
,TO_DATE('2021-05-31') AS report_date
,ep.start_date AS pay_start_date
,ep.end_date AS pay_end_date
FROM (
SELECT
e.emp_sk
FROM employee e
WHERE EXISTS (
SELECT 1
FROM employee_x_location AS el
WHERE e.emp_sk = el.emp_sk
AND TO_DATE('2021-05-31') BETWEEN el.start_date AND COALESCE(el.end_date, CURRENT_DATE)
)
) AS s
JOIN employee_x_pay_rate AS ep
ON s.emp_sk = ep.emp_sk
AND TO_DATE('2021-05-31') BETWEEN ep.start_date AND ep.end_date;
So the difference between simple and complex correlation can be shown by swapping the employee table in the sub-select for employee_x_pay_rate, as follows:
SELECT
e.emp_sk
FROM Employee_X_Pay_Rate e
WHERE EXISTS (
SELECT 1
FROM employee_x_location AS el
WHERE e.emp_sk = el.emp_sk
AND TO_DATE('2021-05-31') BETWEEN el.start_date AND COALESCE(el.end_date, CURRENT_DATE)
)
This works, but using a column from that outer table inside the COALESCE does not:
SELECT
e.emp_sk
FROM Employee_X_Pay_Rate e
WHERE EXISTS (
SELECT 1
FROM employee_x_location AS el
WHERE e.emp_sk = el.emp_sk
AND TO_DATE('2021-05-31') BETWEEN el.start_date AND COALESCE(el.end_date, e.End_Date)
)
Replacing the COALESCE with IFNULL(el.end_date, e.End_Date) or NVL(el.end_date, e.End_Date) also fails.
However, you can refactor the code to move the COALESCE into a CTE and then use WHERE EXISTS, like this:
WITH r_emp_pay AS (
SELECT
empPay.Emp_SK
,empPay.Pay_Rate_SK
,empPay.Start_Date
,empPay.End_Date
FROM Employee_X_Pay_Rate AS empPay
WHERE TO_DATE('2021-05-31', 'yyyy-mm-dd') BETWEEN empPay.Start_Date AND empPay.End_Date
), r_emp_loc AS (
SELECT
empLoc.Emp_SK
,empLoc.Start_Date
,empLoc.End_Date
,COALESCE(empLoc.End_Date, empPay.End_Date) as col_end_date
FROM Employee_X_Location empLoc
JOIN r_emp_pay empPay
ON empPay.Emp_SK = empLoc.Emp_SK
WHERE TO_DATE('2021-05-31', 'yyyy-mm-dd') BETWEEN empLoc.Start_Date AND COALESCE(empLoc.End_Date, CURRENT_DATE)
)
SELECT
emp.Emp_SK
,empPay.Pay_Rate_SK
,TO_DATE('2021-05-31', 'yyyy-mm-dd') AS Report_Date
,empPay.Start_Date AS Pay_Start_Date
,empPay.End_Date AS Pay_End_Date
FROM Employee emp
JOIN r_emp_pay empPay
ON emp.Emp_SK = empPay.Emp_SK
WHERE EXISTS (
SELECT 1 FROM r_emp_loc empLoc
WHERE emp.Emp_SK = empLoc.Emp_SK
AND TO_DATE('2021-05-31', 'yyyy-mm-dd') BETWEEN empLoc.Start_Date AND empLoc.col_end_date
);
Which gives:
EMP_SK | PAY_RATE_SK | REPORT_DATE | PAY_START_DATE | PAY_END_DATE |
---|---|---|---|---|
1 | 3 | 2021-05-31 | 2021-04-01 00:00:00.000 | 2099-12-31 00:00:00.000 |
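Another option, which neither the question nor the answer uses and which has not been verified here against Snowflake, is to drop the EXISTS entirely and express it as an ordinary join. The COALESCE that references both empLoc and empPay then becomes a plain join condition instead of a correlated-subquery filter. It assumes `$Asof_Date` was set as in the question.

```sql
-- Sketch: the correlated EXISTS rewritten as an inner join.
-- SELECT DISTINCT collapses duplicates when several location rows match,
-- approximating EXISTS (semi-join) semantics. Caveat: it would also collapse
-- outer rows that were already exact duplicates, which EXISTS would keep.
SELECT DISTINCT
    emp.Emp_SK
    ,empPay.Pay_Rate_SK
    ,$Asof_Date AS Report_Date
    ,empPay.Start_Date AS Pay_Start_Date
    ,empPay.End_Date AS Pay_End_Date
FROM Employee emp
INNER JOIN Employee_X_Pay_Rate empPay
    ON emp.Emp_SK = empPay.Emp_SK
    AND $Asof_Date BETWEEN empPay.Start_Date AND empPay.End_Date
INNER JOIN Employee_X_Location empLoc
    ON emp.Emp_SK = empLoc.Emp_SK
    -- empPay is already joined, so referencing it here is an ordinary join predicate
    AND $Asof_Date BETWEEN empLoc.Start_Date AND COALESCE(empLoc.End_Date, empPay.End_Date);
```

With the sample data, only location 103 (open-ended, so its end falls back to pay rate 3's 2099-12-31) matches the as-of date. The join should therefore return the same single row as the CTE version.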