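I have a query that is more complex than the example here, but it only needs to return rows where a particular field does not appear more than once in the data set.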
ACTIVITY_SK STUDY_ACTIVITY_SK
100 200
101 201
102 200
100 203
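In this example, I don't want any records with ACTIVITY_SK 100 returned, because that ACTIVITY_SK appears twice in the data set.
The data is a mapping table and is used in many joins, but multiple records like this indicate a data quality problem, so I need to simply drop them from the results rather than let them cause bad joins elsewhere.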
SELECT
A.ACTIVITY_SK,
A.STATUS,
B.STUDY_ACTIVITY_SK,
B.NAME,
B.PROJECT
FROM
ACTIVITY A,
PROJECT B
WHERE
A.ACTIVITY_SK = B.STUDY_ACTIVITY_SK
I have tried something like this:
SELECT
A.ACTIVITY_SK,
A.STATUS,
B.STUDY_ACTIVITY_SK,
B.NAME,
B.PROJECT
FROM
ACTIVITY A,
PROJECT B
WHERE
A.ACTIVITY_SK = B.STUDY_ACTIVITY_SK
AND A.ACTIVITY_SK NOT IN
(
SELECT
A.ACTIVITY_SK
FROM
ACTIVITY A,
PROJECT B
WHERE
A.ACTIVITY_SK = B.STUDY_ACTIVITY_SK
GROUP BY A.ACTIVITY_SK
HAVING COUNT(*) > 1
)
But there must be a cheaper way to do this...
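One candidate shape for a cheaper version, offered as an unbenchmarked sketch: count the rows per ACTIVITY_SK with a window function while the join is being produced, then keep only the keys that occur exactly once. This assumes the database supports window (analytic) functions. The derived-table alias T and the column alias SK_COUNT are made-up names for illustration. Whether this is actually cheaper than the NOT IN subquery depends on the optimizer and the data, so the execution plans would need to be compared.

```sql
-- Sketch only: T and SK_COUNT are illustrative names, not part of the schema.
SELECT
    ACTIVITY_SK,
    STATUS,
    STUDY_ACTIVITY_SK,
    NAME,
    PROJECT
FROM (
    SELECT
        A.ACTIVITY_SK,
        A.STATUS,
        B.STUDY_ACTIVITY_SK,
        B.NAME,
        B.PROJECT,
        -- Number of joined rows sharing this ACTIVITY_SK
        COUNT(*) OVER (PARTITION BY A.ACTIVITY_SK) AS SK_COUNT
    FROM ACTIVITY A
    JOIN PROJECT B
        ON A.ACTIVITY_SK = B.STUDY_ACTIVITY_SK
) T
-- Keep only keys that appear exactly once in the joined result
WHERE SK_COUNT = 1
```

The join is written once here, where the NOT IN version repeats it inside the subquery.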