Exclude results that appear in another column of a CONNECT BY query

Jam*_*ing 6 sql oracle hierarchical-data

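I have a heavy query (it takes 15 minutes to run), but it returns more results than I need. It is a CONNECT BY query, and I am getting root rows for nodes that already appear as descendants in another root's results. For example: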

Ted
  Bob
    John
Bob
  John
John

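Normally, the way to solve this is with a START WITH condition, typically requiring that a node's parent be null. But because of the nature of this query, I do not have the START WITH values to compare against until I have the full result set. In effect I am trying to query my own results twice: run the query, then start only with the records that do not show up elsewhere in those same results.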
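As a minimal sketch of the usual fix described above, assuming a hypothetical three-row data set (the `emp` name and its `name`/`parent_name` columns are illustrative and not part of the real schema): with `START WITH parent_name IS NULL`, only Ted is used as a root, so the query returns three rows. Removing the `START WITH` line makes every row a root and returns the six rows shown in the Ted/Bob/John listing.

```sql
-- Hypothetical sample data; emp, name and parent_name are illustrative names only.
with emp as (
  select 'Ted' name, cast(null as varchar2(10)) parent_name from dual union all
  select 'Bob',  'Ted' from dual union all
  select 'John', 'Bob' from dual
)
select connect_by_root(name) root_name,
       level                 node_level,
       name
from   emp
start with parent_name is null          -- only true roots start a tree
connect by parent_name = prior name;    -- child.parent_name = parent.name
```

The difficulty in the question is that the real data has no ready-made "parent is null" column to start from.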


Here is the query (built with help from Nicholas Krasnov, here: Oracle Self-Join on multiple possible column matches - CONNECT BY?):

select cudroot.root_user, cudroot.node_level, cudroot.user_id, cudroot.new_user_id,
       cudbase.*  -- Not really, just simplifying
from   css.user_desc cudbase
  join (select connect_by_root(user_id) root_user,   
               user_id                  user_id,        
               new_user_id              new_user_id,
               level                    node_level
        from   (select cudordered.user_id,      
                       coalesce(cudordered.new_user_id, cudordered.nextUser) new_user_id
                from   (select cud.user_id, 
                               cud.new_user_id, 
                               decode(cud.global_hr_id, null, null, lead(cud.user_id ignore nulls) over (partition by cud.global_hr_id order by cud.user_id)) nextUser
                        from   css.user_desc cud
                          left join gsu.stg_userdata gstgu
                            on (gstgu.user_id = cud.user_id 
                                or (gstgu.sap_asoc_global_id = cud.global_hr_id))
                        where  upper(cud.user_type_code) in ('EMPLOYEE','CONTRACTOR','DIV_EMPLOYEE','DIV_CONTRACTOR','DIV_MYTEAPPROVED')) cudordered)
        connect by nocycle user_id = prior new_user_id) cudroot
    on cudbase.user_id = cudroot.user_id
order by 
       cudroot.root_user, cudroot.node_level, cudroot.user_id;


This gives me results for related users (based on a user_id rename or an associated SAP ID) that look like this:

ROOT_ID     LEVEL   USER_ID         NEW_USER_ID
------------------------------------------------
A5093522    1       A5093522        FG096489
A5093522    2       FG096489        A5093665
A5093522    3       A5093665        
FG096489    1       FG096489        A5093665
FG096489    2       A5093665
A5093665    1       A5093665

What I need is a way to filter that first join (select connect_by_root(user_id)... so that FG096489 and A5093665 are excluded from the list of roots.


The best START WITH I can think of is something like this (untested):

start with user_id not in (select new_user_id 
                           from   (select coalesce(cudordered.new_user_id, cudordered.nextUser) new_user_id
                                   from   (select cud.new_user_id, 
                                                  decode(cud.global_hr_id, null, null, lead(cud.user_id ignore nulls) over (partition by cud.global_hr_id order by cud.user_id)) nextUser
                                           from   css.user_desc cud
                                           where  upper(cud.user_type_code) in ('EMPLOYEE','CONTRACTOR','DIV_EMPLOYEE','DIV_CONTRACTOR','DIV_MYTEAPPROVED')) cudordered)
                           connect by nocycle user_id = prior new_user_id)

...but then I am effectively running the 15-minute query twice.
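One hedged way to express "start only with IDs that are nobody's new_user_id" without repeating the hierarchical query is to name the base rows once in a WITH clause and reference that name twice. This is a sketch on the three sample IDs from the result listing above, not the real query; `chain` is a hypothetical name standing in for the `cudordered` logic. Note that the subquery filters out nulls: `NOT IN` returns no rows at all if the subquery yields a single NULL, and new_user_id is null for the last account in every chain.

```sql
-- chain is a hypothetical stand-in for the (user_id, new_user_id) pairs
-- that the real query derives from css.user_desc.
with chain as (
  select 'A5093522' user_id, 'FG096489' new_user_id from dual union all
  select 'FG096489', 'A5093665' from dual union all
  select 'A5093665', null       from dual
)
select connect_by_root(user_id) root_user,
       level                    node_level,
       user_id,
       new_user_id
from   chain
start with user_id not in (select new_user_id
                           from   chain
                           where  new_user_id is not null)  -- avoid the NOT IN / NULL trap
connect by nocycle user_id = prior new_user_id
order by root_user, node_level;
```

On this sample only the A5093522 group (levels 1 to 3) comes back. Whether Oracle evaluates the factored subquery once or inlines it twice is up to the optimizer, so the effect on the 15-minute runtime would need to be checked against the real plan.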

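I have looked at using partitions in the query, but there is nothing to partition by... I want to look at the full result set of new_user_ids. I have also explored analytic functions such as rank()... my bag of tricks is empty.

Any ideas?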


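Clarification

The reason I do not want the extra records in the root list is that I want only one group of results per user. That is, if Bob Smith has had four accounts over his career (people come and go frequently, as employees and/or contractors), I want to work with a single set of accounts that all belong to Bob Smith.

If Bob came here as a contractor, converted to an employee, left, came back as a contractor in another country, and then left and returned to a legal organization that is now in our SAP system, his account rename chain might look like this: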

Bob Smith  CONTRACTOR   ----    US0T0001  ->  US001101  (given a new ID as an employee)
Bob Smith  EMPLOYEE     ----    US001101  ->  EB0T0001  (contractor ID for  the UK)
Bob Smith  CONTRACTOR  SAP001   EB0T0001                (no rename performed)
Bob Smith  EMPLOYEE    SAP001   TE110001                (currently-active ID)

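In the example above, the four accounts are linked either through the new_user_id field, which is set when a user is renamed, or by sharing the same SAP ID.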
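The two kinds of link can be folded into a single "next account" column, which is what the `coalesce(..., lead(...))` expression in the query does. Below is a minimal sketch on the four Bob Smith accounts, assuming a hypothetical inline `user_desc` in place of `css.user_desc` and, as in the original query, assuming that ordering by user_id within one SAP ID is an acceptable stand-in for chronological order.

```sql
-- Hypothetical inline copy of the four Bob Smith rows; the real table is css.user_desc.
with user_desc as (
  select 'US0T0001' user_id, 'US001101' new_user_id,
         cast(null as varchar2(10)) global_hr_id from dual union all
  select 'US001101', 'EB0T0001', null     from dual union all
  select 'EB0T0001', null,       'SAP001' from dual union all
  select 'TE110001', null,       'SAP001' from dual
)
select user_id,
       -- explicit rename wins; otherwise fall back to the next ID with the same SAP ID
       coalesce(new_user_id,
                decode(global_hr_id, null, null,
                       lead(user_id ignore nulls)
                         over (partition by global_hr_id order by user_id))) next_user_id
from   user_desc
order by user_id;
```

On this sample the result links US0T0001 to US001101, US001101 to EB0T0001, and EB0T0001 to TE110001, with TE110001 left as the end of the chain.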

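Because HR frequently fails to follow the business process, a returning user may end up with any of these four IDs restored. I have to analyze all of Bob Smith's IDs and say "Bob Smith can only have TE110001 restored", and kick back an error if someone tries to restore anything else. I have to do this for 90,000+ records.

The first column, "Bob Smith", is just an identifier for the group of associated accounts. In my original example I use the root user ID as the identifier (e.g. US0T0001). If I used first/last names to identify users, I would end up with collisions.

So Bob Smith would look like this: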

US0T0001  1  CONTRACTOR   ----    US0T0001  ->  US001101  (given a new ID as an employee)
US0T0001  2  EMPLOYEE     ----    US001101  ->  EB0T0001  (contractor ID for  the UK)
US0T0001  3  CONTRACTOR  SAP001   EB0T0001                (no rename performed)
US0T0001  4  EMPLOYEE    SAP001   TE110001                (currently-active ID)

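...where 1, 2, 3, 4 are the levels in the hierarchy.

Since US0T0001, US001101, EB0T0001 and TE110001 are all accounted for, I do not want another group for any of them. But my current results list these accounts in multiple groups: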

US001101  1  EMPLOYEE     ----    US001101  ->  EB0T0001
US001101  2  CONTRACTOR  SAP001   EB0T0001                
US001101  3  EMPLOYEE    SAP001   TE110001               

EB0T0001  1  CONTRACTOR  SAP001   EB0T0001               
EB0T0001  2  EMPLOYEE    SAP001   TE110001                

TE110001  1  EMPLOYEE    SAP001   TE110001

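This causes two problems:

  1. When I query the results for a user ID, I get hits from multiple groups.
  2. Each group reports a different expected user ID for Bob Smith.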


You asked for an expanded set of records... here is some actual data:

-- NumRootUsers tells me how many accounts are associated with a user.
-- The new user ID field is explicitly set in the database, but may be null.
-- The calculated new user ID analyzes records to determine what the next related record is

          NumRoot                   New User    Calculated
RootUser  Users    Level  UserId    ID Field    New User ID   SapId       LastName        FirstName
-----------------------------------------------------------------------------------------------
BG100502  3        1      BG100502  BG1T0873    BG1T0873                  GRIENS VAN      KION
BG100502  3        2      BG1T0873  BG103443    BG103443                  GRIENS VAN      KION
BG100502  3        3      BG103443                            41008318    VAN GRIENS      KION

-- This group causes bad matches for Kion van Griens... the IDs are already accounted for,
-- and this group doesn't even grab all of the accounts for Kion.  It's also using a new 
-- ID to identify the group
BG1T0873  2        1      BG1T0873  BG103443    BG103443                  GRIENS VAN      KION
BG1T0873  2        2      BG103443                            41008318    VAN GRIENS      KION

-- Same here...
BG103443  1        1      BG103443                            41008318    VAN GRIENS      KION

-- Good group of records 
BG100506  3        1      BG100506              BG100778      41008640    MALEN VAN       LARS
BG100506  3        2      BG100778              BG1T0877      41008640    MALEN VAN       LARS
BG100506  3        3      BG1T0877                            41008640    VAN MALEN       LARS

-- Bad, unwanted group of records
BG100778  2        1      BG100778              BG1T0877      41008640    MALEN VAN       LARS
BG100778  2        2      BG1T0877                            41008640    VAN MALEN       LARS

-- Third group for Lars
BG1T0877  1        1      BG1T0877                            41008640    VAN MALEN       LARS


-- Jan... fields are set differently than the above examples, but the chain is calculated correctly
BG100525  3        1      BG100525              BG1T0894      41008651    ZANWIJK VAN     JAN
BG100525  3        2      BG1T0894  TE035165    TE035165      41008651    VAN ZANWIJK     JAN
BG100525  3        3      TE035165                            41008651    VAN ZANWIJK     JAN

-- Bad
BG1T0894  2        1      BG1T0894  TE035165    TE035165      41008651    VAN ZANWIJK     JAN
BG1T0894  2        2      TE035165                            41008651    VAN ZANWIJK     JAN

-- Bad bad
TE035165  1        1      TE035165                            41008651    VAN ZANWIJK     JAN


-- Somebody goofed and gave Ziano a second SAP ID... but we still matched correctly
BG100527  3        1      BG100527              BG1T0896      41008652    STEFANI DE      ZIANO
BG100527  3        2      BG1T0896  TE033030    TE033030      41008652    STEFANI DE      ZIANO
BG100527  3        3      TE033030                            42006172    DE STEFANI      ZIANO

-- And we still got extra, unwanted groups
BG1T0896  3        2      BG1T0896  TE033030    TE033030      41008652    STEFANI DE      ZIANO
BG1T0896  3        3      TE033030                            42006172    DE STEFANI      ZIANO

TE033030  3        3      TE033030                            42006172    DE STEFANI      ZIANO


-- Mark's a perfect example of the missing/frustrating data I'm dealing with... but we still matched correctly
BG102188  3        1      BG102188              BG1T0543      41008250    BULINS          MARK
BG102188  3        2      BG1T0543              TE908583      41008250    BULINS          R.J.M.A.
BG102188  3        3      TE908583                            41008250    BULINS          RICHARD JOHANNES MARTINUS ALPHISIUS

-- Not wanted
BG1T0543  3        2      BG1T0543              TE908583      41008250    BULINS          R.J.M.A.
BG1T0543  3        3      TE908583                            41008250    BULINS          RICHARD JOHANNES MARTINUS ALPHISIUS

TE908583  3        3      TE908583                            41008250    BULINS          RICHARD JOHANNES MARTINUS ALPHISIUS


-- One more for good measure
BG1T0146  3        1      BG1T0146  BG105905    BG105905                  LUIJENT         VALERIE
BG1T0146  3        2      BG105905              TE034165      42006121    LUIJENT         VALERIE
BG1T0146  3        3      TE034165                            42006121    LUIJENT         VALERIE

BG105905  3        2      BG105905              TE034165      42006121    LUIJENT         VALERIE
BG105905  3        3      TE034165                            42006121    LUIJENT         VALERIE

TE034165  3        3      TE034165                            42006121    LUIJENT         VALERIE

Not sure whether all of this information makes things clearer or will make your eyes roll back into your head :)

Thanks for looking at this!

Hug*_*nes 1

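I think I have it. We let ourselves get fixated on chronological order, but in fact it does not matter. Your START WITH clause should be "NEW_USER_ID IS NULL".

To get chronological order, you can "ORDER BY cudroot.node_level * -1".

I would also suggest looking at a WITH clause to form the base data, and then running the hierarchical query against that.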
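A sketch of how these suggestions might fit together, on the three sample IDs from the question. One assumption here goes beyond what the answer states: for `START WITH new_user_id IS NULL` to return whole chains, the CONNECT BY condition has to walk backwards from the last account (`new_user_id = prior user_id`), because with the original direction a row whose new_user_id is null has no children. The `chain` name is hypothetical and stands in for the base data formed from `css.user_desc`.

```sql
-- chain is a hypothetical stand-in for the base (user_id, new_user_id) pairs.
with chain as (
  select 'A5093522' user_id, 'FG096489' new_user_id from dual union all
  select 'FG096489', 'A5093665' from dual union all
  select 'A5093665', null       from dual
)
select connect_by_root(user_id) root_user,   -- now the newest ID in the chain
       level                    node_level,
       user_id,
       new_user_id
from   chain
start with new_user_id is null                  -- one root per chain: its final account
connect by nocycle new_user_id = prior user_id  -- assumption: walk from newest to oldest
order by root_user, node_level desc;            -- same ordering as node_level * -1
```

On this sample there is a single group, identified by A5093665, listing A5093522, FG096489 and A5093665 in chronological order. A side effect of this design is that the group identifier becomes the last ID in the chain, which may suit the "only this ID can be restored" check. If two older accounts point at the same new_user_id, both still land in the same group.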