Find the percentage of rows in one table that exist in another table?

Cra*_*aig 1 sql hive

I have the following tables:

Table A:

entryDate              memberID           course

Each memberID can occur multiple times on the same day:

2016-05-10      1192875         STAT-2294

2016-05-10      3292875         STAT-2294

2016-05-10      1192875         ENG-115

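Table B contains only memberID.

What I'm looking for is a way to find, for a given date, the percentage of member IDs in Table A that also exist in Table B.

Here is where I am so far: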

SELECT entryDate, 
       Count(CASE 
               WHEN tableA.memberID IN (SELECT memberID 
                                        FROM   tableB) THEN 1 
               ELSE 0 
             END) AS membership 
FROM   tableA 
WHERE  entryDate BETWEEN ‘2016-05-01’ AND ‘2016-05-15’ 
GROUP  BY entryDate; 

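I'm trying to get the raw counts as a starting point, but I get the following error:

Unsupported SubQuery Expression 'memberID': Currently SubQuery expressions are only allowed as Where Clause predicates

  • What is wrong with my current query?
  • How can I get the percentage of rows in TableA for a particular entryDate that exist in TableB?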

TIA!-Craig

Vam*_*ala 6

You can use exists to do this.

-- date literals use the same yyyy-MM-dd format as the entryDate values in Table A
select count(*) 
from tableA a
where exists (select 1 from tableB b where a.memberID = b.memberID)
and entryDate BETWEEN '2016-05-01' AND '2016-05-15'

To get the percentage of entries:

-- percentage = rows in the date range that have a match in tableB / all rows in the date range
select 100.0 * (select count(*) 
                from tableA a
                where exists (select 1 from tableB b where a.memberID = b.memberID)
                and entryDate BETWEEN '2016-05-01' AND '2016-05-15') / count(*)
from tableA 
where entryDate BETWEEN '2016-05-01' AND '2016-05-15'

Edit: Hive does not support correlated subqueries; this can be done with a left join instead.

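-- the date filter belongs in WHERE: inside the ON clause it would not restrict the rows of tableA
select 100.0 * count(b.memberID) / count(a.memberID)
from tableA a
left join tableB b on a.memberID = b.memberID
where a.entryDate BETWEEN '2016-05-01' AND '2016-05-15'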
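
Since the question asks for a figure per entryDate, the left join approach can probably be extended with a GROUP BY. The following is only a sketch under some assumptions: entryDate is stored as a yyyy-MM-dd string (or a DATE), the alias pct_rows_in_b is made up for illustration, and tableB is deduplicated in a derived table in case a memberID appears in it more than once.

```sql
-- Sketch only: assumes tableA(entryDate, memberID, course) and tableB(memberID).
-- pct_rows_in_b is a hypothetical column alias, not a name from the question.
SELECT a.entryDate,
       -- COUNT(b.memberID) skips NULLs, so it counts only rows that found a match
       100.0 * COUNT(b.memberID) / COUNT(*) AS pct_rows_in_b
FROM   tableA a
-- DISTINCT guards against duplicate memberIDs in tableB multiplying tableA rows
LEFT JOIN (SELECT DISTINCT memberID FROM tableB) b
       ON a.memberID = b.memberID
WHERE  a.entryDate BETWEEN '2016-05-01' AND '2016-05-15'
GROUP  BY a.entryDate;
```

This counts rows, so a member who appears three times on one day is counted three times; if distinct members are wanted instead, COUNT(DISTINCT b.memberID) / COUNT(DISTINCT a.memberID) should give that. On the first question: COUNT counts every non-NULL value, so COUNT(CASE ... THEN 1 ELSE 0 END) counts all rows even without the subquery error; SUM, or dropping the ELSE 0, is what counts only the matches.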