Count of rows where a column = A, averaged over the distinct values of another column, grouped by a third column

Question

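Using SQL Server, I'm trying to query a kind of average count from a table I didn't design. Basically I want a list, grouped by one column, of the number of distinct values of a second column that match a given criterion and, of those, the number of rows that match another criterion (which I'll use to compute the average count, or whatever it should be called). This can't be difficult, but I'm having a bad set-theory day, and any pointers would be appreciated.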

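Here is the simplified, genericized scenario (schema and sample data below). Suppose we have three columns:

objid (has a clustered index)
userid (no index; I can add one)
actiontype (no index; I can add one)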

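None of these columns is unique, and none is nullable. We want to completely ignore any row whose actiontype is 'none'. For each userid, we want to know how many actiontype = 'flag' rows there are, on average, per object that user has interacted with.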

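So if we have "ahmed", "joe", and "maria", and joe interacted with 3 objects and raised 5 flags, his number is 5 / 3 = 1.6666 repeating; if ahmed interacted with 3 objects and raised no flags, his number is 0; if maria interacted with 5 objects and raised 4 flags, her number is 4 / 5 = 0.8: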

+--------+------------------+
| userid | flags_per_object |
+--------+------------------+
| ahmed  | 0                |
| joe    | 1.66666667       |
| maria  | 0.8              |
+--------+------------------+

I wouldn't be surprised if this gets closed as a duplicate; I just haven't found it.

Here is the simplified table setup and sample data:

create table tmp
(
    objid      varchar(254) not null,
    userid     varchar(254) not null,
    actiontype varchar(254) not null
)
create clustered index tmp_objid on tmp(objid)

insert into tmp (objid, userid, actiontype) values ('alpha', 'joe', 'none')
insert into tmp (objid, userid, actiontype) values ('alpha', 'joe', 'none')
insert into tmp (objid, userid, actiontype) values ('alpha', 'joe', 'update')
insert into tmp (objid, userid, actiontype) values ('alpha', 'joe', 'close')
insert into tmp (objid, userid, actiontype) values ('alpha', 'joe', 'flag')
insert into tmp (objid, userid, actiontype) values ('alpha', 'joe', 'flag')
insert into tmp (objid, userid, actiontype) values ('alpha', 'joe', 'flag')
insert into tmp (objid, userid, actiontype) values ('alpha', 'joe', 'flag')

insert into tmp (objid, userid, actiontype) values ('beta', 'joe', 'none')
insert into tmp (objid, userid, actiontype) values ('beta', 'joe', 'none')
insert into tmp (objid, userid, actiontype) values ('beta', 'joe', 'close')
insert into tmp (objid, userid, actiontype) values ('beta', 'joe', 'flag')

insert into tmp (objid, userid, actiontype) values ('gamma', 'joe', 'none')

insert into tmp (objid, userid, actiontype) values ('delta', 'joe', 'update')

insert into tmp (objid, userid, actiontype) values ('alpha', 'maria', 'update')

insert into tmp (objid, userid, actiontype) values ('beta', 'maria', 'flag')
insert into tmp (objid, userid, actiontype) values ('beta', 'maria', 'flag')

insert into tmp (objid, userid, actiontype) values ('gamma', 'maria', 'flag')
insert into tmp (objid, userid, actiontype) values ('gamma', 'maria', 'flag')
insert into tmp (objid, userid, actiontype) values ('gamma', 'maria', 'update')
insert into tmp (objid, userid, actiontype) values ('gamma', 'maria', 'close')

insert into tmp (objid, userid, actiontype) values ('delta', 'maria', 'update')
insert into tmp (objid, userid, actiontype) values ('epsilon', 'maria', 'update')

insert into tmp (objid, userid, actiontype) values ('alpha', 'ahmed', 'none')

insert into tmp (objid, userid, actiontype) values ('beta', 'ahmed', 'none')

insert into tmp (objid, userid, actiontype) values ('gamma', 'ahmed', 'none')
insert into tmp (objid, userid, actiontype) values ('gamma', 'ahmed', 'update')

insert into tmp (objid, userid, actiontype) values ('delta', 'ahmed', 'update')
insert into tmp (objid, userid, actiontype) values ('delta', 'ahmed', 'close')

insert into tmp (objid, userid, actiontype) values ('epsilon', 'ahmed', 'update')
insert into tmp (objid, userid, actiontype) values ('epsilon', 'ahmed', 'close')

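
The question notes that userid and actiontype have no index but that one could be added. As a rough sketch only, a nonclustered index covering both filter and grouping columns might look like the following. The index name tmp_userid_actiontype is hypothetical, and whether such an index actually helps depends on the real data and should be checked against the execution plan.

```sql
-- Hypothetical index name; not part of the original schema.
-- userid is the grouping column and actiontype is the filter column
-- in the queries discussed on this page.
CREATE NONCLUSTERED INDEX tmp_userid_actiontype
    ON tmp (userid, actiontype);
```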

Answer 1

a1ex07 5

You can try something like this:

SELECT t1.userid,
       -- cnt1 = number of 'flag' rows, cnt2 = distinct non-'none' objects
       CASE t1.cnt2
           WHEN 0 THEN 0
           -- flags / objects; ISNULL covers users with no 'flag' rows at all
           ELSE ISNULL(CAST(b.cnt1 AS float) / t1.cnt2, 0)
       END AS num
FROM
(
    SELECT userid, COUNT(DISTINCT objid) AS cnt2
    FROM tmp
    WHERE actiontype <> 'none'
    GROUP BY userid
) t1
LEFT JOIN
(
    SELECT userid, COUNT(*) AS cnt1
    FROM tmp
    WHERE actiontype = 'flag'
    GROUP BY userid
) b ON b.userid = t1.userid


Although it looks worse than your solution, it surprisingly produces a better execution plan for the test data you provided.
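
The CAST to float in the query above matters because SQL Server performs integer division when both operands are integers, and COUNT returns an int. A minimal illustration, using literal values rather than the tmp table:

```sql
-- Both operands are int, so the fractional part is discarded.
SELECT 5 / 3 AS int_division;                   -- 1

-- Casting one operand to float forces floating-point division.
SELECT CAST(5 AS float) / 3 AS float_division;  -- approximately 1.6667
```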

Answer 2

T.J*_*der 1

The answer is: it depends.

In my tests, my own solution was the slowest no matter what test data I used. On real-life data, it ran at about half the speed of the fastest solution.

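Mikael's solution is faster on the test data quoted in my question, and also on a larger but still small dataset from my real-life table (our test system, about 2k rows).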

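But a1ex07's solution is faster on my full-size real-life table (our live system, about 700k rows). The gap between a1ex07's and Mikael's solutions isn't large, but a1ex07's has a definite edge.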
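
The answer doesn't say how these timings were taken. One common way to compare candidate queries in SQL Server is to switch on the session's time and I/O statistics and run each candidate in turn. The sketch below assumes the tmp table from the question; the query in the middle is only a placeholder for whichever candidate is being measured.

```sql
-- Report CPU/elapsed time and logical reads in the Messages output.
SET STATISTICS TIME ON;
SET STATISTICS IO ON;

-- Placeholder: substitute the candidate query to be measured.
SELECT userid, COUNT(*) AS flag_rows
FROM tmp
WHERE actiontype = 'flag'
GROUP BY userid;

SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;
```

Running each candidate several times, against data of realistic size, gives a fairer comparison than a single run on the small sample set.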

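In the end, though, I actually used Mikael's solution, because it's easier to conceptualize if you're not a l33t DB person (and the people maintaining this code, of which the SQL is only a small part, are not) and easier to adapt to various other scenarios.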

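Hence this community-wiki meta-answer, which I'll accept once the time limit has passed, rather than accepting either of their excellent answers. If you find this helpful, please upvote Mikael's answer and a1ex07's answer, as I have.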
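
For comparison with the two-subquery join in Answer 1, the same figure can also be computed in a single grouped query using conditional aggregation. This is only a sketch against the question's tmp table. It is not Mikael's query, which isn't reproduced on this page; the column name flags_per_object is taken from the question's expected output.

```sql
-- Assumption: illustrative single-pass variant, not taken from any answer.
SELECT userid,
       -- Count only the 'flag' rows, then divide by the number of distinct
       -- objects the user touched. The CAST avoids integer division.
       CAST(SUM(CASE WHEN actiontype = 'flag' THEN 1 ELSE 0 END) AS float)
           / COUNT(DISTINCT objid) AS flags_per_object
FROM tmp
WHERE actiontype <> 'none'   -- ignore 'none' rows entirely
GROUP BY userid;
```

With the sample data, joe has 5 flags over 3 objects, maria 4 over 5, and ahmed 0 over 3, matching the expected table in the question. The divisor cannot be zero, because every group contains at least one row and objid is not nullable.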
