dav*_*avr 191 mysql sql statistics median
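What's the simplest (and hopefully not too slow) way to calculate the median with MySQL? I've used AVG(x) for finding the mean, but I'm having a hard time finding a simple way of calculating the median. For now, I'm returning all the rows to PHP, sorting them, and then picking the middle row, but surely there must be some simple way of doing it in a single MySQL query.
Example data: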
id | val
--------
 1 |   4
 2 |   7
 3 |   2
 4 |   2
 5 |   9
 6 |   8
 7 |   3
Sorting on val gives 2 2 3 4 7 8 9, so the median should be 4, whereas SELECT AVG(val) gives 5.
vel*_*row 213
In MariaDB / MySQL:
SELECT AVG(dd.val) as median_val
FROM (
SELECT d.val, @rownum:=@rownum+1 as `row_number`, @total_rows:=@rownum
FROM data d, (SELECT @rownum:=0) r
WHERE d.val is NOT NULL
-- put some where clause here
ORDER BY d.val
) as dd
WHERE dd.row_number IN ( FLOOR((@total_rows+1)/2), FLOOR((@total_rows+2)/2) );
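Steve Cohen points out that after the first pass, @rownum will contain the total number of rows. This can be used to determine the median, so no second pass or join is needed.
Also, AVG(dd.val) and dd.row_number IN (...) are used to produce the correct median when there is an even number of records. Reasoning: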
SELECT FLOOR((3+1)/2),FLOOR((3+2)/2); -- when total_rows is 3, avg rows 2 and 2
SELECT FLOOR((4+1)/2),FLOOR((4+2)/2); -- when total_rows is 4, avg rows 2 and 3
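The rank arithmetic above can be checked outside the database. Here is a minimal Python sketch, assuming the values are sorted and ranks are 1-based as in the query; `middle_ranks` and `median_by_rank` are illustrative names, not part of the answer.

```python
def middle_ranks(total_rows):
    """Return the two 1-based ranks that the query's IN (...) list selects."""
    # Integer division mirrors FLOOR((n+1)/2) and FLOOR((n+2)/2).
    return (total_rows + 1) // 2, (total_rows + 2) // 2


def median_by_rank(values):
    """Average the values found at the two middle ranks."""
    ordered = sorted(values)
    low, high = middle_ranks(len(ordered))
    # For an odd count both ranks coincide, so the average is the middle value.
    return (ordered[low - 1] + ordered[high - 1]) / 2


print(middle_ranks(3))                        # (2, 2)
print(middle_ranks(4))                        # (2, 3)
print(median_by_rank([4, 7, 2, 2, 9, 8, 3]))  # 4.0
```

With the question's sample data the sketch yields 4.0, and for four rows it averages ranks 2 and 3.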
Finally, MariaDB 10.3.3+ contains a MEDIAN function.
The*_*lor 58
I just found another answer online in the comments:
For medians in almost any SQL:
SELECT x.val from data x, data y GROUP BY x.val HAVING SUM(SIGN(1-SIGN(y.val-x.val))) = (COUNT(*)+1)/2
Make sure your columns are well indexed and that the index is used for filtering and sorting. Verify with the explain plans.
select count(*) from table -- find the number of rows
Calculate the "median" row number. Maybe use: median_row = floor(count / 2).
Then pick it out of the list:
select val from table order by val asc limit median_row,1
This should return one row with just the value you want.
Jacob
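The two-step procedure (count the rows, then fetch the row at the middle offset) might look like this when driven from client code. This is only a sketch: Python's built-in sqlite3 stands in for MySQL, the table is named `data` as in the question, and `median_row_value` is a hypothetical helper.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the MySQL table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (id INTEGER, val INTEGER)")
conn.executemany(
    "INSERT INTO data (id, val) VALUES (?, ?)",
    [(1, 4), (2, 7), (3, 2), (4, 2), (5, 9), (6, 8), (7, 3)],
)


def median_row_value(connection):
    """Run two queries: count the rows, then read the row at the middle offset."""
    (count,) = connection.execute("SELECT COUNT(*) FROM data").fetchone()
    if count == 0:
        return None
    median_row = count // 2  # floor(count / 2), used as a 0-based offset
    (val,) = connection.execute(
        "SELECT val FROM data ORDER BY val ASC LIMIT ?, 1",
        (median_row,),
    ).fetchone()
    return val


print(median_row_value(conn))  # 4
```

Note that for an even row count this offset selects the upper of the two middle rows rather than their average.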
zoo*_*ron 30
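I found that the accepted solution didn't work on my MySQL installation, returning an empty set, but this query worked for me in all the situations I tested it on: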
SELECT x.val from data x, data y
GROUP BY x.val
HAVING SUM(SIGN(1-SIGN(y.val-x.val)))/COUNT(*) > .5
LIMIT 1
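The HAVING condition can be read as "more than half of all values are at or below x.val". The Python sketch below restates that logic under one assumption the query itself does not spell out: the groups are visited in ascending order of val (the query has no ORDER BY). The function names are illustrative only.

```python
def sign(n):
    """Same result as SQL SIGN(): -1, 0 or 1."""
    return (n > 0) - (n < 0)


def median_by_self_join(values):
    """Smallest x for which more than half of all values y satisfy y <= x."""
    for x in sorted(set(values)):  # one candidate per GROUP BY x.val group
        # SIGN(1 - SIGN(y - x)) is 1 when y <= x and 0 when y > x.
        at_or_below = sum(sign(1 - sign(y - x)) for y in values)
        if at_or_below / len(values) > 0.5:
            return x  # LIMIT 1: stop at the first qualifying group
    return None


print(median_by_self_join([4, 7, 2, 2, 9, 8, 3]))  # 4
```

For an even number of rows this logic returns the upper of the two middle values, not their average.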
bob*_*bob 22
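Unfortunately, neither TheJacobTaylor's nor velcro's answer returns accurate results for current versions of MySQL.
Velcro's answer above is close, but it does not calculate correctly for result sets with an even number of rows. The median is defined as either 1) the middle number for odd-sized sets, or 2) the average of the two middle numbers for even-sized sets.
So, here's velcro's solution patched to handle both odd and even sets: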
SELECT AVG(middle_values) AS 'median' FROM (
SELECT t1.median_column AS 'middle_values' FROM
(
SELECT @row:=@row+1 as `row`, x.median_column
FROM median_table AS x, (SELECT @row:=0) AS r
WHERE 1
-- put some where clause here
ORDER BY x.median_column
) AS t1,
(
SELECT COUNT(*) as 'count'
FROM median_table x
WHERE 1
-- put same where clause here
) AS t2
-- the following condition will return 1 record for odd number sets, or 2 records for even number sets.
WHERE t1.row >= t2.count/2 and t1.row <= ((t2.count/2) +1)) AS t3;
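To use this, follow these 3 easy steps:
1) Replace "median_table" (2 occurrences) in the code above with the name of your table
2) Replace "median_column" (3 occurrences) with the name of the column you want the median of
3) If you have a WHERE condition, replace "WHERE 1" (2 occurrences) with your where condition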
Rob*_*ley 11
If MySQL has ROW_NUMBER (window functions are available from MySQL 8.0), then the MEDIAN is (inspired by this SQL Server query):
WITH Numbered AS
(
SELECT *, COUNT(*) OVER () AS Cnt,
ROW_NUMBER() OVER (ORDER BY val) AS RowNum
FROM yourtable
)
SELECT id, val
FROM Numbered
WHERE RowNum IN (FLOOR((Cnt+1)/2), FLOOR((Cnt+2)/2))
;
The IN is used in case you have an even number of entries.
If you want to find the median per group, then just add PARTITION BY group in your OVER clauses.
Rob
Anonymous 11
Install and use these MySQL statistical functions: http://www.xarg.org/2012/07/statistical-functions-in-mysql/
After that, calculating the median is easy:
SELECT median(val) FROM data;
Anonymous 9
I propose a faster way.
Get the row count:
SELECT CEIL(COUNT(*)/2) FROM data;
Then take the middle value in a sorted subquery:
SELECT max(val) FROM (SELECT val FROM data ORDER BY val limit @middlevalue) x;
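I tested this with a 5x10e6 dataset of random numbers, and it finds the median in under 10 seconds.
A comment on this page in the MySQL documentation has the following suggestion: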
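The same two steps can be sketched in Python to see which element they select. This is an illustration of the logic only, not a benchmark; `median_by_top_half` is a made-up name.

```python
import math


def median_by_top_half(values):
    """Largest value among the first CEIL(n / 2) rows of the sorted data."""
    if not values:
        return None
    middle = math.ceil(len(values) / 2)   # SELECT CEIL(COUNT(*)/2)
    first_half = sorted(values)[:middle]  # ORDER BY val LIMIT middle
    return max(first_half)                # SELECT max(val) FROM (...)


print(median_by_top_half([4, 7, 2, 2, 9, 8, 3]))  # 4
```

For an even row count this picks the lower of the two middle values rather than their average.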
-- (mostly) High Performance scaling MEDIAN function per group
-- Median defined in http://en.wikipedia.org/wiki/Median
--
-- by Peter Hlavac
-- 06.11.2008
--
-- Example Table:
DROP table if exists table_median;
CREATE TABLE table_median (id INTEGER(11),val INTEGER(11));
COMMIT;
INSERT INTO table_median (id, val) VALUES
(1, 7), (1, 4), (1, 5), (1, 1), (1, 8), (1, 3), (1, 6),
(2, 4),
(3, 5), (3, 2),
(4, 5), (4, 12), (4, 1), (4, 7);
-- Calculating the MEDIAN
SELECT @a := 0;
SELECT
id,
AVG(val) AS MEDIAN
FROM (
SELECT
id,
val
FROM (
SELECT
-- Create an index n for every id
@a := (@a + 1) mod o.c AS shifted_n,
IF(@a mod o.c=0, o.c, @a) AS n,
o.id,
o.val,
-- the number of elements for every id
o.c
FROM (
SELECT
t_o.id,
val,
c
FROM
table_median t_o INNER JOIN
(SELECT
id,
COUNT(1) AS c
FROM
table_median
GROUP BY
id
) t2
ON (t2.id = t_o.id)
ORDER BY
t_o.id,val
) o
) a
WHERE
IF(
-- if there is an even number of elements
-- take the lower and the upper median
-- and use AVG(lower,upper)
c MOD 2 = 0,
n = c DIV 2 OR n = (c DIV 2)+1,
-- if its an odd number of elements
-- take the first if its only one element
-- or take the one in the middle
IF(
c = 1,
n = 1,
n = c DIV 2 + 1
)
)
) a
GROUP BY
id;
-- Explanation:
-- The Statement creates a helper table like
--
-- n id val count
-- ----------------
-- 1, 1, 1, 7
-- 2, 1, 3, 7
-- 3, 1, 4, 7
-- 4, 1, 5, 7
-- 5, 1, 6, 7
-- 6, 1, 7, 7
-- 7, 1, 8, 7
--
-- 1, 2, 4, 1
-- 1, 3, 2, 2
-- 2, 3, 5, 2
--
-- 1, 4, 1, 4
-- 2, 4, 5, 4
-- 3, 4, 7, 4
-- 4, 4, 12, 4
-- from there we can select the n-th element on the position: count div 2 + 1
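When adapting a per-group query like the one above, it helps to have reference values to compare against. The following Python sketch computes the per-id medians of the same example rows with the standard library; it is an independent cross-check under the usual median definition, not a translation of the SQL, and `median_per_group` is an illustrative name.

```python
from collections import defaultdict
from statistics import median

# Same rows as the INSERT INTO table_median statement.
rows = [
    (1, 7), (1, 4), (1, 5), (1, 1), (1, 8), (1, 3), (1, 6),
    (2, 4),
    (3, 5), (3, 2),
    (4, 5), (4, 12), (4, 1), (4, 7),
]


def median_per_group(pairs):
    """Collect the values of each id, then take the median of each list."""
    groups = defaultdict(list)
    for group_id, val in pairs:
        groups[group_id].append(val)
    return {group_id: median(vals) for group_id, vals in sorted(groups.items())}


print(median_per_group(rows))  # {1: 5, 2: 4, 3: 3.5, 4: 6.0}
```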
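Most of the solutions above work only for one field of the table; you might need to get the median (50th percentile) for many fields in one query.
I use this: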
SELECT CAST(SUBSTRING_INDEX(SUBSTRING_INDEX(
GROUP_CONCAT(field_name ORDER BY field_name SEPARATOR ','),
',', 50/100 * COUNT(*) + 1), ',', -1) AS DECIMAL) AS `Median`
FROM table_name;
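You can replace the "50" in the example above with any percentile; it is very efficient.
Just make sure you have enough memory for the GROUP_CONCAT; you can change the limit with: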
SET group_concat_max_len = 10485760; #10MB max length
More details: http://web.performancerasta.com/metrics-tips-calculating-95th-99th-or-any-percentile-with-single-mysql-query/
Anonymous 6
I found the code below on HackerRank; it is pretty simple and worked in the cases I tried.
SELECT M.MEDIAN_COL FROM MEDIAN_TABLE M WHERE
(SELECT COUNT(MEDIAN_COL) FROM MEDIAN_TABLE WHERE MEDIAN_COL < M.MEDIAN_COL ) =
(SELECT COUNT(MEDIAN_COL) FROM MEDIAN_TABLE WHERE MEDIAN_COL > M.MEDIAN_COL );
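The WHERE condition keeps a value only when as many values are strictly smaller as are strictly larger. A short Python sketch of that condition (not run against MySQL; `balanced_values` is an illustrative name) suggests what to expect for different inputs.

```python
def balanced_values(values):
    """Values with as many strictly smaller entries as strictly larger ones."""
    result = []
    for m in values:
        smaller = sum(1 for v in values if v < m)
        larger = sum(1 for v in values if v > m)
        if smaller == larger:
            result.append(m)
    return result


print(balanced_values([4, 7, 2, 2, 9, 8, 3]))  # [4]
print(balanced_values([1, 2, 3, 4]))           # []
```

The second call shows that with an even number of distinct values no element satisfies the condition, so it is worth testing the query against even-sized inputs before relying on it.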
Building on velcro's answer, for those of you who have to compute a median of something that is grouped by another parameter:
SELECT grp_field, t1.val FROM (
SELECT grp_field, @rownum:=IF(@s = grp_field, @rownum + 1, 1) AS `row_number`,
@s:=IF(@s = grp_field, @s, grp_field) AS sec, d.val
FROM data d, (SELECT @rownum:=0, @s:=0) r
ORDER BY grp_field, d.val
) as t1 JOIN (
SELECT grp_field, count(*) as total_rows
FROM data d
GROUP BY grp_field
) as t2
ON t1.grp_field = t2.grp_field
WHERE t1.row_number=floor(total_rows/2)+1;