Mic*_*Sim 65
Tags: sql, complexity-theory, logic, reduction
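I'm looking for some "inference rules" (similar to set operation rules or logic rules) that I can use to reduce the complexity or size of an SQL query. Does something like that exist? Any papers, any tools? Any equivalences you have found on your own? It's somewhat similar to query optimization, but not in terms of performance.
To put it differently: given a (complex) query with JOINs, SUBSELECTs and UNIONs, is it possible (or not) to reduce it, by applying certain transformation rules, to a simpler, equivalent SQL statement that produces the same result?
So I'm looking for equivalent transformations of SQL statements, such as the fact that most SUBSELECTs can be rewritten as a JOIN.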
Qua*_*noi 61
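"To put it differently: given a (complex) query with JOINs, SUBSELECTs and UNIONs, is it possible (or not) to reduce it, by applying certain transformation rules, to a simpler, equivalent SQL statement that produces the same result?"
This is exactly what optimizers do for a living (not that I'm saying they always do it well).
Since SQL is a set-based language, there is usually more than one way to transform one query into another.
Like this query: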
SELECT *
FROM mytable
WHERE col1 > @value1 OR col2 < @value2
can be transformed into this (provided mytable has a unique key, since UNION removes duplicate rows):
SELECT *
FROM mytable
WHERE col1 > @value1
UNION
SELECT *
FROM mytable
WHERE col2 < @value2
or this:
SELECT mo.*
FROM (
SELECT id
FROM mytable
WHERE col1 > @value1
UNION
SELECT id
FROM mytable
WHERE col2 < @value2
) mi
JOIN mytable mo
ON mo.id = mi.id
This looks uglier, but it can yield a better execution plan.
One of the most common rewrites is replacing this query:
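The three forms can be checked against each other on a toy table. This is a minimal sketch using Python's built-in sqlite3 module; the rows are invented for illustration, `id` is assumed to be the primary key (so every row is distinct and UNION's duplicate removal loses nothing), and `:value1`/`:value2` stand in for `@value1`/`@value2`. It only shows that the results match, not which plan is faster.

```python
import sqlite3

# Hypothetical toy data: id is a primary key, so no two rows are identical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, col1 INTEGER, col2 INTEGER)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [(1, 5, 50), (2, 15, 5), (3, 20, 1), (4, 1, 40), (5, 12, 30)],
)
params = {"value1": 10, "value2": 10}

q_or = "SELECT * FROM mytable WHERE col1 > :value1 OR col2 < :value2"

q_union = """
SELECT * FROM mytable WHERE col1 > :value1
UNION
SELECT * FROM mytable WHERE col2 < :value2
"""

q_ids = """
SELECT mo.*
FROM (
    SELECT id FROM mytable WHERE col1 > :value1
    UNION
    SELECT id FROM mytable WHERE col2 < :value2
) mi
JOIN mytable mo ON mo.id = mi.id
"""

# Without ORDER BY the row order is unspecified, so compare sorted results.
results = [sorted(conn.execute(q, params).fetchall()) for q in (q_or, q_union, q_ids)]
print(results[0])
```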
SELECT *
FROM mytable
WHERE col IN
(
SELECT othercol
FROM othertable
)
with this one:
SELECT *
FROM mytable mo
WHERE EXISTS
(
SELECT NULL
FROM othertable o
WHERE o.othercol = mo.col
)
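A quick sqlite3 sketch (hypothetical tables and rows) showing that the IN and EXISTS forms return the same rows. A duplicated value in `othertable` does not duplicate output rows in either form, and a NULL `col` matches nothing in either form.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mytable (id INTEGER PRIMARY KEY, col INTEGER);
CREATE TABLE othertable (othercol INTEGER);
INSERT INTO mytable VALUES (1, 10), (2, 20), (3, 30), (4, NULL);
-- 20 appears twice: both forms still return each mytable row at most once.
INSERT INTO othertable VALUES (20), (20), (30), (NULL);
""")

q_in = "SELECT * FROM mytable WHERE col IN (SELECT othercol FROM othertable)"
q_exists = """
SELECT * FROM mytable mo
WHERE EXISTS (SELECT NULL FROM othertable o WHERE o.othercol = mo.col)
"""

rows_in = sorted(conn.execute(q_in).fetchall())
rows_exists = sorted(conn.execute(q_exists).fetchall())
print(rows_in)
```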
In some RDBMSs (like PostgreSQL), DISTINCT and GROUP BY use different execution plans, so sometimes it's better to replace one with the other:
SELECT mo.grouper,
(
SELECT SUM(col)
FROM mytable mi
WHERE mi.grouper = mo.grouper
)
FROM (
SELECT DISTINCT grouper
FROM mytable
) mo
vs.
SELECT grouper, SUM(col)
FROM mytable
GROUP BY
grouper
In PostgreSQL, DISTINCT sorts and GROUP BY hashes (at least in the versions current when this was written).
MySQL lacks FULL OUTER JOIN, so this can be rewritten as follows:
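The two grouping queries can be compared in sqlite3 (hypothetical data). One caveat worth knowing: they only agree when `grouper` is never NULL, because the correlated subquery's `mi.grouper = mo.grouper` is never true for a NULL group, whereas GROUP BY puts all NULLs into one group and sums it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (grouper TEXT, col INTEGER)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [("a", 1), ("a", 2), ("b", 10), ("c", 5), ("c", 5)],  # no NULL groupers
)

# DISTINCT list of groups plus a correlated subquery per group.
q_distinct = """
SELECT mo.grouper,
       (SELECT SUM(col) FROM mytable mi WHERE mi.grouper = mo.grouper)
FROM (SELECT DISTINCT grouper FROM mytable) mo
"""
# Plain GROUP BY aggregation.
q_group_by = "SELECT grouper, SUM(col) FROM mytable GROUP BY grouper"

rows_distinct = sorted(conn.execute(q_distinct).fetchall())
rows_group_by = sorted(conn.execute(q_group_by).fetchall())
print(rows_group_by)
```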
SELECT t1.col1, t2.col2
FROM table1 t1
FULL OUTER JOIN
table2 t2
ON t1.id = t2.id
vs.
SELECT t1.col1, t2.col2
FROM table1 t1
LEFT JOIN
table2 t2
ON t1.id = t2.id
UNION ALL
SELECT NULL, t2.col2
FROM table1 t1
RIGHT JOIN
table2 t2
ON t1.id = t2.id
WHERE t1.id IS NULL
But see the article on my blog about how to do this more efficiently in MySQL:
This hierarchical query in Oracle:
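The UNION ALL emulation of FULL OUTER JOIN can be sanity-checked in sqlite3. This sketch makes one swap: the RIGHT JOIN branch is written as a LEFT JOIN with the tables reversed, because older SQLite builds have no RIGHT JOIN. The result is compared against a full outer join computed in plain Python; tables and rows are hypothetical, and `id` is assumed to be a non-NULL key.

```python
import sqlite3
from collections import Counter

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (id INTEGER PRIMARY KEY, col1 TEXT);
CREATE TABLE table2 (id INTEGER PRIMARY KEY, col2 TEXT);
INSERT INTO table1 VALUES (1, 'x1'), (2, 'x2');
INSERT INTO table2 VALUES (2, 'y2'), (3, 'y3');
""")

# Left join keeps all of table1; the second branch adds table2 rows with no
# partner in table1 (RIGHT JOIN rewritten as a LEFT JOIN with tables swapped).
emulated = """
SELECT t1.col1, t2.col2
FROM table1 t1 LEFT JOIN table2 t2 ON t1.id = t2.id
UNION ALL
SELECT NULL, t2.col2
FROM table2 t2 LEFT JOIN table1 t1 ON t1.id = t2.id
WHERE t1.id IS NULL
"""
rows = conn.execute(emulated).fetchall()

# Reference FULL OUTER JOIN computed in plain Python.
t1 = dict(conn.execute("SELECT id, col1 FROM table1"))
t2 = dict(conn.execute("SELECT id, col2 FROM table2"))
expected = [(t1.get(k), t2.get(k)) for k in sorted(t1.keys() | t2.keys())]

# Compare as multisets: rows contain None, so sorting them could fail.
matches = Counter(rows) == Counter(expected)
print(matches)
```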
SELECT DISTINCT(animal_id) AS animal_id
FROM animal
START WITH
animal_id = :id
CONNECT BY
PRIOR animal_id IN (father, mother)
ORDER BY
animal_id
can be transformed into this:
SELECT DISTINCT(animal_id) AS animal_id
FROM (
SELECT 0 AS gender, animal_id, father AS parent
FROM animal
UNION ALL
SELECT 1, animal_id, mother
FROM animal
)
START WITH
animal_id = :id
CONNECT BY
parent = PRIOR animal_id
ORDER BY
animal_id
The latter is more efficient.
See the article on my blog for the execution plan details:
To find all ranges that overlap a given range, you can use the following query:
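SQLite has no CONNECT BY, so this sketch uses recursive CTEs as a stand-in for the two Oracle forms: one descends through an `IN (father, mother)` condition, the other first unpivots father/mother into `(animal_id, parent)` rows with UNION ALL and then descends on a plain equality. The pedigree data is hypothetical; the sketch shows only that both return the same animals, not the performance difference.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE animal (animal_id INTEGER PRIMARY KEY, father INTEGER, mother INTEGER);
-- 3 is the child of 1 and 2; 4 has father 3; 5 has mother 3; 6 is the child of 4 and 5.
INSERT INTO animal VALUES (1, NULL, NULL), (2, NULL, NULL),
                          (3, 1, 2), (4, 3, NULL), (5, NULL, 3), (6, 4, 5);
""")

# Form 1: mirrors CONNECT BY PRIOR animal_id IN (father, mother).
q_in = """
WITH RECURSIVE tree(animal_id) AS (
    SELECT animal_id FROM animal WHERE animal_id = :id
    UNION ALL
    SELECT a.animal_id FROM animal a JOIN tree t ON t.animal_id IN (a.father, a.mother)
)
SELECT DISTINCT animal_id FROM tree ORDER BY animal_id
"""

# Form 2: unpivot parents into separate rows, then mirror parent = PRIOR animal_id.
q_eq = """
WITH RECURSIVE edges(animal_id, parent) AS (
    SELECT animal_id, father FROM animal
    UNION ALL
    SELECT animal_id, mother FROM animal
),
tree(animal_id) AS (
    SELECT animal_id FROM animal WHERE animal_id = :id
    UNION ALL
    SELECT e.animal_id FROM edges e JOIN tree t ON e.parent = t.animal_id
)
SELECT DISTINCT animal_id FROM tree ORDER BY animal_id
"""

res_in = [r[0] for r in conn.execute(q_in, {"id": 3})]
res_eq = [r[0] for r in conn.execute(q_eq, {"id": 3})]
print(res_in, res_eq)
```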
SELECT *
FROM ranges
WHERE end_date >= @start
AND start_date <= @end
In SQL Server, however, this more complex query returns the same results faster (assuming start_date <= end_date for every range and @start <= @end):
SELECT *
FROM ranges
WHERE (start_date > @start AND start_date <= @end)
OR (@start BETWEEN start_date AND end_date)
And, believe it or not, there is an article on my blog about this too:
SQL Server also lacks an efficient way to compute cumulative aggregates, so this query:
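The two overlap predicates can be compared exhaustively on small integer "dates". This is a sketch in plain Python, assuming well-formed ranges (`start_date <= end_date`) and a well-formed search window (`@start <= @end`); under those assumptions no input distinguishes them.

```python
from itertools import product


def overlaps_simple(start_date, end_date, start, end):
    # WHERE end_date >= @start AND start_date <= @end
    return end_date >= start and start_date <= end


def overlaps_split(start_date, end_date, start, end):
    # WHERE (start_date > @start AND start_date <= @end)
    #    OR (@start BETWEEN start_date AND end_date)
    return (start < start_date <= end) or (start_date <= start <= end_date)


# Try every combination of four values in 0..5, keeping only well-formed inputs.
mismatches = [
    (s, e, qs, qe)
    for s, e, qs, qe in product(range(6), repeat=4)
    if s <= e and qs <= qe and overlaps_simple(s, e, qs, qe) != overlaps_split(s, e, qs, qe)
]
print(len(mismatches))
```

The split form works by cases: either the range starts after @start (then it overlaps iff it starts no later than @end), or it starts at or before @start (then it overlaps iff it is still open at @start).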
SELECT mi.id, SUM(mo.value) AS running_sum
FROM mytable mi
JOIN mytable mo
ON mo.id <= mi.id
GROUP BY
mi.id
can be rewritten more efficiently using, Lord help me, cursors (you heard me right: "cursors", "more efficiently" and "SQL Server" in one sentence).
See the article on my blog about how to do this:
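Why a cursor can win: the triangular self-join joins every row with all rows before it (O(n²) pairs) before grouping, while a cursor reads the rows once in order and carries the total forward. The sketch below is a Python analogue using a sqlite3 cursor, not T-SQL cursor code; the table contents are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, value INTEGER)")
conn.executemany("INSERT INTO mytable VALUES (?, ?)", [(1, 5), (2, 3), (3, 10), (4, 2)])

# Triangular self-join from the answer (ORDER BY added for a stable comparison).
self_join = conn.execute("""
SELECT mi.id, SUM(mo.value) AS running_sum
FROM mytable mi
JOIN mytable mo ON mo.id <= mi.id
GROUP BY mi.id
ORDER BY mi.id
""").fetchall()

# Cursor-style single pass: read rows in id order and accumulate.
cursor_pass = []
total = 0
for row_id, value in conn.execute("SELECT id, value FROM mytable ORDER BY id"):
    total += value
    cursor_pass.append((row_id, total))

print(cursor_pass)
```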
A certain kind of query is common in financial applications: it looks up the effective rate of a currency, like this one in Oracle:
SELECT TO_CHAR(SUM(xac_amount * rte_rate), 'FM999G999G999G999G999G999D999999')
FROM t_transaction x
JOIN t_rate r
ON (rte_currency, rte_date) IN
(
SELECT xac_currency, MAX(rte_date)
FROM t_rate
WHERE rte_currency = xac_currency
AND rte_date <= xac_date
)
This query can be heavily rewritten to use an equality condition, which allows a HASH JOIN instead of NESTED LOOPS:
WITH v_rate AS
(
SELECT cur_id AS eff_currency, dte_date AS eff_date, rte_rate AS eff_rate
FROM (
SELECT cur_id, dte_date,
(
SELECT MAX(rte_date)
FROM t_rate ri
WHERE rte_currency = cur_id
AND rte_date <= dte_date
) AS rte_effdate
FROM (
SELECT (
SELECT MAX(rte_date)
FROM t_rate
) - level + 1 AS dte_date
FROM dual
CONNECT BY
level <=
(
SELECT MAX(rte_date) - MIN(rte_date)
FROM t_rate
)
) v_date,
(
SELECT 1 AS cur_id
FROM dual
UNION ALL
SELECT 2 AS cur_id
FROM dual
) v_currency
) v_eff
LEFT JOIN
t_rate
ON rte_currency = cur_id
AND rte_date = rte_effdate
)
SELECT TO_CHAR(SUM(xac_amount * eff_rate), 'FM999G999G999G999G999G999D999999')
FROM (
SELECT xac_currency, TRUNC(xac_date) AS xac_date, SUM(xac_amount) AS xac_amount, COUNT(*) AS cnt
FROM t_transaction x
GROUP BY
xac_currency, TRUNC(xac_date)
)
JOIN v_rate
ON eff_currency = xac_currency
AND eff_date = xac_date
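Despite being bulky as hell, the latter query is 6 times faster.
The main idea here is replacing <= with =, which requires building an in-memory calendar table to JOIN with.
Here are a few from working with Oracle 8 and 9 (of course, sometimes doing the opposite can make a query simpler or faster):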
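The calendar idea can be sketched outside the database in plain Python: once there is one effective rate per (currency, day), every transaction finds its rate with an equality lookup (a dict probe, the analogue of a hash join) instead of a "latest rate dated <= transaction date" search. The data, the calendar bounds (first rate date through last transaction date) and the use of `Decimal` are assumptions for illustration, not details of the Oracle query.

```python
from datetime import date, timedelta
from decimal import Decimal

# Hypothetical data: rates are (currency, rate_date, rate); transactions are (currency, date, amount).
rates = [
    (1, date(2009, 1, 1), Decimal("1.0")), (1, date(2009, 1, 4), Decimal("1.1")),
    (2, date(2009, 1, 2), Decimal("0.5")), (2, date(2009, 1, 5), Decimal("0.6")),
]
transactions = [
    (1, date(2009, 1, 3), Decimal("100")), (1, date(2009, 1, 4), Decimal("50")),
    (2, date(2009, 1, 2), Decimal("10")), (2, date(2009, 1, 6), Decimal("20")),
    (1, date(2009, 1, 6), Decimal("5")),
]


def total_non_equi():
    # Like the original query: per transaction, find the latest rate dated <= its date.
    total = Decimal(0)
    for cur, xac_date, amount in transactions:
        eff = max((r for r in rates if r[0] == cur and r[1] <= xac_date), key=lambda r: r[1])
        total += amount * eff[2]
    return total


def total_calendar():
    # In-memory calendar: one effective rate per (currency, day), carried forward.
    first = min(d for _, d, _ in rates)
    last = max(d for _, d, _ in transactions)
    known = {(cur, d): rate for cur, d, rate in rates}
    calendar = {}
    for cur in {c for c, _, _ in rates}:
        current, day = None, first
        while day <= last:
            current = known.get((cur, day), current)
            if current is not None:
                calendar[(cur, day)] = current
            day += timedelta(days=1)
    # Pre-aggregate per (currency, day), like the GROUP BY in the rewrite.
    sums = {}
    for cur, xac_date, amount in transactions:
        sums[(cur, xac_date)] = sums.get((cur, xac_date), Decimal(0)) + amount
    # Every lookup is now an equality probe.
    return sum((amount * calendar[key] for key, amount in sums.items()), Decimal(0))


non_equi_total = total_non_equi()
calendar_total = total_calendar()
print(non_equi_total, calendar_total)
```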
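Parentheses can be removed if they are not used to override operator precedence. A simple example is when all the boolean operators in your where clause are the same: where ((a or b) or c) is equivalent to where a or b or c.
A subquery can often (if not always) be merged into the main query to simplify it. In my experience, this often improves performance considerably: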
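A truth-table check of the parenthesis rule, with Python booleans standing in for SQL conditions. This ignores NULL (SQL's three-valued logic), although OR is associative there as well. The second check shows the opposite case: AND binds tighter than OR, so there the parentheses do override precedence and cannot be dropped.

```python
from itertools import product

assignments = list(product([False, True], repeat=3))

# Same operator throughout: the grouping makes no difference.
same_op = all(((a or b) or c) == (a or b or c) for a, b, c in assignments)

# Mixed operators: removing the parentheses changes the meaning.
mixed_op = all((a and (b or c)) == (a and b or c) for a, b, c in assignments)

print(same_op, mixed_op)
```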
select foo.a,
bar.a
from foomatic foo,
bartastic bar
where foo.id = bar.id and
bar.id = (
select ban.id
from bantabulous ban
where ban.bandana = 42
)
;
is equivalent to (provided the subquery returns at most one row):
select foo.a,
bar.a
from foomatic foo,
bartastic bar,
bantabulous ban
where foo.id = bar.id and
bar.id = ban.id and
ban.bandana = 42
;
Using ANSI joins separates a lot of the "code monkey" logic from the really interesting parts of the where clause. The previous query is equivalent to
select foo.a,
bar.a
from foomatic foo
join bartastic bar on bar.id = foo.id
join bantabulous ban on ban.id = bar.id
where ban.bandana = 42
;
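The three versions (scalar subquery, merged comma join, ANSI join) can be run side by side in sqlite3. The data is hypothetical and deliberately has exactly one `bandana = 42` row. With several such rows the forms diverge: Oracle raises an error for a multi-row scalar subquery, SQLite silently uses its first row, and the merged joins return one result row per match.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE foomatic (id INTEGER, a TEXT);
CREATE TABLE bartastic (id INTEGER, a TEXT);
CREATE TABLE bantabulous (id INTEGER, bandana INTEGER);
INSERT INTO foomatic VALUES (1, 'f1'), (2, 'f2');
INSERT INTO bartastic VALUES (1, 'b1'), (2, 'b2');
-- Exactly one row has bandana = 42.
INSERT INTO bantabulous VALUES (2, 42), (1, 7);
""")

subquery = """
select foo.a, bar.a from foomatic foo, bartastic bar
where foo.id = bar.id
  and bar.id = (select ban.id from bantabulous ban where ban.bandana = 42)
"""
merged = """
select foo.a, bar.a from foomatic foo, bartastic bar, bantabulous ban
where foo.id = bar.id and bar.id = ban.id and ban.bandana = 42
"""
ansi = """
select foo.a, bar.a from foomatic foo
join bartastic bar on bar.id = foo.id
join bantabulous ban on ban.id = bar.id
where ban.bandana = 42
"""

results = [conn.execute(q).fetchall() for q in (subquery, merged, ansi)]
print(results)
```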
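If you want to check whether a row exists, don't use count(*). Use rownum = 1 instead, or put the query in a where exists clause, so that only one row is fetched instead of all of them.
As @Quassnoi said, the optimizer often does a good job. One way to help it is to make sure that indexes and statistics are up to date and that suitable indexes exist for the query workload.
I like to replace all kinds of subselects with join queries.
This one is obvious: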
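The three existence checks give the same answer; the difference is how much work they do. A sqlite3 sketch with a hypothetical `orders` table: COUNT(*) has to visit every matching row, whereas EXISTS and a one-row limit can stop at the first match. SQLite has no rownum, so `LIMIT 1` plays the role of Oracle's `rownum = 1` here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER)")
conn.executemany("INSERT INTO orders (customer_id) VALUES (?)", [(7,)] * 1000 + [(8,)])


def has_orders_count(customer_id):
    # Counts all 1000 matching rows just to compare the total with zero.
    (n,) = conn.execute(
        "SELECT COUNT(*) FROM orders WHERE customer_id = ?", (customer_id,)
    ).fetchone()
    return n > 0


def has_orders_exists(customer_id):
    # EXISTS can stop at the first matching row; it yields 0 or 1.
    (flag,) = conn.execute(
        "SELECT EXISTS (SELECT 1 FROM orders WHERE customer_id = ?)", (customer_id,)
    ).fetchone()
    return flag == 1


def has_orders_limit(customer_id):
    # Fetch at most one row (stand-in for rownum = 1).
    row = conn.execute(
        "SELECT 1 FROM orders WHERE customer_id = ? LIMIT 1", (customer_id,)
    ).fetchone()
    return row is not None


print([has_orders_exists(c) for c in (7, 8, 9)])
```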
SELECT *
FROM mytable mo
WHERE EXISTS
(
SELECT *
FROM othertable o
WHERE o.othercol = mo.col
)
by (equivalent only if o.othercol is unique in othertable; otherwise the join returns a mytable row once per match):
SELECT mo.*
FROM mytable mo inner join othertable o on o.othercol = mo.col
And this one is underrated:
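A small sqlite3 demonstration (hypothetical data) of when the EXISTS-to-INNER-JOIN rewrite changes the result: if `othercol` is not unique, the join repeats the matching `mytable` row. Adding DISTINCT restores the EXISTS result here because `mo.*` includes the key `id`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mytable (id INTEGER PRIMARY KEY, col INTEGER);
CREATE TABLE othertable (othercol INTEGER);
INSERT INTO mytable VALUES (1, 10), (2, 20);
-- othercol = 20 appears twice, so it is NOT unique.
INSERT INTO othertable VALUES (20), (20);
""")

exists_rows = conn.execute("""
SELECT * FROM mytable mo
WHERE EXISTS (SELECT * FROM othertable o WHERE o.othercol = mo.col)
""").fetchall()

join_rows = conn.execute("""
SELECT mo.* FROM mytable mo INNER JOIN othertable o ON o.othercol = mo.col
""").fetchall()

distinct_join_rows = conn.execute("""
SELECT DISTINCT mo.* FROM mytable mo INNER JOIN othertable o ON o.othercol = mo.col
""").fetchall()

print(exists_rows, join_rows, distinct_join_rows)
```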
SELECT *
FROM mytable mo
WHERE NOT EXISTS
(
SELECT *
FROM othertable o
WHERE o.othercol = mo.col
)
by
SELECT mo.*
FROM mytable mo left outer join othertable o on o.othercol = mo.col
WHERE o.othercol is null
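This can help the DBMS choose a good execution plan for one big query.
I like everyone on a team to follow a set of standards that keeps code readable, maintainable, understandable, washable, etc. :)
There's more on this here: What are your most useful database standards?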
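Unlike the semi-join case, the NOT EXISTS to LEFT JOIN ... IS NULL rewrite holds even when `othercol` has duplicates or NULLs. Matched rows are filtered out however many times they match, and a NULL `o.othercol` in the output can only come from "no match", because the join condition never matches a NULL. A sqlite3 sketch with hypothetical data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mytable (id INTEGER PRIMARY KEY, col INTEGER);
CREATE TABLE othertable (othercol INTEGER);
INSERT INTO mytable VALUES (1, 10), (2, 20), (3, NULL);
-- A duplicate and a NULL in othertable, to exercise the edge cases.
INSERT INTO othertable VALUES (20), (20), (NULL);
""")

not_exists = sorted(conn.execute("""
SELECT * FROM mytable mo
WHERE NOT EXISTS (SELECT * FROM othertable o WHERE o.othercol = mo.col)
""").fetchall())

left_join = sorted(conn.execute("""
SELECT mo.* FROM mytable mo LEFT OUTER JOIN othertable o ON o.othercol = mo.col
WHERE o.othercol IS NULL
""").fetchall())

print(not_exists, left_join)
```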