Tre*_*vor 5 sql oracle performance oltp greatest-n-per-group
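I'm trying to build infrastructure for quickly running regressions on demand, pulling Apache requests from a database that contains all historical activity on our web servers. To improve coverage by making sure we still regress requests from our smaller clients, I'd like to ensure a distribution of requests by retrieving at most n (for the sake of this question, say 10) requests for each client.
I've found a number of similar questions answered here, the closest of which seems to be "SQL query to return top N rows per ID across a range of IDs", but the answers were for the most part performance-agnostic solutions that I have already tried. For example, the row_number() analytic function gets exactly the data we're looking for: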
SELECT
    *
FROM
    (
        SELECT
            dailylogdata.*,
            row_number() over (partition by dailylogdata.contextid order by occurrencedate) rn
        FROM
            dailylogdata
        WHERE
            shorturl in (?)
    )
WHERE
    rn <= 10;
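However, given that this table contains millions of entries for a given day, and that this approach has to read every row matching our selection criteria from the index before it can apply the row_number analytic function, performance is terrible. We end up selecting nearly a million rows, only to throw the vast majority of them away because their row_number exceeds 10. Statistics from executing the above query: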
|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
|| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | Reads | Writes | OMem | 1Mem | Used-Mem | Used-Tmp||
|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
|| 0 | SELECT STATEMENT | | 1 | | 12222 |00:09:08.94 | 895K| 584K| 301 | | | | ||
||* 1 | VIEW | | 1 | 4427K| 12222 |00:09:08.94 | 895K| 584K| 301 | | | | ||
||* 2 | WINDOW SORT PUSHED RANK | | 1 | 4427K| 13536 |00:09:08.94 | 895K| 584K| 301 | 2709K| 743K| 97M (1)| 4096 ||
|| 3 | PARTITION RANGE SINGLE | | 1 | 4427K| 932K|00:22:27.90 | 895K| 584K| 0 | | | | ||
|| 4 | TABLE ACCESS BY LOCAL INDEX ROWID| DAILYLOGDATA | 1 | 4427K| 932K|00:22:27.61 | 895K| 584K| 0 | | | | ||
||* 5 | INDEX RANGE SCAN | DAILYLOGDATA_URLCONTEXT | 1 | 17345 | 932K|00:00:00.75 | 1448 | 0 | 0 | | | | ||
|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| |
|Predicate Information (identified by operation id): |
|--------------------------------------------------- |
| |
| 1 - filter("RN"<=:SYS_B_2) |
| 2 - filter(ROW_NUMBER() OVER ( PARTITION BY "DAILYLOGDATA"."CONTEXTID" ORDER BY "OCCURRENCEDATE")<=:SYS_B_2) |
| 5 - access("SHORTURL"=:P1) |
| |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
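As a rough check of the "vast majority" claim, the rounded A-Rows figures from the plan above (932K rows fetched at step 4, 12222 rows returned at step 0) can be plugged into a few lines of Python. The variable names are only for illustration, and the 932K figure is rounded by the plan output, so the result is approximate:

```python
rows_fetched = 932_000   # A-Rows at step 4 (TABLE ACCESS), shown rounded as 932K
rows_returned = 12_222   # A-Rows at step 0 (SELECT STATEMENT)

kept_pct = 100 * rows_returned / rows_fetched
discarded = rows_fetched - rows_returned
print(f"kept {kept_pct:.1f}% of fetched rows, discarded about {discarded:,}")
```

In other words, a little over 1% of the rows read from the table make it into the result.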
However, if we instead query only the first 10 results for one specific context, the query runs dramatically faster:
SELECT
    *
FROM
    (
        SELECT
            dailylogdata.*
        FROM
            dailylogdata
        WHERE
            shorturl in (?)
            and contextid = ?
    )
WHERE
    rownum <= 10;
Statistics from running this query:
|-------------------------------------------------------------------------------------------------------------------------|
|| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers ||
|-------------------------------------------------------------------------------------------------------------------------|
|| 0 | SELECT STATEMENT | | 1 | | 10 |00:00:00.01 | 14 ||
||* 1 | COUNT STOPKEY | | 1 | | 10 |00:00:00.01 | 14 ||
|| 2 | PARTITION RANGE SINGLE | | 1 | 10 | 10 |00:00:00.01 | 14 ||
|| 3 | TABLE ACCESS BY LOCAL INDEX ROWID| DAILYLOGDATA | 1 | 10 | 10 |00:00:00.01 | 14 ||
||* 4 | INDEX RANGE SCAN | DAILYLOGDATA_URLCONTEXT | 1 | 1 | 10 |00:00:00.01 | 5 ||
|-------------------------------------------------------------------------------------------------------------------------|
| |
|Predicate Information (identified by operation id): |
|--------------------------------------------------- |
| |
| 1 - filter(ROWNUM<=10) |
| 4 - access("SHORTURL"=:P1 AND "CONTEXTID"=TO_NUMBER(:P2)) |
| |
+-------------------------------------------------------------------------------------------------------------------------+
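In this case, Oracle is smart enough to stop retrieving data once it has 10 results. I could gather the complete set of contexts and programmatically generate a query made up of one instance of this query per context, with the whole mess joined by union all. Given the sheer number of contexts, though, we might run into an internal Oracle limit, and even if we didn't, the approach reeks of kludge.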
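For illustration, the per-context union all approach might be generated roughly as follows. This is only a sketch: `build_union_query` is a hypothetical helper, the bind names (`:url`, `:ctx0`, `:ctx1`, ...) are assumptions, and the statement grows linearly with the number of contexts, which is exactly the concern raised above.

```python
def build_union_query(context_ids, n=10):
    """Build one rownum-limited branch per context, glued with UNION ALL.

    Returns (sql_text, binds). Hypothetical helper; the caller still has to
    set binds["url"] before executing the statement.
    """
    if not context_ids:
        raise ValueError("need at least one context id")
    branches = []
    binds = {"url": None}
    for i, ctx in enumerate(context_ids):
        name = f"ctx{i}"
        binds[name] = ctx
        branches.append(
            "SELECT * FROM (\n"
            "  SELECT d.* FROM dailylogdata d\n"
            f"  WHERE d.shorturl = :url AND d.contextid = :{name}\n"
            f") WHERE rownum <= {int(n)}"
        )
    return "\nUNION ALL\n".join(branches), binds


sql, binds = build_union_query([101, 202, 303])
print(sql)
```

Each branch keeps the shape of the fast single-context query, so each one can stop after n rows, but with thousands of contexts the statement text and the bind list become unwieldy.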
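Does anyone know of an approach that keeps the simplicity of the first query while delivering performance commensurate with the second? Note also that I don't actually care about retrieving a stable set of rows; as long as they satisfy my criteria, they're fine for regression purposes.
Edit: Adam Musch's suggestion did the trick. Since I can't fit them into a comment on his answer, I'm appending the performance results with his changes here. I'm also testing against a larger data set this time; here are the (cached) statistics from my original row_number approach, for comparison: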
|-------------------------------------------------------------------------------------------------------------------------------------------------|
|| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | Reads | OMem | 1Mem | Used-Mem ||
|-------------------------------------------------------------------------------------------------------------------------------------------------|
|| 0 | SELECT STATEMENT | | 1 | | 12624 |00:00:22.34 | 1186K| 931K| | | ||
||* 1 | VIEW | | 1 | 1163K| 12624 |00:00:22.34 | 1186K| 931K| | | ||
||* 2 | WINDOW NOSORT | | 1 | 1163K| 1213K|00:00:21.82 | 1186K| 931K| 3036M| 17M| ||
|| 3 | TABLE ACCESS BY INDEX ROWID| TWTEST | 1 | 1163K| 1213K|00:00:20.41 | 1186K| 931K| | | ||
||* 4 | INDEX RANGE SCAN | TWTEST_URLCONTEXT | 1 | 1163K| 1213K|00:00:00.81 | 8568 | 0 | | | ||
|-------------------------------------------------------------------------------------------------------------------------------------------------|
| |
|Predicate Information (identified by operation id): |
|--------------------------------------------------- |
| |
| 1 - filter("RN"<=10) |
| 2 - filter(ROW_NUMBER() OVER ( PARTITION BY "CONTEXTID" ORDER BY NULL )<=10) |
| 4 - access("SHORTURL"=:P1) |
+-------------------------------------------------------------------------------------------------------------------------------------------------+
I've taken the liberty of boiling Adam's suggestion down a bit; here is the modified query...
select
    *
from
    twtest
where
    rowid in (
        select
            rowid
        from (
            select
                rowid,
                shorturl,
                row_number() over (partition by shorturl, contextid
                                   order by null) rn
            from
                twtest
        )
        where rn <= 10
        and shorturl in (?)
    );
...and the statistics from its (cached) evaluation:
|--------------------------------------------------------------------------------------------------------------------------------------|
|| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem ||
|--------------------------------------------------------------------------------------------------------------------------------------|
|| 0 | SELECT STATEMENT | | 1 | | 12624 |00:00:01.33 | 19391 | | | ||
|| 1 | NESTED LOOPS | | 1 | 1 | 12624 |00:00:01.33 | 19391 | | | ||
|| 2 | VIEW | VW_NSO_1 | 1 | 1163K| 12624 |00:00:01.27 | 6770 | | | ||
|| 3 | HASH UNIQUE | | 1 | 1 | 12624 |00:00:01.27 | 6770 | 1377K| 1377K| 5065K (0)||
||* 4 | VIEW | | 1 | 1163K| 12624 |00:00:01.25 | 6770 | | | ||
||* 5 | WINDOW NOSORT | | 1 | 1163K| 1213K|00:00:01.09 | 6770 | 283M| 5598K| ||
||* 6 | INDEX RANGE SCAN | TWTEST_URLCONTEXT | 1 | 1163K| 1213K|00:00:00.40 | 6770 | | | ||
|| 7 | TABLE ACCESS BY USER ROWID| TWTEST | 12624 | 1 | 12624 |00:00:00.04 | 12621 | | | ||
|--------------------------------------------------------------------------------------------------------------------------------------|
| |
|Predicate Information (identified by operation id): |
|--------------------------------------------------- |
| |
| 4 - filter("RN"<=10) |
| 5 - filter(ROW_NUMBER() OVER ( PARTITION BY "SHORTURL","CONTEXTID" ORDER BY NULL NULL )<=10) |
| 6 - access("SHORTURL"=:P1) |
| |
|Note |
|----- |
| - dynamic sampling used for this statement (level=2) |
| |
+--------------------------------------------------------------------------------------------------------------------------------------+
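As advertised, we only access the dailylogdata table for the fully filtered rows. I'm concerned that it still appears to be doing a full scan of the urlcontext index, judging by the number of rows (1213K) it claims to select, but given that it uses only 6770 buffers (a number that stays constant even when I increase the number of results per context), this may be misleading.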
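One way to read these numbers is as two separate costs: index entries visited versus table rows fetched. The toy model below is plain Python, not Oracle, and both the data and the name `top_n_rowids` are made up for illustration. It shows how numbering rows within each group can still walk every matching index entry while only the surviving rowids need a table lookup.

```python
from itertools import groupby


def top_n_rowids(index_entries, n=10):
    """index_entries: iterable of (contextid, rowid), already ordered by
    contextid, the way a range scan on a (shorturl, contextid) index would
    return them. Returns (kept_rowids, entries_visited)."""
    kept, visited = [], 0
    for _ctx, group in groupby(index_entries, key=lambda e: e[0]):
        for rank, (_c, rowid) in enumerate(group, start=1):
            visited += 1            # every entry in the group is still numbered
            if rank <= n:
                kept.append(rowid)  # only the first n survive the rn filter
    return kept, visited


# Made-up data: three contexts with 1000, 5 and 200 matching index entries.
entries = [(ctx, f"{ctx}-{i}")
           for ctx, count in ((1, 1000), (2, 5), (3, 200))
           for i in range(count)]
kept, visited = top_n_rowids(entries)
print(visited, len(kept))  # 1205 25
```

Under this reading, the 1213K A-Rows on the index scan and the 12624 table accesses in the plan are both expected: the saving comes from skipping table lookups, not from shortening the index scan.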
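This is a somewhat kludgy solution, but it seems to do what you want: it cuts the index scan short as early as possible, and it doesn't read table data until a row has qualified under both the filter conditions and the top-n criteria.
Note that it was tested with a shorturl = condition, not a shorturl IN condition.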
with rowid_list as
 (select rowid
    from (select *
            from (select rowid,
                         shorturl,  -- projected so the outer shorturl filter can reference it
                         row_number() over (partition by shorturl, contextid
                                            order by null) rn
                    from dailylogdata
                 )
           where rn <= 10
         )
   where shorturl = ?
 )
select *
  from dailylogdata
 where rowid in (select rowid from rowid_list)
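The with clause grabs the first 10 rowids for each unique combination of shorturl and contextid that meets your criteria, filtering with a WINDOW NOSORT. The outer query then loops over that set of rowids and fetches each row by rowid.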
----------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 286 | 1536 (1)| 00:00:19 |
| 1 | NESTED LOOPS | | 1 | 286 | 1536 (1)| 00:00:19 |
| 2 | VIEW | VW_NSO_1 | 136K| 1596K| 910 (1)| 00:00:11 |
| 3 | HASH UNIQUE | | 1 | 3326K| | |
|* 4 | VIEW | | 136K| 3326K| 910 (1)| 00:00:11 |
|* 5 | WINDOW NOSORT | | 136K| 2794K| 910 (1)| 00:00:11 |
|* 6 | INDEX RANGE SCAN | TABLE_REDACTED_INDEX | 136K| 2794K| 910 (1)| 00:00:11 |
| 7 | TABLE ACCESS BY USER ROWID| TABLE_REDACTED | 1 | 274 | 1 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - filter("RN"<=10)
5 - filter(ROW_NUMBER() OVER ( PARTITION BY "CLIENT_ID","SCE_ID" ORDER BY NULL NULL
)<=10)
6 - access("TABLE_REDACTED"."SHORTURL"=:b1)
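To see the shape of the result without an Oracle instance, the same rowid pattern can be tried against SQLite from Python. This is only a stand-in: SQLite's planner and its rowid are not Oracle's, so it says nothing about performance, and the table contents and the `request` column are invented for the example. It assumes SQLite 3.25 or newer, which is needed for window functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dailylogdata (shorturl TEXT, contextid INTEGER, request TEXT)"
)
# Invented data: "/a" has three contexts with 50, 3 and 12 requests.
rows = [("/a", ctx, f"req-{ctx}-{i}")
        for ctx, count in ((1, 50), (2, 3), (3, 12))
        for i in range(count)]
rows += [("/b", 1, f"other-{i}") for i in range(20)]
conn.executemany("INSERT INTO dailylogdata VALUES (?, ?, ?)", rows)

query = """
WITH rowid_list AS (
    SELECT rid
    FROM (SELECT rowid AS rid, shorturl,
                 row_number() OVER (PARTITION BY shorturl, contextid) AS rn
          FROM dailylogdata)
    WHERE rn <= 10 AND shorturl = ?
)
SELECT contextid, count(*)
FROM dailylogdata
WHERE rowid IN (SELECT rid FROM rowid_list)
GROUP BY contextid
ORDER BY contextid
"""
per_context = conn.execute(query, ("/a",)).fetchall()
print(per_context)  # [(1, 10), (2, 3), (3, 10)]
```

Each context contributes at most 10 rows, and a context with fewer than 10 requests contributes all of them, which matches the distribution the question asks for.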