Kev*_*ock 2 sql oracle oracle10g
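I'll preface this question by saying that I'm using Oracle 10g Enterprise Edition and that I'm relatively new to Oracle.
I have a table with the following schema: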
ID integer (pk) -- unique index
PERSON_ID integer (fk) -- b-tree index
NAME_PART nvarchar -- b-tree index
NAME_PART_ID integer (fk) -- bitmap index
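PERSON_ID is a foreign key to the unique ID of a person record. NAME_PART_ID is a foreign key to a lookup table with static values such as "First Name", "Middle Name", "Last Name", and so on. The point of the table is to store the individual parts of people's names separately. Every person record has at least a first name. When trying to pull the data out, I first considered using joins, like so: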
select
first_name.person_id,
first_name.name_part,
middle_name.name_part,
last_name.name_part
from
NAME_PARTS first_name
left join
NAME_PARTS middle_name
on first_name.person_id = middle_name.person_id
left join
NAME_PARTS last_name
on first_name.person_id = last_name.person_id
where
first_name.name_part_id = 1
and middle_name.name_part_id = 2
and last_name.name_part_id = 3;
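But the table has tens of millions of records, and the bitmap index on the NAME_PART_ID column is not being used. The explain plan shows that the optimizer is using full table scans and hash joins to retrieve the data.
Any suggestions?
EDIT: The table is designed this way because the database is used across several different cultures, each of which has its own conventions for naming individuals (for example, in some Middle Eastern cultures a person usually has a first name, then their father's name, then his father's name, and so on). It is very hard to create one table with multiple columns that accounts for all of the cultural differences.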
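Given that you're essentially doing full table scans anyway (your query extracts all of the data from this table, excluding only the few rows whose name part is not first, middle or last), you might want to consider writing the query so that it simply returns the data in a slightly different format, such as: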
SELECT person_id
, name_part_id
, name_part
FROM NAME_PART
WHERE name_part_id IN (1, 2, 3)
ORDER BY person_id
, name_part_id;
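Of course, you end up with 3 rows instead of 1 for each name, but it may be trivial for your client code to roll these together. You can also roll the 3 rows up into one by using decode, group by and max: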
SELECT person_id
, max(decode(name_part_id, 1, name_part, null)) first
, max(decode(name_part_id, 2, name_part, null)) middle
, max(decode(name_part_id, 3, name_part, null)) last
FROM NAME_PART
WHERE name_part_id IN (1, 2, 3)
GROUP BY person_id
ORDER BY person_id;
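This produces the same results as the original query, provided every person has all three name parts, as in my test data (the original query drops anyone who is missing a part, while this one returns a row with NULLs). Both versions scan the table only once (with a sort) instead of dealing with a 3-way join. If you made the table an index-organized table on the person_id index, you could save the sort step.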
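A minimal sketch of the index-organized variant, assuming the key can be changed. An IOT is stored in primary-key order, so person_id has to lead the primary key. The table name name_part_iot, the constraint name and the column sizes are hypothetical and not taken from the question.

```sql
-- Hypothetical index-organized copy of NAME_PART.
-- An IOT is stored in primary-key order, so PERSON_ID leads the key;
-- ID is appended only to keep the key unique.
CREATE TABLE name_part_iot (
    id            NUMBER(10)      NOT NULL,
    person_id     NUMBER(10)      NOT NULL,
    name_part_id  NUMBER(10)      NOT NULL,
    name_part     NVARCHAR2(100),           -- length is an assumption
    CONSTRAINT name_part_iot_pk PRIMARY KEY (person_id, id)
)
ORGANIZATION INDEX;
```

Whether the optimizer then really skips the sort for the GROUP BY would have to be confirmed in the explain plan.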
I ran a test with a table of 56,150 persons, and here's a rundown of the results:
Original query:
Execution Plan
----------------------------------------------------------
------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)|
------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 113K| 11M| | 1364 (2)|
|* 1 | HASH JOIN | | 113K| 11M| 2528K| 1364 (2)|
|* 2 | TABLE ACCESS FULL | NAME_PART | 56150 | 1864K| | 229 (3)|
|* 3 | HASH JOIN | | 79792 | 5298K| 2528K| 706 (2)|
|* 4 | TABLE ACCESS FULL| NAME_PART | 56150 | 1864K| | 229 (3)|
|* 5 | TABLE ACCESS FULL| NAME_PART | 56150 | 1864K| | 229 (3)|
------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("FIRST_NAME"."PERSON_ID"="LAST_NAME"."PERSON_ID")
2 - filter("LAST_NAME"."NAME_PART_ID"=3)
3 - access("FIRST_NAME"."PERSON_ID"="MIDDLE_NAME"."PERSON_ID")
4 - filter("FIRST_NAME"."NAME_PART_ID"=1)
5 - filter("MIDDLE_NAME"."NAME_PART_ID"=2)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
6740 consistent gets
0 physical reads
0 redo size
5298174 bytes sent via SQL*Net to client
26435 bytes received via SQL*Net from client
3745 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
56150 rows processed
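One thing this plan makes visible: both joins run as a plain HASH JOIN rather than as outer joins. The filters on middle_name and last_name in the WHERE clause reject the NULL-extended rows that a left join produces, so people without a middle or last name drop out of the result. If those people should be kept, a sketch (not one of the queries tested here; the column aliases are my own) moves the filters into the ON clauses:

```sql
-- Keep every person who has a first name; the middle and last name
-- come back as NULL when the person has no such part.
SELECT first_name.person_id,
       first_name.name_part  AS first_name_part,
       middle_name.name_part AS middle_name_part,
       last_name.name_part   AS last_name_part
FROM   name_part first_name
       LEFT JOIN name_part middle_name
              ON  middle_name.person_id    = first_name.person_id
              AND middle_name.name_part_id = 2
       LEFT JOIN name_part last_name
              ON  last_name.person_id    = first_name.person_id
              AND last_name.name_part_id = 3
WHERE  first_name.name_part_id = 1;
```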
My query #1 (3 rows/person):
Execution Plan
----------------------------------------------------------
-----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)|
-----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 168K| 5593K| | 1776 (2)|
| 1 | SORT ORDER BY | | 168K| 5593K| 14M| 1776 (2)|
|* 2 | TABLE ACCESS FULL| NAME_PART | 168K| 5593K| | 230 (3)|
-----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("NAME_PART_ID"=1 OR "NAME_PART_ID"=2 OR "NAME_PART_ID"=3)
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
1005 consistent gets
0 physical reads
0 redo size
3799794 bytes sent via SQL*Net to client
78837 bytes received via SQL*Net from client
11231 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
168450 rows processed
My query #2 (1 row/person):
Execution Plan
----------------------------------------------------------
-----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)|
-----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 56150 | 1864K| | 1115 (3)|
| 1 | SORT GROUP BY | | 56150 | 1864K| 9728K| 1115 (3)|
|* 2 | TABLE ACCESS FULL| NAME_PART | 168K| 5593K| | 230 (3)|
-----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("NAME_PART_ID"=1 OR "NAME_PART_ID"=2 OR "NAME_PART_ID"=3)
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
1005 consistent gets
0 physical reads
0 redo size
5298159 bytes sent via SQL*Net to client
26435 bytes received via SQL*Net from client
3745 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
56150 rows processed
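It turns out you can squeeze a bit more speed out of it. I tried to avoid the sort by adding an index hint to force the use of the person_id index. I managed to knock off another 10%, but it still looks like it's sorting: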
SELECT /*+ index(name_part,NAME_PART_person_id) */ person_id
, max(decode(name_part_id, 1, name_part)) first
, max(decode(name_part_id, 2, name_part)) middle
, max(decode(name_part_id, 3, name_part)) last
FROM name_part
WHERE name_part_id IN (1, 2, 3)
GROUP BY person_id
ORDER BY person_id;
Execution Plan
----------------------------------------------------------
-----------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)|
-----------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 56150 | 1864K| | 3385 (1)|
| 1 | SORT GROUP BY | | 56150 | 1864K| 9728K| 3385 (1)|
| 2 | INLIST ITERATOR | | | | | |
| 3 | TABLE ACCESS BY INDEX ROWID | NAME_PART | 168K| 5593K| | 2500 (1)|
| 4 | BITMAP CONVERSION TO ROWIDS| | | | | |
|* 5 | BITMAP INDEX SINGLE VALUE | NAME_PART_NAME_PART_ID| | | | |
-----------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
5 - access("NAME_PART_ID"=1 OR "NAME_PART_ID"=2 OR "NAME_PART_ID"=3)
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
971 consistent gets
0 physical reads
0 redo size
5298159 bytes sent via SQL*Net to client
26435 bytes received via SQL*Net from client
3745 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
56150 rows processed
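However, the plans above are all based on the assumption that you're selecting from the entire table. If you constrain the results by person_id (e.g. person_id between 55968 and 56000), it turns out that the original query with the hash joins is the fastest (27 vs. 106 consistent gets for the constraint I specified).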
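For reference, the restricted form of the join query might look like the sketch below. The exact statement behind those numbers isn't shown, so where the range predicate goes is an assumption.

```sql
-- Join query limited to a small range of people.
SELECT first_name.person_id,
       first_name.name_part,
       middle_name.name_part,
       last_name.name_part
FROM   name_part first_name
       JOIN name_part middle_name
         ON middle_name.person_id = first_name.person_id
       JOIN name_part last_name
         ON last_name.person_id = first_name.person_id
WHERE  first_name.name_part_id  = 1
AND    middle_name.name_part_id = 2
AND    last_name.name_part_id   = 3
AND    first_name.person_id BETWEEN 55968 AND 56000;
```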
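On the third hand, if the queries above are being used to populate a GUI that uses a cursor to scroll over the result set (so that you would initially see only the first N rows, which I simulated by adding a "rownum < 50" predicate), my versions of the query become fast again, very fast (4 consistent gets vs. 417).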
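Oracle's row-limiting pseudocolumn is ROWNUM. A sketch of a first-N version of the one-row-per-person query follows; the exact form used for the measurement isn't shown, so the inline view is an assumption.

```sql
-- ROWNUM is assigned before ORDER BY is applied, so the ordered query
-- goes into an inline view and the limit is applied outside it.
SELECT *
FROM  (SELECT person_id
            , max(decode(name_part_id, 1, name_part)) first
            , max(decode(name_part_id, 2, name_part)) middle
            , max(decode(name_part_id, 3, name_part)) last
       FROM   name_part
       WHERE  name_part_id IN (1, 2, 3)
       GROUP  BY person_id
       ORDER  BY person_id)
WHERE  rownum < 50;
```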
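The moral of the story is that it really depends on exactly how you're going to access the data. A query that works well on the entire result set may do worse when applied to a different subset.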
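Since you are not filtering the tables in any way, the optimizer is probably right: a HASH JOIN is the best way to join unfiltered tables.
A bitmap index won't help you much in this case.
It's good for combining ORs and ANDs on multiple low-cardinality columns, not for pure filtering on a single column.
For that, a full table scan is almost always better.
Note that this is not the best design. I'd rather add the columns first_name, last_name and middle_name to person, build an index on each of them, and make them nullable.
In that case you get the same tables as in your design, just without the tables themselves.
The indexes hold the names and rowids just as well as the tables do, and a join on rowid is more efficient.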
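A minimal sketch of that alternative, assuming the parent table is called person as above. The column lengths and index names are hypothetical.

```sql
-- New columns are nullable by default, so people without a middle
-- name simply leave middle_name NULL.
ALTER TABLE person ADD (
    first_name   NVARCHAR2(100),
    middle_name  NVARCHAR2(100),
    last_name    NVARCHAR2(100)
);

-- One B-tree index per name column; Oracle does not store entries
-- for rows where the indexed column is NULL.
CREATE INDEX person_first_name_ix  ON person (first_name);
CREATE INDEX person_middle_name_ix ON person (middle_name);
CREATE INDEX person_last_name_ix   ON person (last_name);
```

Each index then holds only the people who actually have that name part, much like one slice of the NAME_PART table.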
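Update:
Being myself a member of a culture that uses the father's name as part of a personal name, I can say that three fields are enough for most cases.
One field for the family name, one for the given name, and one for everything in between (without further specialization) is quite a decent way to handle names.
Just rely on your users. In the modern world, almost everyone knows how to make their name fit this schema.
For example: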
Family name: Picasso
Given name: Pablo
Middle name: Diego José Francisco de Paula Juan Nepomuceno María de los Remedios Cipriano de la Santísima Trinidad Ruiz y
P.S. Did you know that his close friends just called him PABLO~1?