Removing a temporary B-tree sort from a SQLite query

rra*_*wrr 13 sqlite performance

I have a very basic implementation of an image upload service, where you can upload images and tag them. This is my schema:

CREATE TABLE Tag(
    orm_id INTEGER PRIMARY KEY AUTOINCREMENT, 
    pid_high UNSIGNED BIG INT NOT NULL, 
    pid_low UNSIGNED BIG INT NOT NULL, 
    name STRING NOT NULL, 
    CONSTRAINT KeyConstraint UNIQUE (pid_high, pid_low) ON CONFLICT FAIL);

CREATE TABLE TagBridge(
    orm_id INTEGER PRIMARY KEY AUTOINCREMENT, 
    pid_high UNSIGNED BIG INT NOT NULL, 
    pid_low UNSIGNED BIG INT NOT NULL, 
    image_id_high UNSIGNED BIG INT NOT NULL, 
    image_id_low UNSIGNED BIG INT NOT NULL, 
    tag_id_high UNSIGNED BIG INT NOT NULL, 
    tag_id_low UNSIGNED BIG INT NOT NULL, 
    CONSTRAINT KeyConstraint UNIQUE (pid_high, pid_low) ON CONFLICT FAIL);

CREATE TABLE Image(
    orm_id INTEGER PRIMARY KEY AUTOINCREMENT, 
    pid_high UNSIGNED BIG INT NOT NULL,
    pid_low UNSIGNED BIG INT NOT NULL, 
    filehash STRING NOT NULL, 
    mime STRING NOT NULL, 
    uploadedDate INTEGER NOT NULL, 
    ratingsAverage REAL, 
    CONSTRAINT KeyConstraint UNIQUE (pid_high, pid_low) ON CONFLICT FAIL);

And the indexes:

CREATE INDEX ImageTest on Image(pid_high, pid_low, uploadedDate DESC);
CREATE INDEX ImagefilehashIndex ON Image (filehash);
CREATE INDEX ImageuploadedDateIndex ON Image (uploadedDate);
CREATE INDEX TagnameIndex ON Tag (name);

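The reason there are pid_high/pid_low fields instead of a standard primary key is that this service uses client-authoritative 128-bit GUIDs, but this does not significantly affect query speed.

Since this is the internet, the vast majority of images on this service are cats, and are tagged 'cat'. In fact, about 47,000 of the 50,000 images are tagged 'cat'. The query to get all images tagged 'cat' is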

select i.* from Tag t, TagBridge b, Image i 
where 
    b.tag_id_high = t.pid_high AND b.tag_id_low = t.pid_low 
AND b.image_id_high = i.pid_high and b.image_id_low = i.pid_low 
AND t.name ='cat' 
order by uploadedDate DESC LIMIT 20;

The query plan for this is

sele  order          from  deta
----  -------------  ----  ----
0     0              0     SEARCH TABLE Tag AS t USING INDEX TagnameIndex (name=?) (~1 rows)
0     1              1     SCAN TABLE TagBridge AS b (~472 rows)
0     2              2     SEARCH TABLE Image AS i USING INDEX ImageTest (pid_high=? AND pid_low=?) (~1 rows)
0     0              0     USE TEMP B-TREE FOR ORDER BY

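The main problem here is the last line, USE TEMP B-TREE FOR ORDER BY. It slows the query down significantly. Without the 'order by' clause, the whole query takes about 0.001 seconds to run. With the 'order by' clause, the query takes 0.483 seconds, which is a performance penalty of more than 400x.

I would like to get this query under 0.1 seconds, but I don't know how. I have tried many other queries, and adding and removing indexes, but this is the fastest I have been able to get it to run.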
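One way to reproduce the plan and the timings, assuming the sqlite3 command-line shell is being used (`.timer on` is a shell dot-command that prints the elapsed time after each statement; it is not SQL):

```sql
.timer on

-- EXPLAIN QUERY PLAN prints the plan instead of running the query.
-- A "USE TEMP B-TREE FOR ORDER BY" row means the joined rows are
-- collected and sorted afterwards rather than read in index order.
EXPLAIN QUERY PLAN
select i.* from Tag t, TagBridge b, Image i
where
    b.tag_id_high = t.pid_high AND b.tag_id_low = t.pid_low
AND b.image_id_high = i.pid_high and b.image_id_low = i.pid_low
AND t.name = 'cat'
order by uploadedDate DESC LIMIT 20;
```

Running the same statement without the EXPLAIN QUERY PLAN prefix, once with and once without the order by clause, gives the two timings to compare.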

Qua*_*noi 3

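This is a common problem of choosing between a filtering index and an ordering index.

You should keep a list of popular tags (for which the ordering index is more beneficial) and somehow forbid the filtering index when the tag is popular, say, like this: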

SELECT  i.*
FROM    Tag t, TagBridge b, Image i 
WHERE   b.tag_id_high = t.pid_high AND b.tag_id_low = t.pid_low 
        AND b.image_id_high = i.pid_high AND b.image_id_low = i.pid_low 
        AND t.name || '' = 'cat' 
ORDER BY
        i.uploadedDate DESC
LIMIT 20
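The `|| ''` concatenation works because an ordinary column index can only be used for a WHERE term in which the column appears on its own; once the column is wrapped in an expression, TagnameIndex is no longer usable for that term. SQLite also documents a unary `+` prefix for the same purpose. The sketch below applies it to the same query. It only rules out the filtering index; whether the planner then reads Image in uploadedDate order is not guaranteed, so the plan is worth checking. For plain text tags such as 'cat' the two forms should behave the same, but unary `+` also removes column affinity, so they can differ for numeric-looking values.

```sql
-- Sketch: unary + leaves the value of t.name unchanged but stops the
-- t.name = 'cat' term from being used with TagnameIndex.
EXPLAIN QUERY PLAN
SELECT  i.*
FROM    Tag t, TagBridge b, Image i
WHERE   b.tag_id_high = t.pid_high AND b.tag_id_low = t.pid_low
        AND b.image_id_high = i.pid_high AND b.image_id_low = i.pid_low
        AND +t.name = 'cat'
ORDER BY
        i.uploadedDate DESC
LIMIT 20;
```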

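Alternatively, you can denormalize your schema and add uploadedDate to TagBridge, filling it with a trigger or by some other means. Then create a composite index on TagBridge (tag_id_high, tag_id_low, uploadedDate, image_id_high, image_id_low) and rewrite the query slightly: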

SELECT  i.*
FROM    TagBridge b, Image i
WHERE   b.tag_id_high =
        (
        SELECT  t.pid_high
        FROM    Tag t
        WHERE   t.name = 'cat'
        )
        AND b.tag_id_low =
        (
        SELECT  t.pid_low
        FROM    Tag t
        WHERE   t.name = 'cat'
        )
        AND i.pid_high = b.image_id_high
        AND i.pid_low = b.image_id_low
ORDER BY
        b.uploadedDate DESC
LIMIT 20;

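The double subquery is needed because SQLite does not understand tuple syntax (row values are only supported from version 3.15.0 onward).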
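A minimal sketch of the denormalization itself, under a few assumptions: the trigger and index names are made up for illustration, the new column is left nullable so that ALTER TABLE can add it to existing rows, and the Image row always exists before its TagBridge row is inserted. The index leads with the tag_id columns, since those are the ones the rewritten query filters on.

```sql
-- Add the denormalized column (nullable, so existing rows are accepted).
ALTER TABLE TagBridge ADD COLUMN uploadedDate INTEGER;

-- Backfill existing bridge rows from their images.
UPDATE TagBridge
SET    uploadedDate =
       (
       SELECT  i.uploadedDate
       FROM    Image i
       WHERE   i.pid_high = TagBridge.image_id_high
               AND i.pid_low = TagBridge.image_id_low
       );

-- Hypothetical trigger: copy the image's date into each new bridge row.
CREATE TRIGGER TagBridgeSetUploadedDate
AFTER INSERT ON TagBridge
BEGIN
    UPDATE TagBridge
    SET    uploadedDate =
           (
           SELECT  i.uploadedDate
           FROM    Image i
           WHERE   i.pid_high = NEW.image_id_high
                   AND i.pid_low = NEW.image_id_low
           )
    WHERE  orm_id = NEW.orm_id;
END;

-- Hypothetical index name: equality columns first, then the sort column,
-- then the image key needed for the join to Image.
CREATE INDEX TagBridgeTagDateIndex
ON TagBridge (tag_id_high, tag_id_low, uploadedDate DESC,
              image_id_high, image_id_low);
```

If Image.uploadedDate can change after upload, a second trigger on updates to Image would be needed to keep the copies in sync; that is left out here.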