使用双子查询、单子查询和全内连接语句调试查询之间的性能差异(和问题)

ale*_*nst 7 sql postgresql inner-join subquery

我有一个复杂的业务逻辑,需要我执行 2 级嵌套查询。查询由 Django 的 ORM 生成。在问题的底部,我将按原样提供查询以及EXPLAIN适合使用PEV2查看的完整内容,但为了帮助读者更好地理解问题,我将从更具概念性的解释开始。

这是对我们正在做的事情的一个非常幼稚的描述:

some_ids = get_id_based_on_some_conditions(*conditions*)
some_other_ids = get_some_other_ids_based_on_some_conditions_and_filtering_by_some_ids(*other_conditions*, some_ids)
results = get_results_based_on_even_more_conditions_and_filtering_by_some_other_ids(*another_set_of_conditions*, some_other_ids)
Run Code Online (Sandbox Code Playgroud)

使用子查询将以下伪代码转换为实际 SQL 非常容易。直接转换为以下伪查询:

some_ids = get_id_based_on_some_conditions(*conditions*)
some_other_ids = get_some_other_ids_based_on_some_conditions_and_filtering_by_some_ids(*other_conditions*, some_ids)
results = get_results_based_on_even_more_conditions_and_filtering_by_some_other_ids(*another_set_of_conditions*, some_other_ids)
Run Code Online (Sandbox Code Playgroud)

由于考虑到它返回的行数(略高于 9.000),查询花费了相当多的时间(~0.6s),我认为它可能有助于用内部联接替换第二级子查询。

事实上,这使查询速度更慢(现在约为 1.7 秒)。所以我想也许规划器没有正确理解内部连接的子查询会发生什么,并做了一些严重的错误计算/高估/低估,所以我用更多的内部连接替换了第一级子查询,这导致更糟糕结果(现在约为 10 秒)。

我已经分析了EXPLAINS几个小时的查询,但我无法弄清楚为什么使用内部联接会使一切变慢。我也不知道如何判断我的(当前)最佳查询是否实际上是我能得到的最好的查询,或者是否有一些我没有做的事情可能会加快速度。

所以,我的问题是:

  • 为什么内部连接比子查询慢?
  • 我怎么知道我是否正在尽一切可能从我的数据库中榨取最大性能,或者我是否遗漏了什么?

实际查询和EXPLAINS原样:

使用 2 级子查询进行查询:

select 
    foo,
    bar
from
    t1,
    t2
where
    condition1 = something and
    condition2 in (   <---- first level subquery
        select 
            id
        from
            t3
        where
            condition3 = another_something and
            condition4 in (    <---- second level subquery
                select
                    another_id
                from
                    t4
                where
                    condition5 = something_something and
                    condition6 = another_something_something
            )
    )
Run Code Online (Sandbox Code Playgroud)

完整说明:https : //pastebin.com/jDy3FyKp

使用 1 级子查询进行查询:

SELECT DISTINCT
    "phdrug_phdrug"."id",
    "phdrug_phdrug"."uuid",
    "phdrug_phdrug"."default_description",
    "phdrug_phdrug"."alternative_description",
    "phdrug_phdrug"."ean",
    "phdrug_phdrug"."mirror_ean",
    "phdrug_phdrug"."parent_ean",
    "phdrug_phdrug"."reg_num",
    "phdrug_phdrug"."medika_code",
    "phdrug_phdrug"."atc_iv",
    "phdrug_phdrug"."product_type",
    "phdrug_phdrug"."fraction",
    "phdrug_phdrug"."active",
    "phdrug_phdrug"."loyal",
    "phdrug_phdrug"."patent",
    "phdrug_phdrug"."chronics",
    "phdrug_phdrug"."recipe",
    "phdrug_phdrug"."deal",
    "phdrug_phdrug"."specialized",
    "phdrug_phdrug"."armored",
    "phdrug_phdrug"."top_hight_speciality",
    "phdrug_phdrug"."top_generic",
    "phdrug_phdrug"."hight_speciality",
    "phdrug_phdrug"."temp_8_15",
    "phdrug_phdrug"."temp_15_25",
    "phdrug_phdrug"."temp_2_8",
    "phdrug_phdrug"."temp_less_15",
    "phdrug_phdrug"."new",
    "phdrug_phdrug"."mdk_internal_code",
    "phdrug_phdrug"."mdk_single_id",
    "phdrug_phdrug"."mdk_object_id",
    "phdrug_phdrug"."is_from_mdk_db",
    "phdrug_phdrug"."top",
    "phdrug_phdrug"."laboratory_name",
    "phdrug_phdrug"."laboratory_alternative_name",
    "phdrug_phdrug"."imported",
    "phdrug_phdrug"."imported_country",
    "phdrug_phdrug"."laboratory_id",
    "phdrug_phdrug"."specialty",
    "phdrug_phdrug"."dimension_id",
    "phdrug_phdrug"."featured",
    "phdrug_phdrug"."top_ae_rank",
    "phdrug_phdrug"."top_farma_rank" 
FROM
    "phdrug_phdrug"
    INNER JOIN "monetary_drugprice" ON ( "phdrug_phdrug"."id" = "monetary_drugprice"."drug_id" )
    INNER JOIN "phdrug_phdrugpicture" ON ( "phdrug_phdrug"."id" = "phdrug_phdrugpicture"."drug_id" ) 
WHERE
    (
        "monetary_drugprice"."id" IN (
        SELECT
            V0."id" 
        FROM
            "monetary_drugprice" V0 
        WHERE
            (
                V0."pricelist_id" IN (
                SELECT DISTINCT ON
                    ( U0."id" ) U0."id" 
                FROM
                    "monetary_pricelist" U0
                    INNER JOIN "monetary_pricelistdestinations" U1 ON ( U0."id" = U1."pricelist_id" )
                    INNER JOIN "organization_organization" U2 ON ( U0."manager_id" = U2."id" )
                    INNER JOIN "courier_carrier_pricelists" U3 ON ( U0."id" = U3."pricelist_id" )
                    INNER JOIN "courier_carrier" U4 ON ( U3."carrier_id" = U4."id" )
                    INNER JOIN "courier_carrierdelivery" U5 ON ( U4."id" = U5."carrier_id" )
                    INNER JOIN "monetary_pricelistcountry" U6 ON ( U0."id" = U6."pricelist_id" ) 
                WHERE
                    (
                        (
                            U0."expires" = FALSE 
                            OR (
                                U0."expires" = TRUE 
                                AND ( U0."datestart" AT TIME ZONE'UTC' ) :: DATE <= '2020-05-01' 
                                AND ( U0."dateend" AT TIME ZONE'UTC' ) :: DATE >= '2020-05-01' 
                            ) 
                        ) 
                        AND U0."active" = TRUE 
                        AND U1."to_public" = TRUE 
                        AND U2."organization_type" = 2 
                        AND (
                            U5."dst_country" = 'MX' 
                            OR U5."ignore_country_filter" = TRUE 
                        ) 
                        AND U6."country" = 'MX' 
                        AND U2."active" = TRUE 
                    ) 
                ) 
                AND V0."stock" > 0 
            ) 
        ) 
        AND "phdrug_phdrug"."active" = TRUE 
        AND "phdrug_phdrugpicture"."is_main" = TRUE 
    ) 
ORDER BY
    "phdrug_phdrug"."id" ASC,
    "phdrug_phdrug"."default_description" ASC
Run Code Online (Sandbox Code Playgroud)

完整说明:https : //pastebin.com/NidTZMxY

仅使用内部联接的查询:

SELECT DISTINCT
    "phdrug_phdrug"."id",
    "phdrug_phdrug"."uuid",
    "phdrug_phdrug"."default_description",
    "phdrug_phdrug"."alternative_description",
    "phdrug_phdrug"."ean",
    "phdrug_phdrug"."mirror_ean",
    "phdrug_phdrug"."parent_ean",
    "phdrug_phdrug"."reg_num",
    "phdrug_phdrug"."medika_code",
    "phdrug_phdrug"."atc_iv",
    "phdrug_phdrug"."product_type",
    "phdrug_phdrug"."fraction",
    "phdrug_phdrug"."active",
    "phdrug_phdrug"."loyal",
    "phdrug_phdrug"."patent",
    "phdrug_phdrug"."chronics",
    "phdrug_phdrug"."recipe",
    "phdrug_phdrug"."deal",
    "phdrug_phdrug"."specialized",
    "phdrug_phdrug"."armored",
    "phdrug_phdrug"."top_hight_speciality",
    "phdrug_phdrug"."top_generic",
    "phdrug_phdrug"."hight_speciality",
    "phdrug_phdrug"."temp_8_15",
    "phdrug_phdrug"."temp_15_25",
    "phdrug_phdrug"."temp_2_8",
    "phdrug_phdrug"."temp_less_15",
    "phdrug_phdrug"."new",
    "phdrug_phdrug"."mdk_internal_code",
    "phdrug_phdrug"."mdk_single_id",
    "phdrug_phdrug"."mdk_object_id",
    "phdrug_phdrug"."is_from_mdk_db",
    "phdrug_phdrug"."top",
    "phdrug_phdrug"."laboratory_name",
    "phdrug_phdrug"."laboratory_alternative_name",
    "phdrug_phdrug"."imported",
    "phdrug_phdrug"."imported_country",
    "phdrug_phdrug"."laboratory_id",
    "phdrug_phdrug"."specialty",
    "phdrug_phdrug"."dimension_id",
    "phdrug_phdrug"."featured",
    "phdrug_phdrug"."top_ae_rank",
    "phdrug_phdrug"."top_farma_rank" 
FROM
    "phdrug_phdrug"
    INNER JOIN "monetary_drugprice" ON ( "phdrug_phdrug"."id" = "monetary_drugprice"."drug_id" )
    INNER JOIN "phdrug_phdrugpicture" ON ( "phdrug_phdrug"."id" = "phdrug_phdrugpicture"."drug_id" ) 
WHERE
    (
        "monetary_drugprice"."id" IN (
        SELECT
            U0."id" 
        FROM
            "monetary_drugprice" U0
            INNER JOIN "monetary_pricelist" U1 ON ( U0."pricelist_id" = U1."id" )
            INNER JOIN "monetary_pricelistdestinations" U2 ON ( U1."id" = U2."pricelist_id" )
            INNER JOIN "organization_organization" U3 ON ( U1."manager_id" = U3."id" )
            INNER JOIN "courier_carrier_pricelists" U4 ON ( U1."id" = U4."pricelist_id" )
            INNER JOIN "courier_carrier" U5 ON ( U4."carrier_id" = U5."id" )
            INNER JOIN "courier_carrierdelivery" U6 ON ( U5."id" = U6."carrier_id" )
            INNER JOIN "monetary_pricelistcountry" U7 ON ( U1."id" = U7."pricelist_id" ) 
        WHERE
            (
                (
                    U1."expires" = FALSE 
                    OR (
                        U1."expires" = TRUE 
                        AND ( U1."datestart" AT TIME ZONE'UTC' ) :: DATE <= '2020-05-01' 
                        AND ( U1."dateend" AT TIME ZONE'UTC' ) :: DATE >= '2020-05-01' 
                    ) 
                ) 
                AND U1."active" = TRUE 
                AND U2."to_public" = TRUE 
                AND U3."organization_type" = 2 
                AND (
                    U6."dst_country" = 'MX' 
                    OR U6."ignore_country_filter" = TRUE 
                ) 
                AND U7."country" = 'MX' 
                AND U3."active" = TRUE 
                AND U0."stock" > 0 
            ) 
        ) 
        AND "phdrug_phdrug"."active" = TRUE 
        AND "phdrug_phdrugpicture"."is_main" = TRUE 
    ) 
ORDER BY
    "phdrug_phdrug"."id" ASC,
    "phdrug_phdrug"."default_description" ASC
Run Code Online (Sandbox Code Playgroud)

完整说明:https : //pastebin.com/DaVztBuV

小智 0

在不查看数据库结构的情况下,很难进行此级别的故障排除。我不得不编写同一脚本的两个不同版本,因为不同站点的环境不同。

  1. 确保表已正确索引。
  2. 将下标作为 FROM 语句的一部分,而不是 WHERE 语句的一部分,除非它是 IN 子句的一部分。

Select * from Table1 t1 左外连接 (Select * from Table2) t2 on t1.field = t2.field

  1. 如果它是一个大的拉力和/或大的重型使用表,那么使用临时表也会加快它的速度。但看起来你的脚本较小,这已经过头了。