Query optimization for variable matching on multiple columns

Zer*_*iny 6 postgresql performance query-performance

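TL;DR - I'm looking for advice on how to better write the query below.

Below is a condensed version of my table structure with some sample data. I have no control over the data structure at all, so unfortunately suggestions about schema changes won't help me.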

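The problem

Given a building_level_key and a faction_key, I need to return the record from the building_levels table joined to its closest matching record in the building_culture_variants table.

For example, if I use goblin_walls & fact_blue, I expect the goblin_walls record joined to the record with building_culture_variant_key 2.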

A sample structure of the tables is shown below:

[Image: sample data with db structure]

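  • factions - a condensed version of the real table, since the culture/subculture records are stored in different tables, but it gets the point across. This table is only really needed in the query so that the culture and subculture related to a given faction_key can be looked up.

  • building_levels - acts as the base record for every building in the system. There is only one record per building.

  • building_culture_variants - as the name suggests; there can be multiple records for each building_level_key, and each variant record is matched against a building level using the building_level_key and a combination of faction_key, culture_key and subculture_key.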

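How matching works

Matching starts by looking up the given building_level_key in the culture variants table. This is a hard match and is required to join any building level to a culture variant.

Every building level record will have at least one culture variant. There are usually multiple culture variants per building level, but on average no more than 4. The most common culture variant is a "generic" one, meaning the faction_key, culture_key and subculture_key columns are all null, so the building matches any faction. However, any combination of those faction columns can hold a key, so I need to match the given faction against each of the faction columns in the culture variant.

Side note: the culture variant keys are always consistent, meaning I will never have a scenario where a faction_key & subculture_key in the culture variants table don't match the corresponding faction_key and subculture_key from the factions table (and the subculture table, which has been omitted for clarity).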

What I've tried

I have provided a SQL fiddle to play with, and have included my version of the query below:

SELECT 
  "building_culture_variants"."building_culture_variant_key" AS qualified_key, 
  "building_levels"."building_level_key" AS building_key, 
  "building_levels"."create_time", 
  "building_levels"."create_cost", 
  "building_culture_variants"."name",
  'fact_blue'::text AS faction_key
FROM 
  "building_levels" 
  INNER JOIN "building_culture_variants" ON (
    "building_culture_variants"."building_culture_variant_key" IN (
      SELECT 
        "building_culture_variant_key" 
      FROM 
        (
          SELECT 
            "building_culture_variants"."building_culture_variant_key", 
            (
                CASE WHEN "building_culture_variants"."faction_key" = "building_factions"."faction_key" THEN 1 WHEN "building_culture_variants"."faction_key" IS NULL THEN 0 ELSE NULL END + 
                CASE WHEN "building_culture_variants"."culture_key" = "building_factions"."culture_key" THEN 1 WHEN "building_culture_variants"."culture_key" IS NULL THEN 0 ELSE NULL END + 
                CASE WHEN "building_culture_variants"."subculture_key" = "building_factions"."subculture_key" THEN 1 WHEN "building_culture_variants"."subculture_key" IS NULL THEN 0 ELSE NULL END
            ) AS match_count 
          FROM 
            "building_culture_variants" 
            INNER JOIN (
              -- This is a subquery because here I would join a couple more tables
              -- to collect all of the faction info
              SELECT 
                "factions"."faction_key", 
                "factions"."culture_key", 
                "factions"."subculture_key"
              FROM 
                "factions" 
            ) AS "building_factions" ON ("building_factions"."faction_key" = 'fact_blue')
          WHERE ("building_levels"."building_level_key" = "building_culture_variants"."building_level_key") 
          GROUP BY 
            match_count, 
            building_culture_variant_key 
          ORDER BY 
            match_count DESC NULLS LAST 
          LIMIT 
            1
        ) AS "culture_variant_match"
    )
  ) 
WHERE "building_levels"."building_level_key" = 'goblin_walls'
ORDER BY 
  "building_levels"."building_level_key"

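The query above works and gets the job done, but I feel like I'm just brute-forcing the problem by nesting a bunch of queries. I suspect I'm missing some SQL construct that would either improve the query's performance or greatly simplify it.

So what I'm really asking is: is there a better way to rewrite the query so that it is more efficient?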

Mic*_*utz 3

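The SQL code almost looks like you are trying to do things procedurally. That is not efficient in a declarative language like SQL.

JOIN

When you JOIN two datasets and need one of them to behave as if NULL values were wildcards, you can write the JOIN like this: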

Select *
from TableA
  join TableB
    on TableA.col2match = coalesce( TableB.col2match, TableA.col2match )

However, this trick only works if you know that TableA.col2match is always NOT NULL.
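To make the NOT NULL caveat concrete, here is a minimal sketch. The tables table_a and table_b and their rows are hypothetical and exist only for this illustration. It contrasts the COALESCE form with an explicit "IS NULL OR equal" form, which is one possible alternative when the left-hand column may itself be NULL.

```sql
-- Hypothetical tables and data, used only to illustrate the caveat.
CREATE TEMP TABLE table_a (id int, col2match text);
CREATE TEMP TABLE table_b (id int, col2match text);

INSERT INTO table_a VALUES (1, 'x'), (2, NULL);
INSERT INTO table_b VALUES (10, 'x'), (20, NULL);

-- COALESCE form: the table_a row whose key is NULL never joins,
-- because "NULL = anything" evaluates to NULL rather than true.
SELECT a.id AS a_id, b.id AS b_id
FROM table_a a
  JOIN table_b b
    ON a.col2match = coalesce(b.col2match, a.col2match);

-- Explicit wildcard form: a NULL in table_b matches every table_a row,
-- including the row whose own key is NULL.
SELECT a.id AS a_id, b.id AS b_id
FROM table_a a
  JOIN table_b b
    ON b.col2match IS NULL
    OR a.col2match = b.col2match;
```

Which form is preferable depends on whether a NULL on the left-hand side should be allowed to match the wildcard rows at all.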

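Score and Rank

You have already written a scoring expression to score the matches. I recommend you put it into a function for ease of maintenance.

Most databases can RANK() the scores for you. It is an analytic function (PostgreSQL documents these as window functions). You really should read up on them.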
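As a rough sketch of that suggestion, the per-column CASE expression could be wrapped in a small SQL function. The name variant_key_score, its parameter names, and the text parameter types are assumptions; the real key columns may use a different type.

```sql
-- Hypothetical helper: the name, parameter names and text types are assumptions.
CREATE FUNCTION variant_key_score(variant_key text, faction_value text)
RETURNS integer
LANGUAGE sql
IMMUTABLE
AS $$
  SELECT CASE
           WHEN variant_key = faction_value THEN 1  -- exact match
           WHEN variant_key IS NULL THEN 0          -- generic (wildcard) variant
           ELSE NULL                                -- conflicting key: the sum becomes NULL
         END
$$;

-- Usage with literal values: one exact match plus two wildcard columns.
SELECT variant_key_score('fact_blue', 'fact_blue')
     + variant_key_score(NULL, 'some_culture')
     + variant_key_score(NULL, 'some_subculture') AS match_count;  -- 1
```

Because a conflicting key yields NULL, the whole sum becomes NULL for such a row, which is what lets an ORDER BY ... DESC NULLS LAST push non-matching variants to the end.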
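A small, self-contained illustration of how rank() treats ties, using made-up scores rather than the real tables:

```sql
-- Made-up (variant, score) pairs; NULL stands for a disqualified variant.
SELECT variant,
       score,
       rank()       OVER (ORDER BY score DESC NULLS LAST) AS rnk,
       row_number() OVER (ORDER BY score DESC NULLS LAST) AS rn
FROM (VALUES ('a', 2), ('b', 2), ('c', 0), ('d', NULL)) AS t(variant, score);
```

Variants a and b tie on the top score, so rank() gives both of them 1 while row_number() breaks the tie arbitrarily. This means a filter on rank = 1 can return more than one row when scores tie, so a final LIMIT 1 (or a deterministic tie-breaker in the ORDER BY) is still needed if exactly one row is wanted.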

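CTE

I use CTEs to help the reader understand what is going on.

If you place the WHERE clauses inside the CTE sections, PostgreSQL will materialize the fewest rows (according to the dbfiddle plan).

Resulting SQL

I structured this SQL so that you can move the WHERE clauses outside of the CTEs. By doing so, you can CREATE VIEW on the SQL. This will simplify the SQL that the middle-tier developers have to write.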

with "building_factions" as (
  -- This is a subquery because here I would join a couple more tables
  -- to collect all of the faction info
  SELECT 
    "factions"."faction_key", 
    "factions"."culture_key", 
    "factions"."subculture_key"
  FROM 
    "factions"
  where "factions"."faction_key" = 'fact_blue'
) , "building_info" as (
  select
    "building_culture_variants"."building_culture_variant_key", 
    "building_levels"."building_level_key", 
    "building_levels"."create_time", 
    "building_levels"."create_cost", 
    "building_culture_variants"."name",
    "building_culture_variants"."faction_key", 
    "building_culture_variants"."culture_key", 
    "building_culture_variants"."subculture_key"
  from "building_levels" 
    join "building_culture_variants"
      on "building_levels"."building_level_key" = "building_culture_variants"."building_level_key"
  where "building_levels"."building_level_key" = 'goblin_walls'
), "scoreRanked_data" as (
  select 
    "building_factions"."faction_key", 
    "building_factions"."culture_key", 
    "building_factions"."subculture_key",
    "building_info"."building_culture_variant_key", 
    "building_info"."building_level_key", 
    "building_info"."create_time", 
    "building_info"."create_cost", 
    "building_info"."name",
    rank() over (partition by "building_factions"."faction_key", 
                              "building_factions"."culture_key", 
                              "building_factions"."subculture_key",
                              "building_info"."building_level_key"
                 order by (
      CASE WHEN "building_info"."faction_key" = "building_factions"."faction_key" THEN 1
           WHEN "building_info"."faction_key" IS NULL THEN 0
           ELSE NULL
      END + 
      CASE WHEN "building_info"."culture_key" = "building_factions"."culture_key" THEN 1
           WHEN "building_info"."culture_key" IS NULL THEN 0
           ELSE NULL
      END + 
      CASE WHEN "building_info"."subculture_key" = "building_factions"."subculture_key" THEN 1
           WHEN "building_info"."subculture_key" IS NULL THEN 0
           ELSE NULL
      END
      ) desc nulls last ) match_rank
  from "building_factions"
    join "building_info"
      on    "building_factions"."faction_key" = coalesce( "building_info"."faction_key", "building_factions"."faction_key")
        and "building_factions"."culture_key" = coalesce( "building_info"."culture_key", "building_factions"."culture_key")
        and "building_factions"."subculture_key" = coalesce( "building_info"."subculture_key", "building_factions"."subculture_key")
)
select *
from "scoreRanked_data"
where match_rank = 1
limit 1
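The remark about moving the WHERE clauses out and creating a view could look roughly like the sketch below. The view name building_variant_ranked is hypothetical, and the sketch makes two simplifications that are assumptions rather than part of the query above: it treats faction_key as identifying a single factions row, and it scores with 1/0 only, because the join condition already excludes variants with a conflicting key.

```sql
-- Hypothetical view name; tables and columns are those used in the queries above.
CREATE VIEW building_variant_ranked AS
SELECT
  f.faction_key,
  bl.building_level_key,
  bcv.building_culture_variant_key,
  bl.create_time,
  bl.create_cost,
  bcv.name,
  rank() OVER (
    PARTITION BY f.faction_key, bl.building_level_key
    ORDER BY (
      -- Conflicting keys are already filtered out by the join,
      -- so each column contributes 1 (exact match) or 0 (NULL wildcard).
      CASE WHEN bcv.faction_key    = f.faction_key    THEN 1 ELSE 0 END +
      CASE WHEN bcv.culture_key    = f.culture_key    THEN 1 ELSE 0 END +
      CASE WHEN bcv.subculture_key = f.subculture_key THEN 1 ELSE 0 END
    ) DESC
  ) AS match_rank
FROM factions f
  CROSS JOIN building_levels bl
  JOIN building_culture_variants bcv
    ON  bcv.building_level_key = bl.building_level_key
    AND (bcv.faction_key    IS NULL OR bcv.faction_key    = f.faction_key)
    AND (bcv.culture_key    IS NULL OR bcv.culture_key    = f.culture_key)
    AND (bcv.subculture_key IS NULL OR bcv.subculture_key = f.subculture_key);

-- The caller then supplies only the two keys.
SELECT *
FROM building_variant_ranked
WHERE faction_key = 'fact_blue'
  AND building_level_key = 'goblin_walls'
  AND match_rank = 1
LIMIT 1;
```

Both filter columns appear in the PARTITION BY list, which is the situation in which the planner can push such predicates below the window function. It is still worth checking the plan with EXPLAIN on real data before relying on that.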