Rewriting an ActiveRecord query as recursive SQL

Lor*_*enz 3 sql postgresql recursion activerecord ruby-on-rails

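I have a tree-like ActiveRecord structure with a self-referencing object: for example, an object can be the parent or the child of another object of the same class. I need a way to map this structure efficiently in code. So far I have been doing it in Ruby through the ActiveRecord ORM, and it is extremely inefficient.

Here is what the pod.rb model looks like: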

    has_many :pod_parents, class_name: "PodPod", dependent: :delete_all
    has_many :parents, through: :pod_parents, :foreign_key => 'parent_id', :source => 'parent'
    has_many :pod_children, class_name: "PodPod", :foreign_key => 'parent_id'
    has_many :children, through: :pod_children, :source => 'pod'

    scope :active, -> {
        where(pod_state: "active").where(pod_type: ["standard","readonly"])
    }

Here is the relevant database schema:

table "pods"
  t.string "intention"
  t.integer "user_id"
  t.string "slug"
  t.string "url_handle"
  t.index ["slug"], name: "index_pods_on_slug"
  t.index ["url_handle"], name: "index_pods_on_url_handle"

table "pod_pods"
  t.integer "parent_id"
  t.integer "pod_id"
  t.index ["parent_id", "pod_id"], name: "index_pod_pods_on_parent_id_and_pod_id", unique: true
  t.index ["parent_id"], name: "index_pod_pods_on_parent_id"
  t.index ["pod_id"], name: "index_pod_pods_on_pod_id"

These are the specific functions I'm optimizing:

def get_all_parents
    parents = []
    self.parents.active.each do |parent|
        parents << parent
        parents.concat(parent.get_all_parents)
    end
    return parents
end

def get_all_children
    children = []
    self.children.each do |child|
        children << child # include the child itself, as get_all_parents does for parents
        children.concat(child.get_all_children)
    end
    return children
end

def get_all_parents_and_children
    pod_array = self.get_all_parents
    pod_array.concat(self.get_all_children)
    return pod_array
end

def get_all_relations(inclusive = false)
    circles_array = self.get_all_parents
    circles_array.each do |parent|
        circles_array = circles_array.concat(parent.get_all_children)
    end
    circles_array = circles_array.concat(self.get_all_children)
    unique_ids = circles_array.compact.map(&:id).uniq - [self.id]
    circles = Pod.where(id: unique_ids)
end

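As far as I've been able to find out, Postgres supports recursive SQL queries. I have been using these articles as a guide: 1, 2

This is what I've got so far: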

def get_all_parents2
      sql =
        <<-SQL
            WITH RECURSIVE pod_tree(id, path) AS (
                SELECT id, ARRAY[id]
                FROM pods
                WHERE id = #{self.id}
            UNION ALL
                SELECT pods.id, path
                FROM pod_tree
                JOIN pods ON pods.id=pod_tree.id
                JOIN pod_pods ON pod_pods.parent_id = pods.id
                WHERE NOT pods.id = ANY(path)
            )
            SELECT * FROM pod_tree
            ORDER BY path;
        SQL
      sql.chomp
        Pod.find_by_sql(sql)
    end

My SQL isn't particularly good, and I can't figure out how to navigate up and down the tree structure so that I can rewrite the functions above as recursive SQL. I'd appreciate any help with this. Thank you.

ero*_*nin 5

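What you're trying to accomplish is definitely doable with recursive CTEs. I'll cover the first two scenarios you have, since the other two are just extensions of those.

In all the SQL examples I'll use id 1 to stand for the value you would substitute at the model level. Since you've written that query, I'll assume you have some familiarity with recursive CTEs and have already tried to find a solution.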

get_all_children

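Let's start with get_all_children. This method walks down the tree, level by level, collecting the nodes we encounter.

Since pod_pods holds all the information about the hierarchy, and no scope is involved when fetching children, we can recurse over pod_pods alone to get the children.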

-- Snippet #1
WITH RECURSIVE pod_tree AS (
  SELECT pod_id -- Get the pod_id of the children of the base case node
  FROM pod_pods
  WHERE parent_id = 1 -- Base case
  UNION ALL -- Recurse on this and do a union with the previous step
  SELECT p.pod_id
  FROM pod_pods p
    INNER JOIN pod_tree ptree 
      ON ptree.pod_id = p.parent_id -- Get the children nodes for nodes found at the previous recursion step.
)

SELECT * FROM pods 
WHERE id IN (SELECT DISTINCT(pod_id) FROM pod_tree);
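To make the evaluation order of Snippet #1 concrete, here is a minimal pure-Ruby sketch that mimics it on an in-memory edge list. It is only an illustration, not something to run in place of the query: `EDGES` and `descendant_ids` are hypothetical names, and the sample rows are made up.

```ruby
# Hypothetical in-memory copy of pod_pods rows as [parent_id, pod_id] pairs.
EDGES = [[1, 2], [1, 3], [2, 4], [3, 4], [4, 5]].freeze

# Mimics how WITH RECURSIVE ... UNION ALL is evaluated: the recursive term
# only ever sees the rows produced by the previous step (the working set).
def descendant_ids(root_id, edges = EDGES)
  # Base case: direct children of the root.
  working = edges.select { |parent_id, _| parent_id == root_id }.map(&:last)
  result = working.dup

  until working.empty?
    # Recursive term: join the previous step's rows back onto the edge list.
    working = working.flat_map do |current_id|
      edges.select { |parent_id, _| parent_id == current_id }.map(&:last)
    end
    result.concat(working) # UNION ALL keeps duplicates
  end

  result.uniq # same effect as SELECT DISTINCT(pod_id) in the outer query
end
```

Pod 4 is reachable through both 2 and 3, so it shows up twice in the intermediate result. That is why the outer query uses DISTINCT.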

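Your Ruby code doesn't guard against infinite looping caused by cycles, but if cycles can occur, the way to handle them is to keep track of the ids you've already seen.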

-- Snippet #2
WITH RECURSIVE pod_tree(pod_id, rtree) AS ( -- Extra rtree parameter to keep track of visited nodes
  SELECT pod_id, ARRAY[pod_id] -- Make the base case array with pod_id
  FROM pod_pods
  WHERE parent_id = 1 -- Base case
  UNION ALL
  SELECT p.pod_id, rtree || p.pod_id -- Add the current pod_id to array
  FROM pod_pods p
    INNER JOIN pod_tree ptree 
      ON ptree.pod_id = p.parent_id
  WHERE NOT (p.pod_id = ANY(rtree)) -- Exclude nodes which have already been seen  
)

SELECT * FROM pods 
WHERE id IN (SELECT DISTINCT(pod_id) FROM pod_tree);

If pod_pods can contain orphaned relations and you want to ignore them, you need a join with pods.
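The bookkeeping in Snippet #2 can be sketched the same way in plain Ruby. This is an illustration under made-up data: `CYCLIC_EDGES` and `descendant_ids_without_cycles` are hypothetical names, and each working row carries its own path, just as each CTE row carries its own rtree.

```ruby
# Hypothetical pod_pods rows as [parent_id, pod_id]; note the cycle 2 -> 3 -> 2.
CYCLIC_EDGES = [[1, 2], [2, 3], [3, 2]].freeze

# Each working row is [pod_id, rtree], mirroring the two columns of the CTE.
def descendant_ids_without_cycles(root_id, edges = CYCLIC_EDGES)
  # Base case: direct children, each starting its own path.
  working = edges.select { |parent_id, _| parent_id == root_id }
                 .map { |_, pod_id| [pod_id, [pod_id]] }
  result = working.dup

  until working.empty?
    working = working.flat_map do |current_id, rtree|
      edges
        .select { |parent_id, pod_id| parent_id == current_id && !rtree.include?(pod_id) }
        .map { |_, pod_id| [pod_id, rtree + [pod_id]] } # append to the path
    end
    result.concat(working)
  end

  result.map(&:first).uniq
end
```

Without the `rtree.include?` check, the loop above would never terminate on this data, which is the same failure mode the `NOT (p.pod_id = ANY(rtree))` condition prevents in SQL.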

-- Snippet #3
WITH RECURSIVE pod_tree(id, rtree) AS (
  SELECT p1.id, ARRAY[p1.id]
  FROM pods p1 INNER JOIN pod_pods p2 ON p1.id = p2.pod_id 
  WHERE parent_id = 1
  UNION ALL
  SELECT p1.id, rtree || p1.id
  FROM pods p1 
    INNER JOIN pod_pods p2 ON p1.id = p2.pod_id
    INNER JOIN pod_tree ptree ON p2.parent_id = ptree.id
  WHERE NOT (p1.id = ANY(ptree.rtree))  
)

SELECT * FROM pods WHERE id IN (SELECT DISTINCT(id) FROM pod_tree);

If you have no orphaned links, my suggestion is to use Snippet #1 or #2, since they're faster than #3, which involves an extra join.

get_all_parents

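For simplicity, let's first leave out the fields that the active scope filters on and add them afterwards. We first walk up the tree through the pod_pods table to collect all the parent ids, and then apply the scope.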

-- Snippet #4
WITH RECURSIVE pod_tree AS (
  SELECT parent_id -- Get the parent_id of the parents of the base case node
  FROM pod_pods
  WHERE pod_id = 1 -- Base case
  UNION ALL -- Recurse on this and do a union with the previous step
  SELECT p.parent_id
  FROM pod_pods p
    INNER JOIN pod_tree ptree 
      ON ptree.parent_id = p.pod_id -- Get the parent nodes for nodes found at the previous recursion step.
)

SELECT * FROM pods 
WHERE 
  id IN (SELECT DISTINCT(parent_id) FROM pod_tree)
  AND pod_state = 'active'
  AND pod_type IN ('standard', 'readonly')
;
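
However, this applies the active filter only after all the nodes have been fetched. That may not be ideal, because it can walk more of the tree than necessary and may even return parents of inactive nodes. To make it behave like the method in your Ruby code, we need to join it with pods. I've added the infinite-recursion guard here as well, since you're now familiar with it.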

-- Snippet #5
WITH RECURSIVE pod_tree(id, rtree) AS (
  SELECT p1.id, ARRAY[p1.id]
  FROM pods p1 
    INNER JOIN pod_pods p2 ON p1.id = p2.parent_id 
  WHERE pod_id = 1
    AND p1.pod_state = 'active' 
    AND p1.pod_type IN ('standard', 'readonly')
  UNION ALL
  SELECT p1.id, rtree || p1.id
  FROM pods p1 
    INNER JOIN pod_pods p2 ON p1.id = p2.parent_id
    INNER JOIN pod_tree ptree ON p2.pod_id = ptree.id
  WHERE p1.pod_state = 'active' 
    AND p1.pod_type IN ('standard', 'readonly')
    AND NOT (p1.id = ANY(ptree.rtree))  
)

SELECT * FROM pods WHERE id IN (SELECT DISTINCT(id) FROM pod_tree);

In Rails, based on your stub method, the code for Snippet #5 would look like this:
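The difference between filtering after the walk (Snippet #4) and filtering during it (Snippet #5) is easy to see on a small in-memory example. The following is a sketch only: the data and the names `PARENT_EDGES`, `ACTIVE_IDS`, `ancestor_ids`, `parents_filtered_after` and `parents_filtered_during` are all hypothetical, and the data is assumed to be free of cycles.

```ruby
# Hypothetical data: pod_pods rows as [parent_id, pod_id], plus the ids of
# the pods that would pass the active scope. Pod 2 is not active.
PARENT_EDGES = [[2, 1], [3, 2], [4, 1], [5, 4]].freeze
ACTIVE_IDS = [3, 4, 5].freeze

# Walks up the tree. The block decides which parents are kept at each step,
# and only kept parents are expanded further.
def ancestor_ids(pod_id, edges = PARENT_EDGES, &keep)
  working = [pod_id]
  result = []
  until working.empty?
    parents = working.flat_map do |current_id|
      edges.select { |_, child_id| child_id == current_id }.map(&:first)
    end
    working = parents.select(&keep)
    result.concat(working)
  end
  result.uniq
end

# Snippet #4 style: walk everything, then filter the final result.
def parents_filtered_after(pod_id)
  ancestor_ids(pod_id) { true }.select { |id| ACTIVE_IDS.include?(id) }
end

# Snippet #5 style: filter at every step, so inactive branches are not walked.
def parents_filtered_during(pod_id)
  ancestor_ids(pod_id) { |id| ACTIVE_IDS.include?(id) }
end
```

With this data, filtering afterwards still returns pod 3, which is only reachable through the inactive pod 2, while filtering during the walk stops at pod 2 and never reaches it.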

def get_all_parents
  sql =
    <<-SQL
      WITH RECURSIVE pod_tree(id, rtree) AS (
        SELECT p1.id, ARRAY[p1.id]
        FROM pods p1
          INNER JOIN pod_pods p2 ON p1.id = p2.parent_id
        WHERE p2.pod_id = ?
          AND p1.pod_state = 'active'
          AND p1.pod_type IN ('standard', 'readonly')
        UNION ALL
        SELECT p1.id, rtree || p1.id
        FROM pods p1
          INNER JOIN pod_pods p2 ON p1.id = p2.parent_id
          INNER JOIN pod_tree ptree ON p2.pod_id = ptree.id
        WHERE p1.pod_state = 'active'
          AND p1.pod_type IN ('standard', 'readonly')
          AND NOT (p1.id = ANY(ptree.rtree))
      )

      SELECT * FROM pods WHERE id IN (SELECT DISTINCT(id) FROM pod_tree);
    SQL
  # IMP!
  # Pass the id as a bind variable instead of interpolating it into the string;
  # the array form of find_by_sql sanitizes it before the query runs.
  Pod.find_by_sql([sql, id])
end

That should cover your first two use cases. As mentioned earlier, the other two are extensions of these two, so you can build on these to get them.
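As a rough sketch of that extension, the two remaining methods can be composed from the first two. The module below is an assumption for illustration, not part of your model: `PodRelations` and `related_pod_ids` are hypothetical names, and the including class is assumed to respond to `id`, `get_all_parents` and `get_all_children` (the recursive-SQL versions), each returning objects that respond to the same methods.

```ruby
# Hypothetical mixin; assumes the including class provides id,
# get_all_parents and get_all_children.
module PodRelations
  # Mirrors the original get_all_parents_and_children: parents first,
  # then children.
  def get_all_parents_and_children
    get_all_parents + get_all_children
  end

  # Ids of every pod related to this one, excluding the pod itself.
  # Mirrors the id collection in the original get_all_relations.
  def related_pod_ids
    parents = get_all_parents
    related = parents + parents.flat_map(&:get_all_children) + get_all_children
    related.map(&:id).uniq - [id]
  end
end
```

In the model, the result of `related_pod_ids` could then be passed to `Pod.where(id: ...)` as the original method does. This still issues one recursive query per parent; folding it all into a single query is possible but not shown here.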

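Notes:

  • If you have no cycles, you can drop the infinite-recursion column, since it's extra bookkeeping.
  • If you have no orphaned links, prefer recursing over pod_pods alone for the children, since that avoids an unnecessary join.
  • rtree in the SQL queries above contains the hierarchy. If you need that information, you can choose to pass it back. I skipped it because you end up flattening the result anyway.
  • I'm fetching unique nodes. If a node is visited more than once, your Rails code currently returns it multiple times. If you want that behaviour, along with the tree order, you can get it like this: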
-- Example for getting all parents
WITH RECURSIVE pod_tree(id, slug, pod_type, parent_id, rtree) AS (
  SELECT p1.id, p1.slug, p1.pod_type, p2.parent_id, ARRAY[p1.id] -- Select the fields you need
  FROM pods p1 INNER JOIN pod_pods p2 ON p1.id = p2.parent_id 
  WHERE pod_id = 1
  AND p1.pod_state = 'active' AND p1.pod_type IN ('standard', 'readonly')
  UNION ALL
  SELECT p1.id, p1.slug, p1.pod_type, p2.parent_id, rtree || p1.id
  FROM pods p1 INNER JOIN pod_pods p2 ON p1.id = p2.parent_id
  INNER JOIN pod_tree ptree ON p2.pod_id = ptree.id
  WHERE p1.pod_state = 'active' AND p1.pod_type IN ('standard', 'readonly')
  AND NOT (p1.id = ANY(ptree.rtree))  
)

SELECT * FROM pod_tree;
