Memoization and caching in Ruby and Ruby on Rails

equ*_*nt8 4 ruby activerecord ruby-on-rails

Given an application that loops through many fields: why is the application making multiple SQL calls even if I memoize the object?

or

Given an application that loops through many items: how do I prevent the application from doing an expensive calculation on every item?

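Example Rails code

  • A Work has many Comments
  • A Work can be deleted only if it has no comments or the user is an admin
  • Our view shows "Delete work" only when the work can be deleted

Note: we are using the policy view objects described in http://www.eq8.eu/blogs/41-policy-objects-in-ruby-on-rails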

class WorksController < ApplicationController
  def index
    @works = Work.all
  end
end

<% @works.each do |work| %>
   <%= link_to("Delete work", work, method: :delete) if work.policy.able_to_delete?(current_user: current_user) %>
<% end %>

class Work < ActiveRecord::Base
  has_many :comments

  def policy
     @policy ||= WorkPolicy.new(self)
  end
end

class Comment < ActiveRecord::Base
  belongs_to :work
end

class WorkPolicy
  attr_reader :work

  def initialize(work)
    @work = work
  end

  def able_to_delete?(current_user: nil)
    work_has_no_comments || (current_user && current_user.admin?)
  end

  private

  def work_has_no_comments
    work.comments.count < 1
  end
end

Now imagine we have 100 Works in the DB.

This results in multiple SQL calls:

SELECT "works".* FROM "works"
SELECT COUNT(*) FROM "comments" WHERE "comments"."work_id" = $1  [["work_id", 1]]
SELECT COUNT(*) FROM "comments" WHERE "comments"."work_id" = $1  [["work_id", 2]]
SELECT COUNT(*) FROM "comments" WHERE "comments"."work_id" = $1  [["work_id", 3]]
SELECT COUNT(*) FROM "comments" WHERE "comments"."work_id" = $1  [["work_id", 4]]

Note: I recently explained this example to a colleague and thought it was worth documenting for more developers.

equ*_*nt8 14

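Memoization

First, let's answer:

Why is the application making multiple SQL calls even if I memoize the object?

Yes, we are memoizing the Policy object with @policy ||= WorkPolicy.new(self)

But we are not memoizing what that object calls. Memoizing the policy only saves us from building a new WorkPolicy each time; to avoid repeated queries we also need to memoize the results of the underlying method calls.

So if we do: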

@work = Work.last
@work.policy.able_to_delete?
#=> SELECT COUNT(*) FROM "comments" WHERE "comments"."work_id" = $1  [["work_id", 100]] # sql call
@work.policy.able_to_delete?
#=> SELECT COUNT(*) FROM "comments" WHERE "comments"."work_id" = $1  [["work_id", 100]] # sql call
@work.policy.able_to_delete?
#=> SELECT COUNT(*) FROM "comments" WHERE "comments"."work_id" = $1  [["work_id", 100]] # sql call

...we call comments.count multiple times.

But what if we introduce another layer of memoization?

Let's change this:

class WorkPolicy
  # ...

  def work_has_no_comments
    work.comments.count < 1
  end
end

to this:

class WorkPolicy
  # ...

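  def work_has_no_comments
    # `||=` would re-run the query whenever the stored result is false,
    # so we check whether the variable has been assigned at all
    return @work_has_no_comments if defined?(@work_has_no_comments)
    @work_has_no_comments = work.comments.count < 1
  end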
end


@work = Work.last
@work.policy.able_to_delete?
#=> SELECT COUNT(*) FROM "comments" WHERE "comments"."work_id" = $1  [["work_id", 100]] # sql call
@work.policy.able_to_delete?
@work.policy.able_to_delete?

As you can see, the SQL call for count is made only the first time; after that the result is returned from memory, from the object's state.
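
One caveat when memoizing a boolean: `||=` treats a stored false the same as "not computed yet" and runs the right-hand side again. The plain-Ruby sketch below shows the difference without a database; FakeWork, NaivePolicy and SafePolicy are hypothetical names used only for this illustration, and the query counter stands in for the SELECT COUNT(*) call.

```ruby
# Plain-Ruby stand-in for a work whose comment count is expensive to fetch.
# FakeWork and its query counter are hypothetical, for illustration only.
class FakeWork
  attr_reader :queries

  def initialize(comment_count)
    @comment_count = comment_count
    @queries = 0
  end

  # Pretends to run SELECT COUNT(*) and records that it did.
  def comments_count
    @queries += 1
    @comment_count
  end
end

class NaivePolicy
  def initialize(work)
    @work = work
  end

  # `||=` re-evaluates the right-hand side whenever the stored value is false or nil.
  def work_has_no_comments
    @work_has_no_comments ||= @work.comments_count < 1
  end
end

class SafePolicy
  def initialize(work)
    @work = work
  end

  # `defined?` checks whether the variable was ever assigned, so false is memoized too.
  def work_has_no_comments
    return @work_has_no_comments if defined?(@work_has_no_comments)

    @work_has_no_comments = @work.comments_count < 1
  end
end

naive_work = FakeWork.new(5) # has comments, so the result is false
naive = NaivePolicy.new(naive_work)
3.times { naive.work_has_no_comments }

safe_work = FakeWork.new(5)
safe = SafePolicy.new(safe_work)
3.times { safe.work_has_no_comments }

puts "naive queries: #{naive_work.queries}"
puts "safe queries:  #{safe_work.queries}"
```

When the work has no comments the result is true and both variants behave the same, so the problem only shows up for works that do have comments.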

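Caching

But this doesn't work for our "looping through many works" case, because we initialize 100 Work objects with 100 WorkPolicy objects, and each new object starts with an empty memoized state.

The best way to understand it is to run this code in irb: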

class Foo
  def x
    @x ||= calculate
  end

  private

  def calculate
    sleep 2 # slow query
    123
  end
end

class Bar
  def y
    @y ||= Foo.new
  end
end

puts "10 times calling the same memoized object"
bar = Bar.new
10.times do
  puts  bar.y.x
end

puts "10 times initializing a new object"

10.times do
  bar = Bar.new
  puts  bar.y.x
end

One way to solve this is to use the Rails cache:

class WorkPolicy
  # ...

  def work_has_no_comments
    Rails.cache.fetch [WorkPolicy, 'work_has_no_comments', @work] do
      work.comments.count < 1
    end
  end
end

class Comment < ActiveRecord::Base
  # `touch: true` updates Work#updated_at each time a comment is added or changed,
  # so the cache key changes and the stale cached value is no longer used
  belongs_to :work, touch: true
end

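Now, this is just a silly example. I know it could be solved by introducing a Work#comments_count method and caching the comment count there. I just wanted to demonstrate the options.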
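
To see why a cache shared between objects helps where memoization does not, here is a plain-Ruby sketch that runs without Rails. TinyCache is a hypothetical in-memory stand-in for Rails.cache, and Record stands in for a Work row; including updated_at in the key is an assumption that mirrors, as far as I understand it, how Rails derives cache keys from a record, which is why `touch: true` invalidates the entry.

```ruby
# TinyCache is a hypothetical in-memory stand-in for Rails.cache, for illustration only.
class TinyCache
  def initialize
    @store = {}
  end

  # Returns the stored value for key, or runs the block once and stores its result.
  def fetch(key)
    return @store[key] if @store.key?(key)

    @store[key] = yield
  end
end

# Record stands in for a Work row; updated_at changes when a comment "touches" it.
Record = Struct.new(:id, :updated_at, :comment_count)

CACHE = TinyCache.new
QUERIES = []

def work_has_no_comments?(record)
  key = ["work_has_no_comments", record.id, record.updated_at]
  CACHE.fetch(key) do
    QUERIES << record.id # pretend this is the SELECT COUNT(*) call
    record.comment_count < 1
  end
end

first_load  = Record.new(3, 100, 0)
second_load = Record.new(3, 100, 0) # a new object for the same, unchanged row
work_has_no_comments?(first_load)
work_has_no_comments?(second_load)  # served from the cache, no query

touched = Record.new(3, 101, 1)     # a comment was added, updated_at moved on
result_after_touch = work_has_no_comments?(touched)

puts "queries run for work ids: #{QUERIES.inspect}"
puts "after touch: #{result_after_touch}"
```

The second object reuses the cached answer even though it is a different Ruby object, and the touched record misses the cache because its key changed.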

With caching like this, the first time we run WorksController#index we still get multiple SQL calls:

SELECT "works".* FROM "works"
SELECT COUNT(*) FROM "comments" WHERE "comments"."work_id" = $1  [["work_id", 1]]
SELECT COUNT(*) FROM "comments" WHERE "comments"."work_id" = $1  [["work_id", 2]]
SELECT COUNT(*) FROM "comments" WHERE "comments"."work_id" = $1  [["work_id", 3]]
SELECT COUNT(*) FROM "comments" WHERE "comments"."work_id" = $1  [["work_id", 4]]
# ...

...but the second, third and later calls will look like:

SELECT "works".* FROM "works"
# no count call

If you add a new comment to the work with ID 3:

SELECT "works".* FROM "works"
SELECT COUNT(*) FROM "comments" WHERE "comments"."work_id" = $1  [["work_id", 3]]

Proper SQL

Now we are still not happy. We want the first run to be fast too! The problem is how we call our association (comments). We lazy load them:

Work.limit(3).each { |w| w.comments.to_a } # to_a forces each association to load

# => SELECT  "works".* FROM "works" LIMIT 3
# => SELECT "comments".* FROM "comments" WHERE "comments"."work_id" = $1  ORDER BY comments.created_at ASC  [["work_id", 97]]
# => SELECT "comments".* FROM "comments" WHERE "comments"."work_id" = $1  ORDER BY comments.created_at ASC  [["work_id", 98]]
# => SELECT "comments".* FROM "comments" WHERE "comments"."work_id" = $1  ORDER BY comments.created_at ASC  [["work_id", 99]]

But if we eager load them:

Work.limit(3).includes(:comments).map(&:comments)

# => SELECT  "works".* FROM "works" WHERE "works"."deleted_at" IS NULL LIMIT 3
# => SELECT "comments".* FROM "comments" WHERE "comments"."status" = 'approved' AND "comments"."work_id" IN (97, 98, 99)  ORDER BY comments.created_at ASC

Read more about includes, joins, preload and eager_load: http://blog.scoutapp.com/articles/2017/01/24/activerecord-includes-vs-joins-vs-preload-vs-eager_load-when-and-where
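
The difference between the lazy and the eager pattern can be sketched in plain Ruby, without ActiveRecord. COMMENTS is a hypothetical in-memory "table" and QUERY_LOG records one entry per simulated query; this only mimics the shape of the queries, it is not how ActiveRecord is implemented.

```ruby
# COMMENTS is a hypothetical in-memory "table"; each lookup method logs one "query".
COMMENTS = [
  { id: 1, work_id: 97 },
  { id: 2, work_id: 97 },
  { id: 3, work_id: 99 }
].freeze

QUERY_LOG = []

# Lazy style: one lookup per work (the N+1 pattern).
def comments_for(work_id)
  QUERY_LOG << "WHERE work_id = #{work_id}"
  COMMENTS.select { |c| c[:work_id] == work_id }
end

# Eager style: one lookup for all works, grouped in memory afterwards.
def comments_for_all(work_ids)
  QUERY_LOG << "WHERE work_id IN (#{work_ids.join(', ')})"
  grouped = COMMENTS.select { |c| work_ids.include?(c[:work_id]) }
                    .group_by { |c| c[:work_id] }
  work_ids.to_h { |id| [id, grouped.fetch(id, [])] }
end

work_ids = [97, 98, 99]

lazy = work_ids.to_h { |id| [id, comments_for(id)] }
lazy_queries = QUERY_LOG.size

QUERY_LOG.clear
eager = comments_for_all(work_ids)
eager_queries = QUERY_LOG.size

puts "lazy: #{lazy_queries} queries, eager: #{eager_queries} query"
puts "same result: #{lazy == eager}"
```

Both approaches return the same data; the eager one trades N small lookups for a single larger one plus grouping in memory.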

So our code could be:

class WorksController < ApplicationController
  def index
    @works = Work.all.includes(:comments)
  end
end

class WorkPolicy
  # ...

  def work_has_no_comments
    work.comments.size < 1        # we changed `count` to `size`
  end
end

Q: Now wait a minute, aren't comments.count and comments.size the same?

Not really.

10.times do
  work.comments.size
end  
# SELECT "comments".* FROM "comments" WHERE "comments"."work_id" = $1    ORDER BY comments.created_at ASC  [["work_id", 1]]

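...once all comments are loaded into (something like) an Array, the size is computed on that array in memory (like [].size), so repeated calls need no further queries.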

10.times do
  work.comments.count
end
# SELECT COUNT(*) FROM "comments" WHERE "comments"."work_id" = $1  [["work_id", 1]]
# SELECT COUNT(*) FROM "comments" WHERE "comments"."work_id" = $1  [["work_id", 1]]
# SELECT COUNT(*) FROM "comments" WHERE "comments"."work_id" = $1  [["work_id", 1]]
# ...

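...executing SELECT COUNT is much faster than loading "all comments" just to compute the size, but when you need it 10 times you explicitly make 10 calls.

Now, I'm oversimplifying work.comments.size: Rails is smarter about deciding whether you only want the size. If the association is already loaded (for example via includes), size counts the records in memory; if it is not loaded, it just executes SELECT COUNT(*) instead of "loading all comments into an array" and calling [].size.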
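
As a rough mental model of how the counting methods on an association differ, here is a toy class in plain Ruby. FakeAssociation is hypothetical and is not ActiveRecord code; it only imitates the behaviour described in the Rails documentation, where count always queries, length loads the records and counts them in memory, and size picks whichever is cheaper depending on whether the records are already loaded.

```ruby
# FakeAssociation is a hypothetical toy model, not ActiveRecord code.
class FakeAssociation
  attr_reader :log

  def initialize(rows)
    @rows = rows
    @loaded = nil
    @log = []
  end

  # count: always asks the "database".
  def count
    @log << "SELECT COUNT(*)"
    @rows.size
  end

  # length: loads the records (once) and counts them in memory.
  def length
    load_records.size
  end

  # size: in-memory count when loaded, otherwise a COUNT query.
  def size
    @loaded ? @loaded.size : count
  end

  def load_records
    return @loaded if @loaded

    @log << "SELECT *"
    @loaded = @rows.dup
  end
end

unloaded = FakeAssociation.new(%w[a b c])
3.times { unloaded.size }

preloaded = FakeAssociation.new(%w[a b c])
preloaded.load_records # what includes(:comments) would have done up front
3.times { preloaded.size }

puts unloaded.log.inspect
puts preloaded.log.inspect
```

In this model, calling size on a preloaded association costs no extra queries, which is why switching from count to size pays off once the controller uses includes(:comments).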

It's similar with .pluck vs .map

scope = Work.limit(10)
scope.pluck(:title)
# SELECT  "works"."title" FROM "works" LIMIT 10
# => ['foo', 'bar', ...]
scope.pluck(:title)
# SELECT  "works"."title" FROM "works" LIMIT 10
# => ['foo', 'bar', ...]

scope.map(&:title)
# SELECT  "works".* FROM "works" LIMIT 10
# => ['foo', 'bar', ...]
scope.map(&:title)
# => ['foo', 'bar', ...]
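  • pluck is faster because it selects only title into an array, but it executes an SQL call every time
  • map makes Rails run SELECT * to fill title into the array, but afterwards you can keep working with the already loaded objects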

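Conclusion

There is no silver bullet. It always depends on what you want to achieve.

Someone may argue that the "optimize the SQL" solution works best, but that is not necessarily true. You need to implement a similar SQL optimization in every place where you call work.policy.able_to_delete?, which may be 10 or 100 places. And includes may not always be a good idea performance-wise.

Caching can get super complex when it comes to which event should drop which part of the cache. If you don't do it properly, your website may display "out of date information"! In the case of policy objects that is super dangerous.

Memoization is not always flexible enough, as you may need to redesign a large part of your codebase to implement it and introduce too many unnecessary layers of abstraction.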

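Not to mention that in an environment with truly parallel threads, like Rubinius, memoization is a big no-no unless you synchronize your threads properly. If you use MRI, Rails and Puma are thread safe, so don't worry, your memoization is fine (in 95% of cases), but that is a different kind of "thread safety". You really need to do something hardcore to break it. This article is too long to get into that topic. Google it!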
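
For reference, one common way to synchronize a memoized value across threads is to guard the check-and-assign with a Mutex from Ruby's core library. This is a minimal sketch, not a recommendation for every case; SlowReport is a hypothetical class and the sleep stands in for an expensive query.

```ruby
# SlowReport is a hypothetical class; the sleep stands in for an expensive query.
class SlowReport
  attr_reader :calculations

  def initialize
    @mutex = Mutex.new
    @calculations = 0
  end

  # Only one thread at a time may check and fill the memoized value.
  def value
    @mutex.synchronize do
      @value ||= calculate
    end
  end

  private

  def calculate
    @calculations += 1
    sleep 0.05 # gives other threads a chance to reach `value` in the meantime
    123
  end
end

report = SlowReport.new
results = Array.new(5) { Thread.new { report.value } }.map(&:value)

puts results.inspect
puts "calculated #{report.calculations} time(s)"
```

Without the mutex, several threads could see the variable as unset during the sleep and each run the calculation; with it, later threads wait and then read the stored value. The cost is that every read takes the lock.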

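It really depends on the goal of your application (or of the part of the application you are working on). My only advice is: profile/benchmark your app! Don't optimize prematurely. Use tools such as New Relic to discover which parts of your application are slow.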
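
For a quick local measurement before reaching for a full profiler, Ruby's standard Benchmark module is often enough. The sketch below compares a repeated slow call with its memoized version; slow_lookup and memoized_lookup are hypothetical helpers and the sleep simulates a slow query, so the absolute numbers mean nothing outside this example.

```ruby
require "benchmark"

# Hypothetical stand-in for an expensive call; the sleep simulates a slow query.
def slow_lookup
  sleep 0.02
  42
end

# Memoized wrapper: only the first call pays for slow_lookup.
def memoized_lookup
  @memoized_lookup ||= slow_lookup
end

plain_time = Benchmark.realtime { 5.times { slow_lookup } }
memo_time  = Benchmark.realtime { 5.times { memoized_lookup } }

puts format("plain: %.3fs, memoized: %.3fs", plain_time, memo_time)
```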

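Optimize gradually. Don't build a slow app and then decide in one sprint "right, let's optimize", because you may discover that you made bad design choices and that 50% of the app needs to be rewritten to be faster.

Other solutions not mentioned

Counter cache

Database indexes

This may be off topic, but a lot of performance issues happen because your app has no DB indexes (or too many premature indexes).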