Is COMB GUID a good idea with Rails 3.1 if I use GUIDs for primary keys?

Mar*_*rry 4 uuid activerecord guid ruby-on-rails ruby-on-rails-3.1

I'm using Rails 3.1 with PostgreSQL 8.4. Let's assume I want/need to use GUID primary keys. One potential drawback is index fragmentation. In MS SQL, a recommended solution for that is to use special sequential GUIDs. One approach to sequential GUIDs is the COMBination GUID that substitutes a 6-byte timestamp for the MAC address portion at the end of the GUID. This has some mainstream adoption: COMBs are available natively in NHibernate (NHibernate/Id/GuidCombGenerator.cs).

I think I've figured out how to create COMB GUIDs in Rails (with the help of the UUIDTools 2.1.2 gem), but it leaves some unanswered questions:

  • Does PostgreSQL suffer from index fragmentation when the PRIMARY KEY is type UUID?
  • Is fragmentation avoided if the low-order 6 bytes of the GUID are sequential?
  • Is the COMB GUID as implemented below an acceptable, reliable way to create sequential GUIDs in Rails?

Thanks for your thoughts.


create_contacts.rb migration

class CreateContacts < ActiveRecord::Migration
  def up
    create_table :contacts, :id => false do |t|
      t.column :id, :uuid, :null => false # manually create :id with underlying DB type UUID
      t.string :first_name
      t.string :last_name
      t.string :email

      t.timestamps
    end
    execute "ALTER TABLE contacts ADD PRIMARY KEY (id);"
  end

  # Can't use reversible migration because it will try to run 'execute' again
  def down
    drop_table :contacts # also drops primary key
  end
end

/app/models/contact.rb

class Contact < ActiveRecord::Base
  require 'uuid_helper' #rails 3 does not autoload from lib/*
  include UUIDHelper

  set_primary_key :id
end

/lib/uuid_helper.rb

require 'uuidtools'

module UUIDHelper
  def self.included(base)
    base.class_eval do
      include InstanceMethods
      attr_readonly :id       # writable only on a new record
      before_create :set_uuid
    end
  end

  module InstanceMethods
  private
    def set_uuid
      # MS SQL syntax:  CAST(CAST(NEWID() AS BINARY(10)) + CAST(GETDATE() AS BINARY(6)) AS UNIQUEIDENTIFIER)

      # Get current Time object
      utc_timestamp = Time.now.utc

      # Convert to integer with milliseconds:  (Seconds since Epoch * 1000) + (6-digit microsecond fraction / 1000)
      utc_timestamp_with_ms_int = (utc_timestamp.tv_sec * 1000) + (utc_timestamp.tv_usec / 1000)

      # Format as hex, minimum of 12 digits, with leading zero.  Note that 12 hex digits handles to year 10889 (*).
      utc_timestamp_with_ms_hexstring = "%012x" % utc_timestamp_with_ms_int

      # If we supply UUIDTOOLS with a MAC address, it will use that rather than retrieving from system.
      # Use a regular expression to split into array, then insert ":" characters so it "looks" like a MAC address.
      UUIDTools::UUID.mac_address = (utc_timestamp_with_ms_hexstring.scan /.{2}/).join(":")

      # Generate Version 1 UUID (see RFC 4122).
      comb_guid = UUIDTools::UUID.timestamp_create().to_s

      # Assign generated COMBination GUID to .id
      self.id = comb_guid

      # (*) A note on maximum time handled by 6-byte timestamp that includes milliseconds:
      # If utc_timestamp_with_ms_hexstring = "FFFFFFFFFFFF" (12 F's), then 
      # Time.at(Float(utc_timestamp_with_ms_hexstring.hex)/1000).utc.iso8601(10) = "10889-08-02T05:31:50.6550292968Z".
    end
  end
end

小智 5

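  • Does PostgreSQL suffer from index fragmentation when the PRIMARY KEY is type UUID?

Yes, that is to be expected. If you use the COMB strategy, however, it won't happen: rows will always be inserted in order (that's not entirely true, but bear with me).

Also, there is little performance difference between the native PostgreSQL UUID type and VARCHAR. That is another point to consider.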

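  • Is fragmentation avoided if the low-order 6 bytes of the GUID are sequential?

In my tests I found that UUID1 (RFC 4122) is sequential, since the generated UUID already contains a timestamp. But yes, putting a timestamp in the last 6 bytes will reinforce that ordering. That's what I did anyway, because apparently the timestamp that is already present does not guarantee order. There is more about COMB here.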
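One way to see why the timestamp already inside a version 1 UUID does not guarantee order: RFC 4122 puts the low 32 bits of the timestamp first in the textual and byte layout, so two UUIDs created one tick apart can sort in reverse. The following is a minimal Python sketch, not part of the original answer; v1_from_timestamp is a hypothetical helper, and the node and timestamp values are made up for illustration.

```python
import uuid

def v1_from_timestamp(ts_100ns, node=0x0123456789AB, clock_seq=0):
    """Hypothetical helper: build a version 1 UUID from an explicit 60-bit timestamp."""
    time_low = ts_100ns & 0xFFFFFFFF                  # low 32 bits come FIRST in the layout
    time_mid = (ts_100ns >> 32) & 0xFFFF
    time_hi_version = ((ts_100ns >> 48) & 0x0FFF) | 0x1000   # version 1
    clock_seq_hi_variant = ((clock_seq >> 8) & 0x3F) | 0x80  # RFC 4122 variant
    clock_seq_low = clock_seq & 0xFF
    return uuid.UUID(fields=(time_low, time_mid, time_hi_version,
                             clock_seq_hi_variant, clock_seq_low, node))

# Two made-up timestamps exactly one 100 ns tick apart, chosen so that
# the low 32 bits wrap around between them.
earlier = v1_from_timestamp((0x1E1 << 48) | 0xFFFFFFFF)
later = v1_from_timestamp(((0x1E1 << 48) | 0xFFFFFFFF) + 1)

print(earlier)  # ffffffff-0000-11e1-8000-0123456789ab
print(later)    # 00000000-0001-11e1-8000-0123456789ab

# Chronological order and textual order disagree.
print(earlier.time < later.time)    # True
print(str(earlier) < str(later))    # False
```

The wrap-around is deliberately extreme here; in practice the low 32 bits roll over roughly every seven minutes, so UUIDs generated close together usually sort in order, but not always.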

  • Is the COMB GUID as implemented below an acceptable, reliable way to create sequential GUIDs in Rails?

I'm not using Rails, but I'll show you how I did it in Django:

import uuid, time

def uuid1_comb(obj):
    return uuid.uuid1(node=int(time.time() * 1000))

where node is a 48-bit positive integer that identifies the hardware address.
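As a rough check of what this produces, the sketch below generates one such UUID and reads the millisecond timestamp back out of the node field, which occupies the last 12 hex digits of the string. It is an illustration under assumptions, not part of the original answer: comb_timestamp is a hypothetical helper name, and the unused obj parameter is dropped for brevity.

```python
import time
import uuid
from datetime import datetime, timezone

def uuid1_comb():
    # Same idea as above: replace the MAC address (node) with the
    # current Unix time in milliseconds, which fits in 48 bits.
    return uuid.uuid1(node=int(time.time() * 1000))

def comb_timestamp(u):
    """Hypothetical helper: recover the embedded timestamp from the node field."""
    return datetime.fromtimestamp(u.node / 1000, tz=timezone.utc)

u = uuid1_comb()
print(u)                   # last 12 hex digits are the millisecond timestamp
print("%012x" % u.node)    # same 12 hex digits
print(comb_timestamp(u))   # UTC time at which the UUID was generated
```

Because a real MAC address is no longer part of the value, uniqueness across machines rests on the remaining fields (the version 1 timestamp and the clock sequence), which is worth keeping in mind if several hosts generate keys at the same time.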

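About your implementation: one of the main advantages of UUIDs is that you can safely generate them outside the database, so using a helper class is a valid way to do it. You could always use an external service to generate the IDs, such as snowflake, but that may be premature optimization at this point.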