uniq for Enumerator::Lazy

mic*_*lpm 3 ruby

I'm working with data that has a lot of duplicate rows:

# => [ [1, "A", 23626], [1, "A", 31314], [2, "B", 2143], [2, "B", 5247] ]
p xs

# => [ [1, "A"], [2, "B"] ]
p xs.uniq { |x| x[0] }.map { |x| [x[0], x[1]] }

But xs is very large. I tried to process it lazily, but Enumerator::Lazy has no uniq method.

How can I do this lazily?

Ama*_*dan 7

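require 'set'

# Refinement: a scoped monkey patch, active only after `using`.
module EnumeratorLazyUniq
  refine Enumerator::Lazy do
    # Lazily drop elements whose key (the block's result, or the
    # element itself when no block is given) has already been seen.
    def uniq
      set = Set.new  # keys seen so far
      select { |e|
        val = block_given? ? yield(e) : e
        # Keep e only if val is new; record val on its first appearance.
        !set.include?(val).tap { |exists|
          set << val unless exists
        }
      }
    end
  end
end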

using EnumeratorLazyUniq
xs = [ [1, "A", 23626], [1, "A", 31314], [2, "B", 2143], [2, "B", 5247] ].to_enum.lazy

us = xs.uniq{ |x| x[0] }.map{ |x| [x[0], x[1]] }
puts us.to_a.inspect
# => [[1, "A"], [2, "B"]]
# Works with a block

puts us.class
# => Enumerator::Lazy
# Yep, still lazy.

ns = [1, 4, 6, 1, 2].to_enum.lazy
puts ns.uniq.to_a.inspect
# => [1, 4, 6, 2]
# Works without a block
Run Code Online (Sandbox Code Playgroud)

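This is a straightforward implementation using a Set, which means every uniq key (the block's return value, e.g. `1` in the example above, or the whole element when no block is given) stays in memory; the stream elements themselves (e.g. `[1, "A", 23626]`) are not retained.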
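One consequence of creating the Set once, when `uniq` is called, is that every later pass over the same lazy enumerator shares it: calling `us.to_a` a second time would return `[]`, because all the keys were recorded during the first pass. If the enumerator may be iterated more than once, one option (a sketch, not part of the original answer; the module name `EnumeratorLazyUniqFresh` is hypothetical) is to build the Set inside an `Enumerator.new` block, which runs afresh on each iteration. Newer Ruby versions also ship a built-in `Enumerator::Lazy#uniq`, which makes the refinement unnecessary there.

```ruby
require 'set'

# Hypothetical variant: a new Set is allocated for every pass over the enumerator.
module EnumeratorLazyUniqFresh
  refine Enumerator::Lazy do
    def uniq(&key)
      source = self
      Enumerator.new { |y|
        seen = Set.new  # fresh for each iteration
        source.each { |e| y << e if seen.add?(key ? key.call(e) : e) }
      }.lazy
    end
  end
end

using EnumeratorLazyUniqFresh

xs = [[1, "A", 23626], [1, "A", 31314], [2, "B", 2143], [2, "B", 5247]].lazy
us = xs.uniq { |x| x[0] }.map { |x| x.first(2) }

first_pass  = us.to_a
second_pass = us.to_a  # same result: this pass gets its own Set
p first_pass
p second_pass
```

Memory use per pass is unchanged: the Set still grows with the number of distinct keys.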