Find duplicates in an array of hashes on specific keys

lyo*_*eta 4 ruby csv arrays hash duplicates

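I have an array of hashes (actually CSV rows), and I need to find and keep every row that matches another row on two specific keys (user, section). Here is a sample of the data: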

[
  { user: 1, role: "staff", section: 123 },
  { user: 2, role: "staff", section: 456 },
  { user: 3, role: "staff", section: 123 },
  { user: 1, role: "exec", section: 123 },
  { user: 2, role: "exec", section: 456 },
  { user: 3, role: "staff", section: 789 }
]

So what I need back is an array containing only the rows whose user/section combination appears more than once, like this:

[
  { user: 1, role: "staff", section: 123 },
  { user: 1, role: "exec", section: 123 },
  { user: 2, role: "staff", section: 456 },
  { user: 2, role: "exec", section: 456 }
]

The double-loop solution I am trying looks like this:

duplicates = []

enrollments.each_with_index do |a, ai|
  enrollments.each_with_index do |b, bi|
    next if ai == bi

    duplicates << b if a[2] == b[2] && a[6] == b[6]
  end
end

But since the CSV has 145K rows, this takes forever.

How can I get the output I need more efficiently?

Ali*_*eza 8

In terms of efficiency, you might want to try this:

grouped = csv_arr.group_by{|row| [row[:user],row[:section]]}
filtered = grouped.values.select { |a| a.size > 1 }.flatten

The first statement groups the records by the :user and :section keys. The result is:

{[1, 123]=>[{:user=>1, :role=>"staff", :section=>123}, {:user=>1, :role=>"exec", :section=>123}],
 [2, 456]=>[{:user=>2, :role=>"staff", :section=>456}, {:user=>2, :role=>"exec", :section=>456}],
 [3, 123]=>[{:user=>3, :role=>"staff", :section=>123}],
 [3, 789]=>[{:user=>3, :role=>"staff", :section=>789}]}

The second statement selects only the values of groups that have more than one member, then flattens the result to give you:

[{:user=>1, :role=>"staff", :section=>123},
 {:user=>1, :role=>"exec", :section=>123},
 {:user=>2, :role=>"staff", :section=>456},
 {:user=>2, :role=>"exec", :section=>456}]
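For anyone who wants to try the two statements end to end, here is a minimal sketch that wraps them in a helper and runs it against the sample data from the question. The helper name `duplicates_by` is made up for illustration, and using `values_at` to build the grouping key is my own choice rather than part of the original two-liner.

```ruby
# Hypothetical helper (not from the answer above): returns the rows whose
# values for the given keys occur more than once in the input.
def duplicates_by(rows, *keys)
  rows
    .group_by { |row| row.values_at(*keys) } # e.g. [1, 123] for user 1, section 123
    .values                                  # drop the keys, keep the groups
    .select { |group| group.size > 1 }       # only groups with repeats
    .flatten(1)                              # one level only, so rows stay intact
end

enrollments = [
  { user: 1, role: "staff", section: 123 },
  { user: 2, role: "staff", section: 456 },
  { user: 3, role: "staff", section: 123 },
  { user: 1, role: "exec", section: 123 },
  { user: 2, role: "exec", section: 456 },
  { user: 3, role: "staff", section: 789 }
]

dups = duplicates_by(enrollments, :user, :section)
dups.each { |row| puts row.inspect }
```

Because `Array#values_at` exists as well, the same helper should also work if the rows are plain arrays and the keys are column positions, for example `duplicates_by(rows, 2, 6)` to mirror the `a[2]` and `a[6]` comparisons in the question. I have not tried that against the real CSV.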

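This should speed up your operation, but memory-wise I can't say what the impact will be on large input, since that depends on your machine, its resources, and the size of the file.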
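If memory does turn out to be a concern, one possible variation is to count the keys in a first pass and filter in a second, so that only a hash of counters is built instead of a hash of row groups. This is a sketch under that assumption, not a measured improvement. The helper name `duplicates_by_count` is invented for illustration, and unlike the grouped version it returns the rows in their original input order rather than clustered by key.

```ruby
# Hypothetical two-pass variant: count each key combination first,
# then keep the rows whose combination was seen more than once.
def duplicates_by_count(rows, *keys)
  counts = Hash.new(0)
  rows.each { |row| counts[row.values_at(*keys)] += 1 }  # pass 1: tally keys
  rows.select { |row| counts[row.values_at(*keys)] > 1 } # pass 2: filter rows
end

sample_rows = [
  { user: 1, role: "staff", section: 123 },
  { user: 2, role: "staff", section: 456 },
  { user: 3, role: "staff", section: 123 },
  { user: 1, role: "exec", section: 123 },
  { user: 2, role: "exec", section: 456 },
  { user: 3, role: "staff", section: 789 }
]

repeated = duplicates_by_count(sample_rows, :user, :section)
repeated.each { |row| puts row.inspect }
```

Both versions do a roughly constant amount of hash work per row, so either should scale far better than the nested loop over 145K rows.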