How to count duplicate elements in a Ruby array

Žel*_*pin 67 ruby arrays

I have a sorted array:

[
  'FATAL <error title="Request timed out.">',
  'FATAL <error title="Request timed out.">',
  'FATAL <error title="There is insufficient system memory to run this query.">'
]

I would like to get something like this, though it doesn't have to be a hash:

[
  {:error => 'FATAL <error title="Request timed out.">', :count => 2},
  {:error => 'FATAL <error title="There is insufficient system memory to run this query.">', :count => 1}
]

nim*_*odm 125

The following code prints what you asked for. I'll let you decide how to actually use it to generate the hash you are looking for:

# sample array
a=["aa","bb","cc","bb","bb","cc"]

# make the hash default to 0 so that += will work correctly
b = Hash.new(0)

# iterate over the array, counting duplicate entries
a.each do |v|
  b[v] += 1
end

b.each do |k, v|
  puts "#{k} appears #{v} times"
end

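Note: I just noticed you said the array is already sorted. The code above does not require the array to be sorted. Taking advantage of that property might produce faster code.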
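As a minimal sketch of what exploiting the sorted order could look like: the snippet below assumes equal elements are adjacent (true for a sorted array) and uses `Enumerable#chunk_while` (Ruby 2.3+) to split the array into runs. The name `count_sorted_runs` is made up for illustration, and whether this is actually faster than the hash-based loop would have to be measured.

```ruby
# Hypothetical helper: counts runs of equal adjacent elements.
# Assumes the input is sorted (or at least that equal elements sit next to each other).
def count_sorted_runs(sorted)
  sorted
    .chunk_while { |left, right| left == right } # split into runs of equal neighbours
    .map { |run| [run.first, run.size] }         # one [element, count] pair per run
end

errors = [
  'FATAL <error title="Request timed out.">',
  'FATAL <error title="Request timed out.">',
  'FATAL <error title="There is insufficient system memory to run this query.">'
]

count_sorted_runs(errors).each do |error, count|
  puts "#{error} appears #{count} times"
end
```

On unsorted input this would report the same element several times, once per run, so the hash-based version remains the safer default.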

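  • If you want to find the maximum (and in one line): `a.inject(Hash.new(0)) { |hash, val| hash[val] += 1; hash }.entries.max_by { |entry| entry.last }` ... gotta love it! (4 upvotes)
  • I know I'm late to the party, but wow. Hash default values. That's a really cool trick. Thanks! (3 upvotes)
  • You should study [Enumerable](http://ruby-doc.org/core-1.9.3/Enumerable.html) to avoid a procedural coding style. (2 upvotes)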
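The first comment's one-liner for finding the most frequent element can be spelled out as below. This is only a sketch: `most_frequent` is a hypothetical name, and when several elements tie for the highest count, `max_by` returns just one of them.

```ruby
# Hypothetical helper: returns [element, count] for the most frequent element,
# or nil when the array is empty.
def most_frequent(items)
  counts = items.inject(Hash.new(0)) { |hash, val| hash[val] += 1; hash }
  counts.max_by { |_element, count| count }
end

element, count = most_frequent(["aa", "bb", "cc", "bb", "bb", "cc"])
puts "#{element} appears #{count} times" # prints "bb appears 3 times"
```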

vla*_*adr 68

You can do this very succinctly (in one line) using inject:

a = ['FATAL <error title="Request timed out.">',
      'FATAL <error title="Request timed out.">',
      'FATAL <error title="There is insufficient ...">']

b = a.inject(Hash.new(0)) {|h,i| h[i] += 1; h }

b.to_a.each {|error,count| puts "#{count}: #{error}" }

On Ruby 1.9 and later, where hashes preserve insertion order, this produces:

2: FATAL <error title="Request timed out.">
1: FATAL <error title="There is insufficient ...">

  • With Ruby 1.9+, you can use [`each_with_object`](http://www.ruby-doc.org/core-1.9.3/Enumerable.html#method-i-each_with_object) instead of `inject`: `a.each_with_object(Hash.new(0)) { |o, h| h[o] += 1 }`. (12 upvotes)

Man*_*ava 29

If you have an array like this:

words = ["aa","bb","cc","bb","bb","cc"]

and you need to count the duplicate elements, a one-line solution is:

result = words.each_with_object(Hash.new(0)) { |word,counts| counts[word] += 1 }
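If only the elements that actually occur more than once are of interest, the resulting hash can be filtered afterwards. A small sketch, with `duplicate_counts` as an illustrative name that is not part of the answer:

```ruby
# Hypothetical helper: counts every element, then keeps only those seen more than once.
def duplicate_counts(items)
  counts = items.each_with_object(Hash.new(0)) { |item, h| h[item] += 1 }
  counts.select { |_item, count| count > 1 }
end

words = ["aa", "bb", "cc", "bb", "bb", "cc"]
p duplicate_counts(words) # "aa" is dropped because it occurs only once
```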


Kao*_*oru 16

A different approach from the answers above, using Enumerable#group_by.

[1, 2, 2, 3, 3, 3, 4].group_by(&:itself).map { |k,v| [k, v.count] }.to_h
# {1=>1, 2=>2, 3=>3, 4=>1}

Broken down into its separate method calls:

a = [1, 2, 2, 3, 3, 3, 4]
a = a.group_by(&:itself) # {1=>[1], 2=>[2, 2], 3=>[3, 3, 3], 4=>[4]}
a = a.map { |k,v| [k, v.count] } # [[1, 1], [2, 2], [3, 3], [4, 1]]
a = a.to_h # {1=>1, 2=>2, 3=>3, 4=>1}

Enumerable#group_by was added in Ruby 1.8.7; Object#itself, used above, requires Ruby 2.2 or later.

  • I love `(&:itself)`, that is really clever! (2 upvotes)

Car*_*ela 15

How about the following:

things = [1, 2, 2, 3, 3, 3, 4]
things.uniq.map{|t| [t,things.count(t)]}.to_h

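It is a bit cleaner and more descriptive of what we are actually trying to do.

I suspect it will also perform better on large collections than the approaches that iterate over every value.

Benchmark results: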

a = (1...1000000).map { rand(100)}
                        user     system      total        real
inject              7.670000   0.010000   7.680000 (  7.985289)
array count         0.040000   0.000000   0.040000 (  0.036650)
each_with_object    0.210000   0.000000   0.210000 (  0.214731)
group_by            0.220000   0.000000   0.220000 (  0.218581)

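So it is much faster in this benchmark. Note that the test data contains only 100 distinct values; things.count(t) rescans the whole array once per distinct value, so the advantage shrinks as the number of distinct values grows.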
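The script behind these numbers is not shown, so the following is only a sketch of how such a comparison might be set up with Ruby's standard `benchmark` library. The method names and the smaller array size are assumptions made here for illustration; timings will differ by machine and Ruby version.

```ruby
require 'benchmark'

# Four ways of counting occurrences; all return the same element => count mapping.
def count_with_inject(a)
  a.inject(Hash.new(0)) { |h, i| h[i] += 1; h }
end

def count_with_array_count(a)
  a.uniq.map { |t| [t, a.count(t)] }.to_h
end

def count_with_each_with_object(a)
  a.each_with_object(Hash.new(0)) { |i, h| h[i] += 1 }
end

def count_with_group_by(a)
  a.group_by(&:itself).map { |k, v| [k, v.count] }.to_h
end

# Assumption: smaller than the array in the answer so the script finishes quickly.
data = Array.new(100_000) { rand(100) }

# The argument to bm is the label column width.
Benchmark.bm(18) do |x|
  x.report('inject')           { count_with_inject(data) }
  x.report('array count')      { count_with_array_count(data) }
  x.report('each_with_object') { count_with_each_with_object(data) }
  x.report('group_by')         { count_with_group_by(data) }
end
```

Replacing `rand(100)` with a much larger range is a quick way to see how the `uniq`/`count` variant behaves when most values are distinct.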


San*_*osh 12

Use Enumerable#tally

["a", "b", "c", "b"].tally 

#=> { "a" => 1, "b" => 2, "c" => 1 }

Note: only available in Ruby >= 2.7
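To get the array-of-hashes shape shown in the question, the result of `tally` can be mapped afterwards. A minimal sketch, assuming Ruby 2.7 or later; `error_counts` is a made-up name:

```ruby
# Hypothetical helper: tallies the entries, then reshapes each pair into a hash.
def error_counts(errors)
  errors.tally.map { |error, count| { error: error, count: count } }
end

errors = [
  'FATAL <error title="Request timed out.">',
  'FATAL <error title="Request timed out.">',
  'FATAL <error title="There is insufficient system memory to run this query.">'
]

error_counts(errors).each { |entry| puts "#{entry[:count]}: #{entry[:error]}" }
# prints:
# 2: FATAL <error title="Request timed out.">
# 1: FATAL <error title="There is insufficient system memory to run this query.">
```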


dan*_*dan 8

Personally, I would do it this way:

# myprogram.rb
a = ['FATAL <error title="Request timed out.">',
'FATAL <error title="Request timed out.">',
'FATAL <error title="There is insufficient system memory to run this query.">']
puts a

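Then run the program and pipe its output to uniq -c (which only counts adjacent duplicate lines, so it relies on the array being sorted):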

ruby myprogram.rb | uniq -c

Output:

 2 FATAL <error title="Request timed out.">
 1 FATAL <error title="There is insufficient system memory to run this query.">


Ana*_*mez 7

As of Ruby 2.2 you can use itself (note that Hash#transform_values requires Ruby 2.4 or later): array.group_by(&:itself).transform_values(&:count)

In more detail:

array = [
  'FATAL <error title="Request timed out.">',
  'FATAL <error title="Request timed out.">',
  'FATAL <error title="There is insufficient system memory to run this query.">'
];

array.group_by(&:itself).transform_values(&:count)
 => { "FATAL <error title=\"Request timed out.\">"=>2,
      "FATAL <error title=\"There is insufficient system memory to run this query.\">"=>1 }