How can I get word frequencies in Ruby in an efficient way?

Yos*_*sef 11 ruby regex

Sample input:

"I was 09809 home -- Yes! yes!  You was"

And the output:

{ 'yes' => 2, 'was' => 2, 'i' => 1, 'home' => 1, 'you' => 1 }

My code doesn't work:

def get_words_f(myStr)
    myStr=myStr.downcase.scan(/\w/).to_s;
    h = Hash.new(0)
    myStr.split.each do |w|
       h[w] += 1 
    end
    return h.to_a;
end

print get_words_f('I was 09809 home -- Yes! yes!  You was');

emr*_*azi 19

This works, but I'm new to Ruby too. There may be a better solution.

def count_words(string)
  words = string.split(' ')
  frequency = Hash.new(0)
  words.each { |word| frequency[word.downcase] += 1 }
  return frequency
end

Instead of .split(' '), you could also use .scan(/\w+/); however, .scan(/\w+/) would split "aren't" into "aren" and "t", while .split(' ') would not.
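To make that difference concrete, here is a small sketch comparing the two tokenizers on a string with an apostrophe. The helper names and the sample sentence are made up for illustration only.

```ruby
# Hypothetical helpers, named here only to compare the two approaches.

# Splits on whitespace only, so punctuation stays attached to words.
def tokens_by_split(text)
  text.split(' ')
end

# Keeps runs of word characters (letters, digits, underscore),
# so an apostrophe acts as a separator.
def tokens_by_scan(text)
  text.scan(/\w+/)
end

p tokens_by_split("They aren't home")  # ["They", "aren't", "home"]
p tokens_by_scan("They aren't home")   # ["They", "aren", "t", "home"]
```

Which one is preferable depends on whether contractions or trailing punctuation matter more for the input at hand.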

Output of the example code:

print count_words('I was 09809 home -- Yes! yes!  You was');

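# With .split(' '), punctuation stays attached to the tokens:
# {"i"=>1, "was"=>2, "09809"=>1, "home"=>1, "--"=>1, "yes!"=>2, "you"=>1}
# With .scan(/\w+/) instead:
# {"i"=>1, "was"=>2, "09809"=>1, "home"=>1, "yes"=>2, "you"=>1}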


meg*_*gas 7

def count_words(string)
  string.scan(/\w+/).reduce(Hash.new(0)){|res,w| res[w.downcase]+=1;res}
end

A second variant:

def count_words(string)
  string.scan(/\w+/).each_with_object(Hash.new(0)){|w,h| h[w.downcase]+=1}
end
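On Ruby 2.7 or later, Enumerable#tally does the counting step itself, so the explicit Hash.new(0) accumulator can be dropped. A minimal sketch, assuming that Ruby version is available; the name count_words_tally is made up here so it does not clash with the methods above.

```ruby
# Requires Ruby 2.7+ for Enumerable#tally.
# count_words_tally is a hypothetical name used for this sketch.
def count_words_tally(string)
  # Lowercase first, extract runs of word characters, then count each token.
  string.downcase.scan(/\w+/).tally
end

p count_words_tally('I was 09809 home -- Yes! yes!  You was')
```

Like the variants above, this still counts "09809" as a word, because \w matches digits.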


Anonymous 6

def count_words(string)
  Hash[
    string.scan(/[a-zA-Z]+/)
      .group_by{|word| word.downcase}
      .map{|word, words|[word, words.size]}
  ]
end

puts count_words 'I was 09809 home -- Yes! yes!  You was'
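The output asked for in the question also lists the most frequent words first and leaves out "09809". A sketch of one way to get that, assuming letter-only words and an alphabetical tie-break (the question does not say how ties should be ordered); word_frequencies is a made-up name.

```ruby
# Hypothetical helper: counts letter-only words and returns a Hash
# ordered by descending count. Ties are broken alphabetically, which
# is an assumption and not something the question specifies.
def word_frequencies(string)
  counts = Hash.new(0)
  # /[a-z]+/ skips digits, so "09809" is never counted.
  string.downcase.scan(/[a-z]+/) { |word| counts[word] += 1 }
  # Sort by count (highest first), then by word, and rebuild a Hash.
  counts.sort_by { |word, count| [-count, word] }.to_h
end

p word_frequencies('I was 09809 home -- Yes! yes!  You was')
```

For the sample input this gives was and yes with a count of 2, followed by home, i and you with a count of 1.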