How can I do standard deviation in Ruby?

Tim*_* T. 54 ruby standard-deviation

I have several records with a given attribute, and I want to find the standard deviation.

How do I do that?

tol*_*ius 82

module Enumerable

    def sum
      self.inject(0){|accum, i| accum + i }
    end

    def mean
      self.sum/self.length.to_f
    end

    def sample_variance
      m = self.mean
      sum = self.inject(0){|accum, i| accum +(i-m)**2 }
      sum/(self.length - 1).to_f
    end

    def standard_deviation
      Math.sqrt(self.sample_variance)
    end

end 

Testing it:

a = [ 20, 23, 23, 24, 25, 22, 12, 21, 29 ]
a.standard_deviation  
# => 4.594682917363407

17 January 2012:

"sample_variance" fixed by Dave Sag

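  • You don't need to write "self" or "return" so much in Ruby. (17 upvotes)
  • In the line `sum /(self.length - 1).to_f`, why do you subtract 1 from the length of the Enumerable? (3 upvotes)
  • @moger777 The code computes the sample standard deviation, not the population standard deviation, so (n-1) is correct: http://www.macroption.com/population-sample-variance-standard-deviation/ (3 upvotes)
  • Your `sample_variance` method has a bug. See my answer below. (2 upvotes)
  • I think sum /(self.length - 1).to_f should be sum/length; I don't think the -1 is necessary, and it will cause problems. (2 upvotes)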
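
The comments above disagree about dividing by n or by n - 1. A minimal sketch of both conventions may help; the module name `SpreadSketch` and its method names are hypothetical and not taken from any answer in this thread. Dividing by n treats the values as the whole population, while dividing by n - 1 treats them as a sample drawn from a larger one.

```ruby
# Hypothetical helper module, for illustration only.
module SpreadSketch
  # Sum of squared deviations from the arithmetic mean.
  def self.squared_deviations(values)
    m = values.inject(0) { |acc, x| acc + x } / values.length.to_f
    values.inject(0) { |acc, x| acc + (x - m)**2 }
  end

  # Population standard deviation: divide by n.
  def self.population_sd(values)
    Math.sqrt(squared_deviations(values) / values.length.to_f)
  end

  # Sample standard deviation: divide by n - 1.
  def self.sample_sd(values)
    Math.sqrt(squared_deviations(values) / (values.length - 1).to_f)
  end
end

a = [20, 23, 23, 24, 25, 22, 12, 21, 29]
puts SpreadSketch.sample_sd(a)     # same convention as the answer above
puts SpreadSketch.population_sd(a) # always smaller than the sample figure
```

For this array the population figure comes out at roughly 4.33, against roughly 4.59 for the sample figure. Which one is right depends on whether the records are the full set of interest or a sample of it.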

epr*_*hro 34

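It appears that Angela may have been looking for an existing library. After playing with statsample, array-statisics, and a few others, I'd recommend the descriptive_statistics gem if you're trying to avoid reinventing the wheel.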

gem install descriptive_statistics
$ irb
1.9.2 :001 > require 'descriptive_statistics'
 => true 
1.9.2 :002 > samples = [1, 2, 2.2, 2.3, 4, 5]
 => [1, 2, 2.2, 2.3, 4, 5] 
1.9.2p290 :003 > samples.sum
 => 16.5 
1.9.2 :004 > samples.mean
 => 2.75 
1.9.2 :005 > samples.variance
 => 1.7924999999999998 
1.9.2 :006 > samples.standard_deviation
 => 1.3388427838995882 

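I can't speak to its statistical correctness, or to your comfort with monkey-patching Enumerable, but it's easy to use and easy to contribute to.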

  • Important note for Rails users: at this time, the descriptive_statistics gem seems to break ActiveRecord::Relation - you'll run into `NoMethodError: undefined method `no?' for nil:NilClass` and `(Object doesn't support #inspect)`. (3 upvotes)

Dav*_*Sag 30

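The answer given above is elegant but has a minor error in it. Not being a stats person myself, I sat down and read through a number of websites, and found this one gave the most comprehensible explanation of how to derive a standard deviation: http://sonia.hubpages.com/hub/stddev

The error in the answer above was in the sample_variance method.

Here is my corrected version, along with a simple unit test that shows it works.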

./lib/enumerable/standard_deviation.rb

#!/usr/bin/ruby

module Enumerable

  def sum
    return self.inject(0){|accum, i| accum + i }
  end

  def mean
    return self.sum / self.length.to_f
  end

  def sample_variance
    m = self.mean
    sum = self.inject(0){|accum, i| accum + (i - m) ** 2 }
    return sum / (self.length - 1).to_f
  end

  def standard_deviation
    return Math.sqrt(self.sample_variance)
  end

end

The test in ./test uses numbers derived from a simple spreadsheet.

[Screenshot of a Numbers spreadsheet with example data]

#!/usr/bin/ruby

# Test::Unit must be loaded before Test::Unit::TestCase can be subclassed.
require 'test/unit'
require 'enumerable/standard_deviation'

class StandardDeviationTest < Test::Unit::TestCase

  THE_NUMBERS = [1, 2, 2.2, 2.3, 4, 5]

  def test_sum
    expected = 16.5
    result = THE_NUMBERS.sum
    assert result == expected, "expected #{expected} but got #{result}"
  end

  def test_mean
    expected = 2.75
    result = THE_NUMBERS.mean
    assert result == expected, "expected #{expected} but got #{result}"
  end

  def test_sample_variance
    expected = 2.151
    result = THE_NUMBERS.sample_variance
    assert result == expected, "expected #{expected} but got #{result}"
  end

  def test_standard_deviation
    expected = 1.4666287874
    result = THE_NUMBERS.standard_deviation
    assert result.round(10) == expected, "expected #{expected} but got #{result}"
  end

end
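
The test above compares floating-point results with `==`, which can be fragile across platforms and Ruby versions. As a hedged alternative, here is a minimal sketch that compares within a tolerance using Minitest's `assert_in_delta`. The helper module `VarianceSketch`, the test class name, and the tolerance of 1e-9 are assumptions chosen for illustration; they are not part of the original answer.

```ruby
require 'minitest/autorun'

# Hypothetical stand-alone helper, so the sketch does not depend on
# the Enumerable patch shown above.
module VarianceSketch
  def self.sample_variance(values)
    m = values.inject(0) { |acc, x| acc + x } / values.length.to_f
    values.inject(0) { |acc, x| acc + (x - m)**2 } / (values.length - 1).to_f
  end
end

class ToleranceSketchTest < Minitest::Test
  THE_NUMBERS = [1, 2, 2.2, 2.3, 4, 5]

  # Passes when the result is within 1e-9 of the expected value.
  def test_sample_variance_within_tolerance
    assert_in_delta 2.151, VarianceSketch.sample_variance(THE_NUMBERS), 1e-9
  end

  def test_standard_deviation_within_tolerance
    sd = Math.sqrt(VarianceSketch.sample_variance(THE_NUMBERS))
    assert_in_delta 1.4666287874, sd, 1e-9
  end
end
```

With a delta in place there is no need to round the result before comparing it.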


mar*_*cgg 9

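I'm not a big fan of adding methods to Enumerable, since there could be unwanted side effects. It also gives methods that are specific to an array of numbers to every class that includes Enumerable, which doesn't make sense in most cases.

While this is fine for tests, scripts, or small apps, it's risky for larger applications, so here's an alternative based on @tolitius' answer, which was already perfect. This is more for reference than anything else: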

module MyApp::Maths
  def self.sum(a)
    a.inject(0){ |accum, i| accum + i }
  end

  def self.mean(a)
    sum(a) / a.length.to_f
  end

  def self.sample_variance(a)
    m = mean(a)
    sum = a.inject(0){ |accum, i| accum + (i - m) ** 2 }
    sum / (a.length - 1).to_f
  end

  def self.standard_deviation(a)
    Math.sqrt(sample_variance(a))
  end
end

Then you use it like this:

2.0.0p353 > MyApp::Maths.standard_deviation([1,2,3,4,5])
=> 1.5811388300841898

2.0.0p353 :007 > a = [ 20, 23, 23, 24, 25, 22, 12, 21, 29 ]
 => [20, 23, 23, 24, 25, 22, 12, 21, 29]

2.0.0p353 :008 > MyApp::Maths.standard_deviation(a)
 => 4.594682917363407

2.0.0p353 :043 > MyApp::Maths.standard_deviation([1,2,2.2,2.3,4,5])
 => 1.466628787389638

The behavior is the same, but it avoids the overhead and risks of adding methods to Enumerable.
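
Another way to limit the reach of such methods, not mentioned in the answers here, is a Ruby refinement, which makes them visible only where `using` is called. This is only a sketch, assuming Ruby 2.1 or later; the names `StatsRefinement` and `ReportSketch` are hypothetical.

```ruby
# Hypothetical refinement: the methods exist on Array only inside
# scopes that activate it with `using`.
module StatsRefinement
  refine Array do
    def mean
      inject(0) { |acc, x| acc + x } / length.to_f
    end

    def sample_variance
      m = mean
      inject(0) { |acc, x| acc + (x - m)**2 } / (length - 1).to_f
    end

    def standard_deviation
      Math.sqrt(sample_variance)
    end
  end
end

# Hypothetical consumer that opts in to the refinement.
module ReportSketch
  using StatsRefinement

  def self.spread(values)
    values.standard_deviation
  end
end

puts ReportSketch.spread([1, 2, 3, 4, 5])

# Outside ReportSketch, arrays are untouched by the refinement.
puts [1, 2, 3].respond_to?(:standard_deviation)
```

The trade-off is that every file or module that wants the methods has to opt in explicitly, which is more verbose than a global patch but keeps the change local.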