How to interpret Weka Logistic Regression output?

Ant*_*nin 11 weka logistic-regression

Please help me interpret the logistic regression results produced by weka.classifiers.functions.Logistic in the Weka library.

I am using the numeric weather data from the Weka examples:

@relation weather

@attribute outlook {sunny, overcast, rainy}
@attribute temperature real
@attribute humidity real
@attribute windy {TRUE, FALSE}
@attribute play {yes, no}

@data
sunny,85,85,FALSE,no
sunny,80,90,TRUE,no
overcast,83,86,FALSE,yes
rainy,70,96,FALSE,yes
rainy,68,80,FALSE,yes
rainy,65,70,TRUE,no
overcast,64,65,TRUE,yes
sunny,72,95,FALSE,no
sunny,69,70,FALSE,yes
rainy,75,80,FALSE,yes
sunny,75,70,TRUE,yes
overcast,72,90,TRUE,yes
overcast,81,75,FALSE,yes
rainy,71,91,TRUE,no

To build the logistic regression model, I use the command: java -cp $WEKA_INS/weka.jar weka.classifiers.functions.Logistic -t $WEKA_INS/data/weather.numeric.arff -T $WEKA_INS/data/weather.numeric.arff -d ./weather.numeric.model.arff

The three arguments mean:

-t <name of training file> : Sets training file.
-T <name of test file> : Sets test file. 
-d <name of output file> : Sets model output file.

Running the command above produces the following output:

Logistic Regression with ridge parameter of 1.0E-8
Coefficients...
              Class
Variable                    yes
===============================
outlook=sunny           -6.4257
outlook=overcast        13.5922
outlook=rainy           -5.6562
temperature             -0.0776
humidity                -0.1556
windy                    3.7317
Intercept                22.234

Odds Ratios...
              Class
Variable                    yes
===============================
outlook=sunny            0.0016
outlook=overcast    799848.4264
outlook=rainy            0.0035
temperature              0.9254
humidity                 0.8559
windy                   41.7508


Time taken to build model: 0.05 seconds
Time taken to test model on training data: 0 seconds

=== Error on training data ===
Correctly Classified Instances          11               78.5714 %
Incorrectly Classified Instances         3               21.4286 %
Kappa statistic                          0.5532
Mean absolute error                      0.2066
Root mean squared error                  0.3273
Relative absolute error                 44.4963 %
Root relative squared error             68.2597 %
Total Number of Instances               14     

=== Confusion Matrix ===
 a b   <-- classified as
 7 2 | a = yes
 1 4 | b = no

Questions:

1) The first part of the report:

Coefficients...
              Class
Variable                    yes
===============================
outlook=sunny           -6.4257
outlook=overcast        13.5922
outlook=rainy           -5.6562
temperature             -0.0776
humidity                -0.1556
windy                    3.7317
Intercept                22.234

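1.1) Do I understand correctly that the "Coefficients" are the weights applied to each attribute before the attributes are added together to produce the value of the class attribute "play" being "yes"?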
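For reference, here is that arithmetic worked by hand for the first training row (sunny, 85, 85, FALSE, no), using the rounded coefficients from the output. It assumes each outlook row multiplies a 0/1 indicator (so only the outlook=sunny term is nonzero here) and that the windy coefficient multiplies a 0/1 input equal to 1 for windy=FALSE, the second declared value; that is the encoding under which these rounded coefficients reproduce the confusion matrix in the output. Note the final logistic step, which the question's "added together" leaves out:

```latex
\begin{aligned}
z &= 22.234 + (-6.4257)(1) + (-0.0776)(85) + (-0.1556)(85) + 3.7317(1) \\
  &= 22.234 - 6.4257 - 6.596 - 13.226 + 3.7317 = -0.2820 \\
P(\text{play}=\text{yes}) &= \frac{1}{1 + e^{-z}} = \frac{1}{1 + e^{0.2820}} \approx 0.430
\end{aligned}
```

Since 0.430 < 0.5, this row is predicted as no, which matches its label.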

2) The second part of the report:

Odds Ratios...
              Class
Variable                    yes
===============================
outlook=sunny            0.0016
outlook=overcast    799848.4264
outlook=rainy            0.0035
temperature              0.9254
humidity                 0.8559
windy                   41.7508

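2.1) What do the "Odds Ratios" mean?
2.2) Do they also relate to the class attribute "play" being "yes"?
2.3) Why is the value for "outlook=overcast" so much larger than the value for "outlook=sunny"?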
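The link between the two tables can be written as a short derivation. Here w_0 is the intercept and w_j, x_j are one coefficient and its input; this notation is introduced for the sketch and is not Weka's. Because P(yes) = 1/(1 + e^{-z}), the odds P(yes)/(1 - P(yes)) simplify to e^z, so raising one input by 1 while holding the others fixed multiplies the odds by e^{w_j}, which is what the "Odds Ratios" column lists:

```latex
\begin{aligned}
\text{odds} &= \frac{P(\text{yes})}{1 - P(\text{yes})} = e^{z}, \qquad z = w_0 + \textstyle\sum_j w_j x_j \\
\frac{\text{odds}(x_j + 1)}{\text{odds}(x_j)} &= e^{w_j} \\
e^{-0.0776} &\approx 0.925 \ (\text{temperature}), \quad e^{3.7317} \approx 41.75 \ (\text{windy}), \quad e^{13.5922} \approx 8.0 \times 10^{5} \ (\text{outlook=overcast})
\end{aligned}
```

Computed from the rounded coefficients, these agree with Weka's 0.9254, 41.7508 and 799848.4264 up to rounding.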

3)

=== Confusion Matrix ===
 a b   <-- classified as
 7 2 | a = yes
 1 4 | b = no

3.1) What does the confusion matrix show?

Thanks a lot for your help!

Wal*_*ter 12

To your questions:

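  1. Updated based on the comments below: the coefficients are indeed the weights applied to each attribute, but the weighted sum is then plugged into the logistic function 1 / (1 + exp(-weighted_sum)) to obtain a probability. The "Intercept" value is added to the sum without being multiplied by any variable. For a nominal attribute, each row such as outlook=sunny multiplies a 0/1 indicator that is 1 only when the instance has that value. The result is the probability that a new instance belongs to class yes (> 0.5 means yes).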

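  2. An odds ratio is exp(coefficient): the factor by which the odds P(yes)/P(no) are multiplied when that variable increases by one unit (for a nominal value such as outlook=overcast, when the instance has that value), with all other variables held fixed. So yes, they also refer to class yes. For example, each extra degree of temperature multiplies the odds of playing by about 0.93. I think this link explains odds ratios well. The value for outlook=overcast is so large because every overcast day in the data (4 out of 4) has play=yes: nothing in the data limits how large that coefficient can grow, and only the tiny ridge penalty (1.0E-8, shown in the first line of the output) keeps it finite.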

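  3. The confusion matrix simply shows how many test instances were classified correctly and incorrectly. Rows are the actual class and columns the predicted class: in your example, 7 yes instances were classified as yes and 2 yes instances were misclassified as no, while 1 no instance was misclassified as yes and 4 were correctly classified as no. The diagonal (7 + 4 = 11) gives the 11 correctly classified instances reported above. Because the same file was passed to -t and -T, these are errors on the training data. This is answered more fully in the question: How to read the classifier confusion matrix in WEKA.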

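  • 1. Strictly speaking, this is not correct: the result (the weighted sum) is plugged into the logistic function `1 / (1 + exp(-weighted_sum))` to obtain the probability. Note that the "Intercept" value is added to the sum without being multiplied by any variable.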