Get prediction percentages in WEKA using own Java code and a model

Question


5 java classification machine-learning prediction weka

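Overview

I know that the GUI and the command-line options can report the percentage for each prediction made by a trained WEKA model, as explained and demonstrated in the documentation article "Making predictions".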


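Predictions

I know of three documented ways to get these predictions:

1. Command line
2. GUI
3. Java code / the WEKA API, which I managed to do in the answer to "Get risk predictions in WEKA using own Java code"

A fourth way, the one I am asking about, requires a generated WEKA .MODEL file.

I have a trained .MODEL file, and now I want to use it to classify new instances and get the prediction percentages along with them, similar to the output below (from the GUI Explorer, in CSV format):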

inst#,actual,predicted,error,distribution,
1,1:0,2:1,+,0.399409,*0.7811
2,1:0,2:1,+,0.3932409,*0.8191
3,1:0,2:1,+,0.399409,*0.600591
4,1:0,2:1,+,0.139409,*0.64
5,1:0,2:1,+,0.399409,*0.600593
6,1:0,2:1,+,0.3993209,*0.600594
7,1:0,2:1,+,0.500129,*0.600594
8,1:0,2:1,+,0.399409,*0.90011
9,1:0,2:1,+,0.211409,*0.60182
10,1:0,2:1,+,0.21909,*0.11101


The predicted column is what I want to get from a .MODEL file.
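To make the target format concrete, here is a minimal sketch in plain Java (no WEKA on the classpath) that assembles one row in the layout of the sample above. The class name PredictionRow and its parameters are hypothetical; in real code the labels would come from the class attribute and the distribution from the classifier. The row layout is inferred from the sample, and leaving the error field empty for a correct prediction is an assumption.

```java
final class PredictionRow {

    // Builds one row laid out as: inst#,actual,predicted,error,distribution
    static String format(int instNumber, int actual, int predicted,
                         String[] labels, double[] distribution) {
        StringBuilder row = new StringBuilder();
        row.append(instNumber).append(',');
        // Class values are written as <1-based index>:<label>
        row.append(actual + 1).append(':').append(labels[actual]).append(',');
        row.append(predicted + 1).append(':').append(labels[predicted]).append(',');
        // "+" flags a misclassification; assumed empty when the prediction is correct
        row.append(actual == predicted ? "" : "+");
        // One column per class value, with "*" in front of the predicted one
        for (int i = 0; i < distribution.length; i++) {
            row.append(',');
            if (i == predicted) {
                row.append('*');
            }
            row.append(distribution[i]);
        }
        return row.toString();
    }

    public static void main(String[] args) {
        String[] labels = {"0", "1"};
        double[] distribution = {0.399409, 0.600591};
        // Prints: 3,1:0,2:1,+,0.399409,*0.600591
        System.out.println(format(3, 0, 1, labels, distribution));
    }
}
```

The starred value is the probability assigned to the predicted class, so it is the "percentage" for that prediction.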

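What I know

From my experience with the WEKA API, these predictions can be obtained with the following code (a PlainText object passed to an Evaluation object), BUT I do not want to run the k-fold cross-validation that the Evaluation object performs.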

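// j48Model, data, numberOfFolds and randomNumber are assumed to be defined elsewhere
import weka.classifiers.Evaluation;
import weka.classifiers.evaluation.output.prediction.PlainText;
import weka.core.Range;

StringBuffer predictionSB = new StringBuffer();
Range attributesToShow = null;
Boolean outputDistributions = Boolean.TRUE;

// collect the per-instance predictions, including the class distribution
PlainText predictionOutput = new PlainText();
predictionOutput.setBuffer(predictionSB);
predictionOutput.setOutputDistribution(true);

Evaluation evaluation = new Evaluation(data);
evaluation.crossValidateModel(j48Model, data, numberOfFolds,
        randomNumber, predictionOutput, attributesToShow,
        outputDistributions);

System.out.println(predictionOutput.getBuffer());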


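From the WEKA documentation

Note that using a .MODEL file to classify data from an .ARFF or similar input is discussed in "Use Weka in your Java code" and in "Serialization", a.k.a. "How to use a .MODEL file in your own Java code to classify new instances" (why the vague title?).

Classifying using own Java code

Loading a .MODEL file is done through "deserialization"; the following is for versions > 3.5.5: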

// deserialize model
Classifier cls = (Classifier) weka.core.SerializationHelper.read("/some/where/j48.model");


An Instance object holds the data and is passed to classifyInstance, which returns the result (its meaning depends on the data type of the outcome attribute):

// classify an Instance object (testData)
cls.classifyInstance(testData.instance(0));


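The question "How to reuse saved classifier created from explorer (in weka) in eclipse java" has a great answer too!

Javadocs

I have already checked the Javadocs for Classifier (the trained model) and Evaluation (just in case), but neither addresses this issue directly and explicitly.

The closest thing to what I want is the classifyInstance method of Classifier:

Classifies the given test instance. The instance has to belong to a dataset when it's being classified. Note that a classifier MUST implement either this or distributionForInstance().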
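The note in that Javadoc hints at how the two methods relate: for a nominal class, the index returned by classifyInstance normally corresponds to the largest entry of the array returned by distributionForInstance. Below is a minimal sketch of that relation in plain Java, without WEKA on the classpath. The class name PredictedIndex and the sample distribution are made up for illustration, and the tie-breaking rule (first maximum wins) is an assumption rather than documented WEKA behavior.

```java
final class PredictedIndex {

    // Returns the index of the largest probability; the first maximum wins on ties.
    static int argMax(double[] distribution) {
        int best = 0;
        for (int i = 1; i < distribution.length; i++) {
            if (distribution[i] > distribution[best]) {
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Stand-in for the array a classifier would return for one instance
        double[] distribution = {0.399409, 0.600591};
        int predicted = argMax(distribution);
        // Prints: predicted index: 1, confidence: 0.600591
        System.out.println("predicted index: " + predicted
                + ", confidence: " + distribution[predicted]);
    }
}
```

In other words, a single call to distributionForInstance is enough to recover both the predicted class and its percentage.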

How can I use a WEKA .MODEL file to both classify a new instance and get its prediction percentages, using my own Java code (i.e. the WEKA API)?


Answer 1

Wal*_*ter 3

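This answer simply updates my answer to "How to reuse saved classifier created from explorer (in weka) in eclipse java".

I will show how to obtain the predicted class value of an instance and the prediction percentage (or distribution). The example model is a J48 decision tree created and saved in the Weka Explorer. It was built from the nominal weather data provided with Weka and is called "tree.model".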

import weka.classifiers.Classifier;
import weka.core.Instances;
import weka.core.SerializationHelper;
import weka.core.converters.ConverterUtils.DataSource;

public class Main {

    public static void main(String[] args) throws Exception
    {
        String rootPath="/some/where/";

        //load model
        Classifier cls = (Classifier) SerializationHelper.read(rootPath+"tree.model");

        //load or create the Instances to predict; the ARFF file name is an assumption,
        //use any data whose header matches the data the model was trained on
        Instances originalTrain = DataSource.read(rootPath+"weather.nominal.arff");
        //the class attribute must be set; assumed here to be the last attribute
        originalTrain.setClassIndex(originalTrain.numAttributes()-1);

        //which instance to predict class value
        int s1=0;

        //perform your prediction (returns the index of the predicted class value)
        double value=cls.classifyInstance(originalTrain.instance(s1));

        //get the prediction percentage or distribution (one entry per class value)
        double[] percentage=cls.distributionForInstance(originalTrain.instance(s1));

        //get the name of the class value
        String prediction=originalTrain.classAttribute().value((int)value);

        System.out.println("The predicted value of instance "+
                                Integer.toString(s1)+
                                ": "+prediction);

        //Format the distribution, marking the predicted class with "*"
        String distribution="";
        for(int i=0; i <percentage.length; i=i+1)
        {
            if(i==(int)value)
            {
                distribution=distribution+"*"+Double.toString(percentage[i])+",";
            }
            else
            {
                distribution=distribution+Double.toString(percentage[i])+",";
            }
        }
        //drop the trailing comma
        distribution=distribution.substring(0, distribution.length()-1);

        System.out.println("Distribution:"+ distribution);
    }

}


The output is:

The predicted value of instance 0: no  
Distribution: *1, 0


Archived:	11 years, 11 months ago
Viewed:	7420 times
Last activity:	6 years, 3 months ago