use*_*263 · 6 · tags: machine-learning, weka
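I'd like to know whether WEKA has a way to output several "best guesses" for a classification.
My scenario: for example, I classify my data using cross-validation, and then in Weka's output I get something like "these are the 3 best guesses for classifying this instance." What I want is that, even when an instance is not classified correctly, I get an output of the 3 or 5 best guesses for that instance.
Example:
Classes: A, B, C, D, E; instances: 1 ... 10
The output would be: instance 1 is 90% likely to be class A, 75% likely to be class B, 60% likely to be class C, ...
Thanks.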
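Weka's API has a method called Classifier.distributionForInstance() that returns the predicted class distribution for an instance. You can then sort that distribution by decreasing probability to get the top N predictions.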
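Sorting the distribution by decreasing probability can be sketched in plain Java as below. This is a minimal sketch that assumes the double[] has the shape distributionForInstance() returns (one probability per class, in the same order as classAttribute()). The helper name topN and the labels and probabilities in main() are hypothetical stand-ins, not part of Weka's API or real output.

```java
import java.util.Arrays;
import java.util.Comparator;

public class TopNPredictions {

    /**
     * Returns the indices of the n largest entries of a class distribution,
     * ordered by decreasing probability. Assumes one entry per class, indexed
     * like classAttribute() (the shape distributionForInstance() returns).
     */
    static int[] topN(double[] distribution, int n) {
        Integer[] order = new Integer[distribution.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        // Sort class indices by their probability, highest first.
        Arrays.sort(order,
            Comparator.comparingDouble((Integer idx) -> distribution[idx]).reversed());
        // Clamp n so that n < 0 or n > number of classes is handled safely.
        int k = Math.max(0, Math.min(n, order.length));
        int[] top = new int[k];
        for (int i = 0; i < k; i++) {
            top[i] = order[i];
        }
        return top;
    }

    public static void main(String[] args) {
        // Hypothetical labels and distribution standing in for
        // classAttribute().value(i) and distributionForInstance(instance).
        String[] labels = {"A", "B", "C", "D", "E"};
        double[] distribution = {0.10, 0.45, 0.05, 0.30, 0.10};
        for (int idx : topN(distribution, 3)) {
            System.out.printf("%s : %.2f%n", labels[idx], distribution[idx]);
        }
    }
}
```

Arrays.sort on an object array is stable, so classes with equal probability keep their original attribute order; in the example above, A is listed before E.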
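Below is a function that prints out: (1) the ground-truth label of each test instance; (2) the predicted label from classifyInstance(); and (3) the predicted distribution from distributionForInstance(). I have used it with J48, but it should work with other classifiers as well.
The input parameters are a serialized model file (which you can create during the training phase with the -d option) and a test file in ARFF format.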
```java
import weka.classifiers.Classifier;
import weka.core.Instance;
import weka.core.Instances;
import weka.core.SerializationHelper;
import weka.core.converters.ConverterUtils.DataSource;

// Wrapper class (the name is arbitrary) so that test() compiles on its own.
public class PredictionPrinter
{
    public void test(String modelFileSerialized, String testFileARFF)
        throws Exception
    {
        // Deserialize the classifier.
        Classifier classifier =
            (Classifier) SerializationHelper.read(modelFileSerialized);

        // Load the test instances.
        Instances testInstances = DataSource.read(testFileARFF);

        // Mark the last attribute in each instance as the true class.
        testInstances.setClassIndex(testInstances.numAttributes() - 1);

        int numTestInstances = testInstances.numInstances();
        System.out.printf("There are %d test instances%n", numTestInstances);

        // Loop over each test instance.
        for (int i = 0; i < numTestInstances; i++)
        {
            Instance instance = testInstances.instance(i);

            // Get the true class label from the instance's own class attribute.
            String trueClassLabel = instance.toString(testInstances.classIndex());

            // Make the prediction here.
            double predictionIndex = classifier.classifyInstance(instance);

            // Get the predicted class label from the predictionIndex.
            String predictedClassLabel =
                testInstances.classAttribute().value((int) predictionIndex);

            // Get the prediction probability distribution (one entry per class).
            double[] predictionDistribution =
                classifier.distributionForInstance(instance);

            // Print out the true label, predicted label, and the distribution.
            System.out.printf("%5d: true=%-10s, predicted=%-10s, distribution=",
                i, trueClassLabel, predictedClassLabel);

            // Loop over all the class labels in the distribution.
            for (int classIndex = 0; classIndex < predictionDistribution.length; classIndex++)
            {
                // Get this distribution index's class label and its probability.
                String classLabel = testInstances.classAttribute().value(classIndex);
                double predictionProbability = predictionDistribution[classIndex];
                System.out.printf("[%10s : %6.3f]", classLabel, predictionProbability);
            }
            // End the line for this instance (the original called o.printf here).
            System.out.printf("%n");
        }
    }
}
```
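I don't know whether you can do this natively, but you can get the probability of each class, sort them, and take the top three.
The method you want is distributionForInstance(Instance instance), which returns a double[] containing the probability of each class.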
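Sorting the whole array and taking the first three works fine; for a small k it is also possible to avoid the full sort. A minimal sketch of one such alternative is below: a size-bounded min-heap, which is a swapped-in technique and not something Weka itself provides. It again assumes the double[] is indexed like classAttribute(); the helper name topK and the sample values in main() are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class TopKHeap {

    /**
     * Returns the indices of the k most probable classes, highest first,
     * using a min-heap that never holds more than k class indices.
     */
    static List<Integer> topK(double[] distribution, int k) {
        Comparator<Integer> byProbability =
            (a, b) -> Double.compare(distribution[a], distribution[b]);
        // The heap's head is the least probable of the indices kept so far.
        PriorityQueue<Integer> heap = new PriorityQueue<>(byProbability);
        for (int i = 0; i < distribution.length; i++) {
            heap.offer(i);
            if (heap.size() > k) {
                heap.poll(); // drop the currently least probable class
            }
        }
        // Heap iteration order is unspecified, so sort the survivors explicitly.
        List<Integer> result = new ArrayList<>(heap);
        result.sort(byProbability.reversed());
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical five-class distribution (classes A..E); not real Weka output.
        String[] labels = {"A", "B", "C", "D", "E"};
        double[] distribution = {0.05, 0.40, 0.15, 0.30, 0.10};
        for (int idx : topK(distribution, 3)) {
            System.out.println(labels[idx] + " : " + distribution[idx]);
        }
    }
}
```

With only a handful of classes the difference from a full sort is negligible; the heap mainly helps when there are many classes and k is small, since it costs O(m log k) rather than O(m log m) for m classes. Unlike the stable sort, it does not guarantee which of several tied classes is kept.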