Krz*_*jst 5 matlab classification machine-learning decision-tree weka
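I am trying to retrieve classes from WEKA using MATLAB and the WEKA API. Everything looks fine, but the classes are always 0. Any ideas?
My dataset has 241 attributes. Running WEKA itself on this dataset gives me correct results.
First the train and test objects are created, then the classifier is built and classifyInstance is called. But this gives the wrong result: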
train = [xtrain ytrain];
test = [xtest];
save ('train.txt','train','-ASCII');
save ('test.txt','test','-ASCII');
%## paths
WEKA_HOME = 'C:\Program Files\Weka-3-7';
javaaddpath([WEKA_HOME '\weka.jar']);
fName = 'train.txt';
%## read file
loader = weka.core.converters.MatlabLoader();
loader.setFile( java.io.File(fName) );
train = loader.getDataSet();
train.setClassIndex( train.numAttributes()-1 );
% setting class as nominal
v(1) = java.lang.String('-R');
v(2) = java.lang.String('242');
options = cat(1,v(1:end));
filter = weka.filters.unsupervised.attribute.NumericToNominal();
filter.setOptions(options);
filter.setInputFormat(train);
train = filter.useFilter(train, filter);
fName = 'test.txt';
%## read file
loader = weka.core.converters.MatlabLoader();
loader.setFile( java.io.File(fName) );
test = loader.getDataSet();
%## dataset
relationName = char(test.relationName);
numAttr = test.numAttributes;
numInst = test.numInstances;
%## classification
classifier = weka.classifiers.trees.J48();
classifier.buildClassifier( train );
fprintf('Classifier: %s %s\n%s', ...
char(classifier.getClass().getName()), ...
char(weka.core.Utils.joinOptions(classifier.getOptions())), ...
char(classifier.toString()) )
classes =[];
for i=1:numInst
classes(i) = classifier.classifyInstance(test.instance(i-1));
end
Here is the new code, but it still does not work: classes = 0. The output from Weka for the same algorithm and dataset is fine:

=== Detailed Accuracy By Class ===

               TP Rate   FP Rate   Precision   Recall  F-Measure   ROC Area  Class
                 0.99      0.015      0.985     0.99      0.988      0.991    0
                 0.985     0.01       0.99      0.985     0.988      0.991    1
Weighted Avg.    0.988     0.012      0.988     0.988     0.988      0.991

=== Confusion Matrix ===

    a    b   <-- classified as
 1012   10 |    a = 0
   15 1003 |    b = 1

ytest1 = ones(size(xtest,1),1);
train = [xtrain ytrain];
test = [xtest ytest1];
save ('train.txt','train','-ASCII');
save ('test.txt','test','-ASCII');
%## paths
WEKA_HOME = 'C:\Program Files\Weka-3-7';
javaaddpath([WEKA_HOME '\weka.jar']);
fName = 'train.txt';
%## read file
loader = weka.core.converters.MatlabLoader();
loader.setFile( java.io.File(fName) );
train = loader.getDataSet();
train.setClassIndex( train.numAttributes()-1 );
v(1) = java.lang.String('-R');
v(2) = java.lang.String('242');
options = cat(1,v(1:end));
filter = weka.filters.unsupervised.attribute.NumericToNominal();
filter.setOptions(options);
filter.setInputFormat(train);
train = filter.useFilter(train, filter);
fName = 'test.txt';
%## read file
loader = weka.core.converters.MatlabLoader();
loader.setFile( java.io.File(fName) );
test = loader.getDataSet();
filter = weka.filters.unsupervised.attribute.NumericToNominal();
filter.setOptions( weka.core.Utils.splitOptions('-R last') );
filter.setInputFormat(test);
test = filter.useFilter(test, filter);
%## dataset
relationName = char(test.relationName);
numAttr = test.numAttributes;
numInst = test.numInstances;
%## classification
classifier = weka.classifiers.trees.J48();
classifier.buildClassifier( train );
fprintf('Classifier: %s %s\n%s', ...
char(classifier.getClass().getName()), ...
char(weka.core.Utils.joinOptions(classifier.getOptions())), ...
char(classifier.toString()) )
classes = zeros(numInst,1);
for i=1:numInst
classes(i) = classifier.classifyInstance(test.instance(i-1));
end
Here is a Java code snippet that prints the class distribution:
// output predictions
System.out.println("# - actual - predicted - error - distribution");
for (int i = 0; i < test.numInstances(); i++) {
double pred = cls.classifyInstance(test.instance(i));
double[] dist = cls.distributionForInstance(test.instance(i));
System.out.print((i+1));
System.out.print(" - ");
System.out.print(test.instance(i).toString(test.classIndex()));
System.out.print(" - ");
System.out.print(test.classAttribute().value((int) pred));
System.out.print(" - ");
if (pred != test.instance(i).classValue())
System.out.print("yes");
else
System.out.print("no");
System.out.print(" - ");
System.out.print(Utils.arrayToString(dist));
System.out.println();
}
I converted it to MATLAB code like this:
classes = zeros(numInst,1);
for i=1:numInst
pred = classifier.classifyInstance(test.instance(i-1));
classes(i) = str2num(char(test.classAttribute().value(( pred))));
end
But the class output is incorrect.
In your answer you do not show that pred contains the classes and predProbs the probabilities. Just print them!
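The training and test data must have the same number of attributes. So in your case, even if you do not know the actual classes of the test data, just use dummy values: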
ytest = ones(size(xtest,1),1); %# dummy class values for test data
train = [xtrain ytrain];
test = [xtest ytest];
save ('train.txt','train','-ASCII');
save ('test.txt','test','-ASCII');
When you load the test dataset, do not forget to convert its class attribute to nominal (just as you did for the training dataset):
filter = weka.filters.unsupervised.attribute.NumericToNominal();
filter.setOptions( weka.core.Utils.splitOptions('-R last') );
filter.setInputFormat(test);
test = filter.useFilter(test, filter);
Finally, you can call the trained J48 classifier to predict the class values of the test instances:
classes = zeros(numInst,1);
for i=1:numInst
classes(i) = classifier.classifyInstance(test.instance(i-1));
end
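Keep in mind that `classifyInstance` returns the zero-based index of the predicted value within the nominal class attribute, not the label itself. With Weka loaded, the label for index `k` can be read with `char(test.classAttribute().value(k))`. The sketch below shows the same mapping in plain MATLAB; `labels` and `predIdx` are hypothetical stand-ins for the class attribute's values and for the output of the loop above.

```matlab
% Hypothetical stand-ins, not taken from the original data:
labels  = {'0', '1'};       % nominal class values, in Weka's index order
predIdx = [0; 1; 1; 0];     % zero-based indices as returned by classifyInstance

% MATLAB indexing is one-based, so shift the indices by one.
predLabels = labels(predIdx + 1);
predLabels = predLabels(:);              % force a column cell array

% Convert the label strings to numbers (only valid for numeric-looking labels).
predValues = str2double(predLabels);
disp(predValues)
```

When the labels happen to be 0 and 1, the indices and the labels coincide, so the mapping only changes the result for other label sets.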
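It is hard to tell what is wrong without knowing the data you are working with.
Let me illustrate with a complete example. I will create the datasets in MATLAB from the Fisher Iris data (4 attributes, 150 instances, 3 classes).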
%# load dataset (data + labels)
load fisheriris
X = meas;
Y = grp2idx(species);
%# partition the data into training/testing
c = cvpartition(Y, 'holdout',1/3);
xtrain = X(c.training,:);
ytrain = Y(c.training);
xtest = X(c.test,:);
ytest = Y(c.test); %# or dummy values
%# save as space-delimited text file
train = [xtrain ytrain];
test = [xtest ytest];
save train.txt train -ascii
save test.txt test -ascii
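I should mention here that it is important to make sure the class values are fully represented in each of the two datasets before applying the NumericToNominal filter. Otherwise, the training and test sets could end up incompatible. What I mean is that each dataset must contain at least one instance of every class value. So if you are using dummy values, you could do something like this: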
ytest = ones(size(xtest,1),1);
v = unique(Y);
ytest(1:numel(v)) = v;
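The assignment above only works if the test set has at least as many rows as there are distinct class values. A minimal, self-contained sketch of the same idea with that check added is shown here; `Y` and `xtest` are hypothetical stand-ins for the real labels and test features.

```matlab
% Hypothetical stand-ins for the real data:
Y     = [1; 1; 2; 3; 3];        % known class labels (e.g. from training)
xtest = rand(4, 2);             % test features whose classes are unknown

ytest = ones(size(xtest,1), 1); % dummy class values
v = unique(Y);                  % every distinct class value

% The assignment below needs at least one test row per class value.
assert(size(xtest,1) >= numel(v), 'Need at least %d test instances.', numel(v));
ytest(1:numel(v)) = v;

% Both label vectors now contain the same set of class values.
disp(isequal(unique(ytest), v))
```

The dummy labels exist only to make the two nominal class attributes match, so they should not be used for evaluation.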
Next, we read the newly created files using the Weka API. We convert the last attribute from numeric to nominal (to enable classification):
%# read train/test files using Weka
fName = 'train.txt';
loader = weka.core.converters.MatlabLoader();
loader.setFile( java.io.File(fName) );
train = loader.getDataSet();
train.setClassIndex( train.numAttributes()-1 );
fName = 'test.txt';
loader = weka.core.converters.MatlabLoader();
loader.setFile( java.io.File(fName) );
test = loader.getDataSet();
test.setClassIndex( test.numAttributes()-1 );
%# convert last attribute (class) from numeric to nominal
filter = weka.filters.unsupervised.attribute.NumericToNominal();
filter.setOptions( weka.core.Utils.splitOptions('-R last') );
filter.setInputFormat(train);
train = filter.useFilter(train, filter);
filter = weka.filters.unsupervised.attribute.NumericToNominal();
filter.setOptions( weka.core.Utils.splitOptions('-R last') );
filter.setInputFormat(test);
test = filter.useFilter(test, filter);
Now we train a J48 classifier and use it to predict the classes of the test instances:
%# train a J48 tree
classifier = weka.classifiers.trees.J48();
classifier.setOptions( weka.core.Utils.splitOptions('-c last -C 0.25 -M 2') );
classifier.buildClassifier( train );
%# classify test instances
numInst = test.numInstances();
pred = zeros(numInst,1);
predProbs = zeros(numInst, train.numClasses());
for i=1:numInst
pred(i) = classifier.classifyInstance( test.instance(i-1) );
predProbs(i,:) = classifier.distributionForInstance( test.instance(i-1) );
end
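As a quick sanity check on the predictions, accuracy and a confusion matrix can also be computed directly in MATLAB. This is only a sketch with hypothetical values: `ytest`, `pred`, and `classValues` stand in for the true labels, the zero-based indices returned by `classifyInstance`, and the nominal class values in Weka's index order. The sketch assumes that order is ascending; it can be verified with `test.classAttribute().value(k)`.

```matlab
% Hypothetical stand-ins, not results from the example:
ytest       = [1; 1; 2; 2; 3; 3];   % true class labels
pred        = [0; 0; 1; 2; 2; 2];   % zero-based predicted class indices
classValues = [1; 2; 3];            % nominal values in index order (assumed)

% Express the true labels as one-based positions in classValues.
[~, trueIdx] = ismember(ytest, classValues);
predIdx = pred + 1;                 % shift predictions to one-based

% Fraction of instances whose predicted class matches the true class.
acc = mean(trueIdx == predIdx);

% Rows are true classes, columns are predicted classes.
numClasses = numel(classValues);
confMat = accumarray([trueIdx, predIdx], 1, [numClasses numClasses]);

fprintf('Accuracy: %.4f\n', acc);
disp(confMat)
```

Like Weka's matrix, this one puts the true class on the rows and the predicted class on the columns.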
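Finally, we evaluate the performance of the trained model on the test data (this should look similar to what you see in the Weka Explorer). Obviously, this only makes sense if the test instances have their true class values (not dummy values):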
eval = weka.classifiers.Evaluation(train);
eval.evaluateModel(classifier, test, javaArray('java.lang.Object',1));
fprintf('=== Run information ===\n\n')
fprintf('Scheme: %s %s\n', ...
char(classifier.getClass().getName()), ...
char(weka.core.Utils.joinOptions(classifier.getOptions())) )
fprintf('Relation: %s\n', char(train.relationName))
fprintf('Instances: %d\n', train.numInstances)
fprintf('Attributes: %d\n\n', train.numAttributes)
fprintf('=== Classifier model ===\n\n')
disp( char(classifier.toString()) )
fprintf('=== Summary ===\n')
disp( char(eval.toSummaryString()) )
disp( char(eval.toClassDetailsString()) )
disp( char(eval.toMatrixString()) )
The output of the above example in MATLAB:
=== Run information ===

Scheme:       weka.classifiers.trees.J48 -C 0.25 -M 2
Relation:     train.txt-weka.filters.unsupervised.attribute.NumericToNominal-Rlast
Instances:    100
Attributes:   5

=== Classifier model ===

J48 pruned tree
------------------

att_4 <= 0.6: 1 (33.0)
att_4 > 0.6
|   att_3 <= 4.8
|   |   att_4 <= 1.6: 2 (32.0)
|   |   att_4 > 1.6: 3 (3.0/1.0)
|   att_3 > 4.8: 3 (32.0)

Number of Leaves  :     4

Size of the tree :      7

=== Summary ===

Correctly Classified Instances          46               92      %
Incorrectly Classified Instances         4                8      %
Kappa statistic                          0.8802
Mean absolute error                      0.0578
Root mean squared error                  0.2341
Relative absolute error                 12.9975 %
Root relative squared error             49.6536 %
Coverage of cases (0.95 level)          92      %
Mean rel. region size (0.95 level)      34      %
Total Number of Instances               50

=== Detailed Accuracy By Class ===

               TP Rate   FP Rate   Precision   Recall  F-Measure   ROC Area  Class
                 1         0          1         1         1          1        1
                 0.765     0          1         0.765     0.867      0.879    2
                 1         0.118      0.8       1         0.889      0.938    3
Weighted Avg.    0.92      0.038      0.936     0.92      0.919      0.939

=== Confusion Matrix ===

  a  b  c   <-- classified as
 17  0  0 |  a = 1
  0 13  4 |  b = 2
  0  0 16 |  c = 3