10-fold cross-validation in one-against-all SVM (using LibSVM)

Question


Zah*_*ati 10 matlab classification machine-learning svm libsvm

I want to do 10-fold cross-validation for one-against-all support vector machine classification in MATLAB.

I tried to somehow combine two related answers, but because I am new to MATLAB and its syntax, I have not managed to make it work so far.

On the other hand, I saw the following lines about cross-validation in the LibSVM README file, and I could not find any related example there:

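option -v randomly splits the data into n parts and calculates cross validation accuracy/mean squared error on them.

See libsvm FAQ for the meaning of outputs.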

Could anyone give me an example of 10-fold cross-validation with one-against-all classification?

Answer 1

Amr*_*mro 15

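There are mainly two reasons we do cross-validation:

as a testing method, which gives us a nearly unbiased estimate of the generalization power of our model (by avoiding overfitting)
as a way of model selection (e.g. finding the best C and gamma parameters over the training data; see this post for an example)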

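For the first case, which is the one we are interested in, the process involves training one model per fold (k models in total), each evaluated on its held-out fold, and then training one final model over the entire training set. We report the average accuracy over the k folds.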

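Now, since we are using the one-vs-all approach to handle the multi-class problem, each model consists of N support vector machines (one for each class).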
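To make the decomposition concrete, here is a minimal sketch of how a multi-class label vector turns into N binary problems, one per class. The label values are made up for illustration; each column of `Y` would be the target vector of one binary SVM.

```matlab
%# toy multi-class labels (made-up values, for illustration only)
y = [1; 3; 2; 3; 1; 2];

%# one binary target vector per class: 1 for "this class", 0 for "the rest"
labels = unique(y);
numLabels = numel(labels);
Y = zeros(numel(y), numLabels);
for k=1:numLabels
    Y(:,k) = double(y == labels(k));
end
disp(Y)
```

Every instance is positive in exactly one column, so each row of `Y` contains a single 1.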

The following are wrapper functions implementing the one-vs-all approach:

function mdl = libsvmtrain_ova(y, X, opts)
    if nargin < 3, opts = ''; end

    %# classes
    labels = unique(y);
    numLabels = numel(labels);

    %# train one-against-all models
    models = cell(numLabels,1);
    for k=1:numLabels
        models{k} = libsvmtrain(double(y==labels(k)), X, strcat(opts,' -b 1 -q'));
    end
    mdl = struct('models',{models}, 'labels',labels);
end

function [pred,acc,prob] = libsvmpredict_ova(y, X, mdl)
    %# classes
    labels = mdl.labels;
    numLabels = numel(labels);

    %# get probability estimates of test instances using each 1-vs-all model
    prob = zeros(size(X,1), numLabels);
    for k=1:numLabels
        [~,~,p] = libsvmpredict(double(y==labels(k)), X, mdl.models{k}, '-b 1 -q');
        prob(:,k) = p(:, mdl.models{k}.Label==1);
    end

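    %# predict the class with the highest probability
    [~,idx] = max(prob, [], 2);
    pred = labels(idx);     %# map column index back to the original class label
    %# compute classification accuracy (a fraction between 0 and 1)
    acc = mean(pred == y);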
end
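The last step of the prediction function can be checked in isolation. The sketch below uses a hypothetical probability matrix with made-up numbers and class labels that are deliberately not 1..N, to show that the column index returned by `max` has to be mapped back through `labels` before it is compared with the true labels. The rows need not sum to 1, since each column comes from an independent binary model.

```matlab
%# hypothetical class labels and per-class probabilities (made-up numbers)
labels = [10; 20; 30];
prob = [0.8 0.3 0.1;
        0.2 0.4 0.7;
        0.3 0.6 0.4];
y = [10; 30; 30];               %# true labels of the three instances

[~,idx] = max(prob, [], 2);     %# column of the most confident binary model
pred = labels(idx);             %# map column index back to the class label
acc = mean(pred == y);          %# fraction of correct predictions
fprintf('Accuracy = %.4f\n', acc);
```

When the labels happen to be 1..N (as with `grp2idx` output), the column index and the label coincide, so the mapping changes nothing.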


And here are the functions that support cross-validation:

function acc = libsvmcrossval_ova(y, X, opts, nfold, indices)
    if nargin < 3, opts = ''; end
    if nargin < 4, nfold = 10; end
    if nargin < 5, indices = crossvalidation(y, nfold); end

    %# N-fold cross-validation testing
    acc = zeros(nfold,1);
    for i=1:nfold
        testIdx = (indices == i); trainIdx = ~testIdx;
        mdl = libsvmtrain_ova(y(trainIdx), X(trainIdx,:), opts);
        [~,acc(i)] = libsvmpredict_ova(y(testIdx), X(testIdx,:), mdl);
    end
    acc = mean(acc);    %# average accuracy
end

function indices = crossvalidation(y, nfold)
    %# stratified n-fold cross-validation
    %#indices = crossvalind('Kfold', y, nfold);  %# Bioinformatics toolbox
    cv = cvpartition(y, 'kfold',nfold);          %# Statistics toolbox
    indices = zeros(size(y));
    for i=1:nfold
        indices(cv.test(i)) = i;
    end
end
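If neither toolbox is available, fold indices can also be built by hand. The following is only a rough sketch of stratified assignment in base MATLAB, not what `cvpartition` does internally: within each class the instances are shuffled and then dealt out to folds 1..nfold in turn. The toy labels are made up, and the final count check assumes `y` is a column of positive integers.

```matlab
%# toy labels: 7 instances of class 1, 5 of class 2 (made up for illustration)
y = [ones(7,1); 2*ones(5,1)];
nfold = 3;

indices = zeros(size(y));
classes = unique(y);
for c=1:numel(classes)
    idx = find(y == classes(c));            %# instances of this class
    idx = idx(randperm(numel(idx)));        %# shuffle them
    fold = mod(0:numel(idx)-1, nfold) + 1;  %# deal out folds 1..nfold in turn
    indices(idx) = fold;
end

%# per-fold class counts: rows are folds, columns are classes
counts = accumarray([indices y], 1);
disp(counts)
```

Because every class starts dealing at fold 1, the first folds end up slightly larger when the class sizes are not multiples of `nfold`; the class proportions per fold stay roughly the same, which is the point of stratification.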


Finally, here is a simple demo to illustrate the usage:

%# load dataset
S = load('fisheriris');
data = zscore(S.meas);
labels = grp2idx(S.species);

%# cross-validate using one-vs-all approach
opts = '-s 0 -t 2 -c 1 -g 0.25';    %# libsvm training options
nfold = 10;
acc = libsvmcrossval_ova(labels, data, opts, nfold);
fprintf('Cross Validation Accuracy = %.4f%%\n', 100*mean(acc));

%# compute final model over the entire dataset
mdl = libsvmtrain_ova(labels, data, opts);


Compare that against the one-vs-one approach, which is what libsvm uses by default:

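%# with -v, libsvmtrain returns the cross-validation accuracy (in percent), not a model
acc = libsvmtrain(labels, data, sprintf('%s -v %d -q',opts,nfold));
%# train the final one-vs-one model over the entire dataset
model = libsvmtrain(labels, data, strcat(opts,' -q'));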


Note that I have renamed the libsvm functions to `libsvmtrain` and `libsvmpredict` to avoid a name clash with the functions of the same name in the Bioinformatics Toolbox (namely [svmtrain](http://www.mathworks.com/help/bioinfo/ref/svmtrain.html)) (3 upvotes)
