I have a set of 240 features extracted using image processing. The goal, after training, is to classify test cases into 7 different classes. There are roughly 60 observations per class (i.e., each class has about 60 feature vectors, each with 240 components).

Many research papers and books use sequential forward search (SFS) or sequential backward search (SBS) to select the best features from the feature vector. The figure below shows the sequential forward search algorithm.
Any such algorithm needs a criterion to discriminate between features. A common choice is the Bhattacharyya distance, a divergence-type measure between distributions. From my reading: given a matrix M1 for class A containing all 60 of that class's feature vectors, so that it has n = 60 rows and m = 240 columns (since there are 240 features in total), and a similar matrix M2 for class B, one can compute the Bhattacharyya distance between them and determine how well the two classes are separated.
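To make the criterion concrete, here is a minimal sketch (not from the original post) of the Bhattacharyya distance for a single feature, modeling each class's values for that feature as a univariate Gaussian. The function name and toy data are illustrative only:

```python
from statistics import mean, variance
from math import log

def bhattacharyya_gauss(x, y):
    """Bhattacharyya distance between two samples of one feature,
    each modeled as a univariate Gaussian (illustrative sketch)."""
    m1, m2 = mean(x), mean(y)
    v1, v2 = variance(x), variance(y)
    # Closed form for two univariate normals:
    # first term penalizes variance mismatch, second term mean separation.
    return (0.25 * log(0.25 * (v1 / v2 + v2 / v1 + 2))
            + 0.25 * (m1 - m2) ** 2 / (v1 + v2))

# Toy data: one feature measured for two classes.
class_a = [1.0, 1.2, 0.9, 1.1, 1.05]
class_b = [2.0, 2.3, 1.9, 2.1, 2.2]

# A larger distance means this feature separates the classes better.
print(bhattacharyya_gauss(class_a, class_b))
```

A feature-selection algorithm can score each candidate feature (or feature subset) this way and keep the ones with the largest distance; the full multivariate form uses the mean vectors and covariance matrices of the selected columns of M1 and M2.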
My question is how to integrate the two: how do I use the Bhattacharyya distance as the criterion for selecting the best features in a selection algorithm like the one described above?
With Arthur B.'s help, I finally understood the concept. Here is my implementation. Although I used the Plus-l Take-away-r algorithm (sequential forward-backward search), I am posting it because it is essentially the same once the backward search is removed. The implementation below is in MATLAB, but it is easy to follow:
S = zeros(Size,1);  % Binary feature list; all zeros means no feature selected yet
k = 0;
while k < n                   % Begin SFS. n is the number of features to select
    t = k + l;                % l is the number of features to add per iteration
    while k < t
        R = zeros(Size,1);    % Size is the total number of features
        for i = 1:Size
            if S(i) == 0      % Feature i has not been selected yet
                S_copy = S;
                S_copy(i) = 1;
                % Criterion value of the subset with feature i added is stored in R
                R = OperateBhattacharrya(Matrices, S_copy, i, e, R);
            end
        end
        k = k + 1;            % increment k
        [~, N] = max(R);      % Index of the best feature to add
        S(N) = 1;             % Mark it as selected
    end
    t = k - r;                % r is the number of features to remove after adding l features (l > r)
    while k > t               % Start sequential backward search
        R = zeros(Size,1);
        for i = 1:Size
            if S(i) == 1
                S_copy = S;
                S_copy(i) = 0;
                R = OperateBhattacharrya(Matrices, S_copy, i, 1, R);
            end
        end
        k = k - 1;
        [~, N] = max(R);      % Feature whose removal hurts the criterion least
        S(N) = 0;             % Deselect it
    end
    fprintf('Iteration: %d--%d\n', k, t);
end
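For readers not working in MATLAB, the forward-search half of the loop above can be sketched in Python. This is a simplified analogue under stated assumptions: the subset criterion here is the sum of per-feature univariate Bhattacharyya distances (the full method would use the multivariate distance over the selected columns), and all function names and the toy matrices are illustrative:

```python
from statistics import mean, variance
from math import log

def bhatt(x, y):
    """Per-feature Bhattacharyya distance under a Gaussian model."""
    m1, m2, v1, v2 = mean(x), mean(y), variance(x), variance(y)
    return (0.25 * log(0.25 * (v1 / v2 + v2 / v1 + 2))
            + 0.25 * (m1 - m2) ** 2 / (v1 + v2))

def criterion(A, B, selected):
    """Score a feature subset: sum of per-feature distances between
    the two classes (a simplification of the multivariate form)."""
    return sum(bhatt([row[j] for row in A], [row[j] for row in B])
               for j in selected)

def sfs(A, B, n):
    """Greedy sequential forward search: at each step, add the feature
    whose inclusion maximizes the criterion, until n are selected."""
    selected, remaining = [], set(range(len(A[0])))
    while len(selected) < n:
        best = max(remaining, key=lambda j: criterion(A, B, selected + [j]))
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy class matrices: 4 observations x 3 features per class.
# Feature 1 separates the classes strongly; features 0 and 2 overlap.
A = [[0.00, 1.0, 5.0], [0.10, 1.1, 5.2], [0.20, 0.9, 4.9], [0.10, 1.0, 5.1]]
B = [[0.05, 3.0, 5.05], [0.15, 3.2, 5.15], [0.10, 2.9, 4.95], [0.20, 3.1, 5.0]]

print(sfs(A, B, 2))
```

The backward ("take-away-r") step is symmetric: tentatively drop each selected feature, score the reduced subset, and remove the feature whose absence costs the criterion the least.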
I hope this helps someone facing a similar problem.