Tag: multilabel-classification

How to use SIFT features for bag of words in OpenCV?

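I've read a lot of articles about implementing bag of words after extracting the SIFT features of an image, but I'm still confused about what to do next. What exactly do I do?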

Thanks a lot for any guidance.

Here is my code so far.

#include <opencv2/opencv.hpp>
#include <opencv2/nonfree/features2d.hpp>  // SIFT lives in the nonfree module (OpenCV 2.4)

// Convert the cropped BGR image to grayscale; cv::Mat can be passed
// to the detector directly, so no IplImage round-trip is needed
cv::Mat grayForML;
cv::cvtColor(cropped, grayForML, CV_BGR2GRAY);

// Detect SIFT keypoints on the grayscale image
cv::SiftFeatureDetector detector;
std::vector<cv::KeyPoint> keyPoints;
detector.detect(grayForML, keyPoints);

// Draw the keypoints onto a copy of the image (SIFT output result)
cv::Mat output;
cv::drawKeypoints(grayForML, keyPoints, output);

// Resize to half size and display
cv::Mat output_reduced;
cv::resize(output, output_reduced, cv::Size2i(output.cols / 2, output.rows / 2));
cv::imshow("SIFT result", output_reduced);
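The code above stops at detecting keypoints; for bag of words you also need each keypoint's 128-dimensional SIFT descriptor, and then two further steps: (1) cluster the descriptors from all training images into a vocabulary of "visual words", and (2) describe every image as a normalized histogram of how often each word occurs, which gives one fixed-length vector per image that any classifier can take. The sketch below illustrates those two steps in Python with scikit-learn's KMeans rather than OpenCV; random arrays stand in for real descriptors, and the helper names `build_vocabulary` / `bow_histogram` and the vocabulary size of 8 are hypothetical choices (real vocabularies usually use hundreds or thousands of words). In OpenCV 2.4 C++ the descriptors would come from a descriptor extractor such as `cv::SiftDescriptorExtractor`, and the library also ships `BOWKMeansTrainer` and `BOWImgDescriptorExtractor` for this same pipeline.

```python
import numpy as np
from sklearn.cluster import KMeans


def build_vocabulary(descriptor_sets, n_words, seed=0):
    """Cluster the descriptors of all training images into n_words visual words."""
    all_descriptors = np.vstack(descriptor_sets)          # (total_keypoints, 128)
    return KMeans(n_clusters=n_words, n_init=10, random_state=seed).fit(all_descriptors)


def bow_histogram(descriptors, vocabulary):
    """Map one image's descriptors to a normalized visual-word histogram."""
    words = vocabulary.predict(descriptors)                # nearest word per keypoint
    hist = np.bincount(words, minlength=vocabulary.n_clusters).astype(float)
    return hist / hist.sum()   # normalize so images with different keypoint counts compare


# Stand-in data: three "images" with different numbers of 128-dim SIFT-like descriptors
rng = np.random.default_rng(0)
images = [rng.random((n, 128), dtype=np.float32) for n in (40, 55, 30)]

vocabulary = build_vocabulary(images, n_words=8)
X = np.array([bow_histogram(d, vocabulary) for d in images])
print(X.shape)   # (3, 8): one fixed-length row per image
```

Each row of `X` can then be paired with the image's label(s) and passed to an ordinary classifier (for multi-label targets, for example, scikit-learn's `OneVsRestClassifier`). New images are encoded with the same fitted vocabulary, never a re-fitted one.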


opencv sift multilabel-classification

lio*_*ing


Score: 5 · Answers: 1 · Views: 3337

Sklearn: difference between using OneVsRestClassifier and building each classifier separately

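As far as I know, a multi-label problem can be solved with a one-vs-rest scheme, and for this scikit-learn implements OneVsRestClassifier as a wrapper around classifiers such as svm.SVC. I was wondering how it would differ if I literally trained n individual binary classifiers (say we have a multi-label problem with n classes) and evaluated them separately.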

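I know this is like the "manual" way of implementing one-vs-rest instead of using the wrapper, but are the two approaches actually different? If so, how do they differ, e.g., in execution time or classifier performance?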
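A quick way to check is to fit both versions on the same data and compare their predictions. The sketch below uses a synthetic dataset from `make_multilabel_classification` and `LogisticRegression` as the base estimator; both are assumptions standing in for the real data and `svm.SVC`. Because the wrapper fits a clone of the base estimator on each label column, exactly as the manual loop does, a deterministic estimator should give identical predictions. The practical differences are convenience (one `fit`/`predict`, and an object that plugs into `Pipeline` and `GridSearchCV`), the wrapper's `n_jobs` parameter for fitting the per-label classifiers in parallel, and the fact that the wrapper substitutes a constant predictor (with a warning) for a label column that is all 0s or all 1s, where fitting `SVC` on that column directly would raise an error.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_multilabel_classification
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

# Synthetic data: Y is an (n_samples, n_classes) 0/1 indicator matrix
X, Y = make_multilabel_classification(n_samples=200, n_classes=4, random_state=0)
base = LogisticRegression(max_iter=1000)

# 1) The wrapper: internally one clone of `base` per label column
ovr = OneVsRestClassifier(base).fit(X, Y)
pred_wrapper = ovr.predict(X)

# 2) The "manual" way: an independent binary classifier per column
manual = [clone(base).fit(X, Y[:, j]) for j in range(Y.shape[1])]
pred_manual = np.column_stack([clf.predict(X) for clf in manual])

print(np.array_equal(pred_wrapper, pred_manual))
```

With a randomized base estimator (for example one using `random_state=None`) the two runs could of course differ slightly, but that comes from the randomness, not from the wrapper.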

scikit-learn multilabel-classification

Fra*_*cis


Score: 5 · Answers: 1 · Views: 1860

Multi-label classification using conditional random fields

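Is it possible to use conditional random fields for multi-label classification? I saw the Python CRF implementation at https://pystruct.github.io/user_guide.html but could not find a way to do multi-label classification with it.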
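In principle, yes: one common formulation treats each of the L labels as a binary node of a CRF, with unary scores computed from the input features and pairwise terms that reward or penalize two labels being on together, so label correlations are modeled rather than ignored. The sketch below shows only the MAP inference step for a small, fully connected model, written from scratch with NumPy rather than with pystruct. The function name `crf_map_labels` and all scores are hypothetical (in a trained model the unary scores would be something like `W @ x`), learning the weights (e.g., with a structured SVM or maximum likelihood) is not shown, and brute-force enumeration of all 2**L label vectors is only practical for a handful of labels.

```python
import itertools
import numpy as np


def crf_map_labels(unary, pairwise):
    """Exact MAP labeling for a fully connected binary pairwise CRF.

    unary[j]       : score added when label j is on (y_j = 1)
    pairwise[j, k] : score added when labels j and k are both on (j < k)
    Brute force over all 2**L label vectors, so only usable for small L.
    """
    unary = np.asarray(unary, dtype=float)
    upper = np.triu(np.asarray(pairwise, dtype=float), k=1)  # count each pair once
    best_y, best_score = None, -np.inf
    for bits in itertools.product((0, 1), repeat=len(unary)):
        y = np.array(bits)
        score = unary @ y + y @ upper @ y
        if score > best_score:
            best_y, best_score = y, score
    return best_y, best_score


# Hypothetical scores for 3 labels
unary = np.array([1.0, -0.5, -2.0])
independent, _ = crf_map_labels(unary, np.zeros((3, 3)))

coupled = np.zeros((3, 3))
coupled[0, 1] = coupled[1, 0] = 1.0   # labels 0 and 1 tend to co-occur
joint, _ = crf_map_labels(unary, coupled)

print(independent, joint)
```

Here `independent` is `[1 0 0]`: with no pairwise terms each label is decided on its own unary score. With the positive coupling between labels 0 and 1, `joint` becomes `[1 1 0]`, because switching label 1 on costs 0.5 but gains 1.0 from the pair.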

python classification machine-learning crf multilabel-classification

los*_*_19


Score: 5 · Answers: 1 · Views: 2478

Sklearn - how to predict probabilities for all target labels

I have a dataset with a target variable that can take 7 different labels. Each sample in my training set has only one label for the target variable.

For each sample, I want to compute the probability of each target label, so my prediction would consist of 7 probabilities per row.

On the sklearn website I read about multi-label classification, but that doesn't seem to be what I want.

I tried the following code, but it only gives me one classification per sample.

from sklearn.multiclass import OneVsRestClassifier
from sklearn.tree import DecisionTreeClassifier

clf = OneVsRestClassifier(DecisionTreeClassifier())
clf.fit(X_train, y_train)
pred = clf.predict(X_test)


Does anyone have any suggestions? Thanks!
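Since every sample has exactly one of 7 labels, this is a multi-class problem rather than a multi-label one, and scikit-learn classifiers such as `DecisionTreeClassifier` already return one probability per class from `predict_proba`; no wrapper is needed (although `OneVsRestClassifier` also offers `predict_proba`). The sketch below uses synthetic data from `make_classification` as a stand-in for the real dataset. It limits the tree depth because a fully grown tree usually has pure leaves and therefore outputs only 0/1 probabilities; `max_depth=5` is an arbitrary illustrative value. The columns of the result follow the order of `clf.classes_`.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in: one of 7 class labels per sample
X, y = make_classification(n_samples=700, n_features=10, n_informative=5,
                           n_classes=7, n_clusters_per_class=1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = DecisionTreeClassifier(max_depth=5, random_state=0).fit(X_train, y_train)
proba = clf.predict_proba(X_test)   # shape (n_test_samples, 7), rows sum to 1

print(clf.classes_)                 # column order of `proba`
print(proba[:3].round(3))           # the 7 class probabilities of the first 3 test rows
```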

python scikit-learn multilabel-classification

Ber*_*ans


Score: 5 · Answers: 1 · Views: 9786

How to use FeatureUnion to transform multiple features in a Pipeline?

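I have a pandas DataFrame containing information about messages sent by users. For my model, I'm interested in predicting the missing recipients of a message: given a message with recipients A, B, C, I want to predict who else should be among the recipients.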

I'm using OneVsRestClassifier with LinearSVC for multi-label classification. As features, I want to use the message's recipients, subject, and body.

Since recipients is a list of users, I want to transform that column with MultiLabelBinarizer. For subject and body, I want to use TF-IDF.

The data in my input pickle file looks like this (all values are strings except recipients, which is a set()):

[[message_id,sent_time,subject,body,set(recipients),message_type, is_sender]]


I'm using FeatureUnion and custom transformers in a pipeline to achieve this, as shown below.

from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.base import TransformerMixin, BaseEstimator
from sklearn.pipeline import Pipeline, FeatureUnion
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.linear_model import SGDClassifier
from sklearn.svm import SVC, LinearSVC
import pickle
import pandas as pd
import numpy as np

if __name__ == "__main__":
    class ColumnSelector(BaseEstimator, TransformerMixin):
        def __init__(self, column): …
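One way to wire this up is to give each column its own small Pipeline (select the column, then vectorize it) and combine those with FeatureUnion, which places the resulting feature matrices side by side. A detail that often breaks this setup: `MultiLabelBinarizer.fit` accepts only a single argument, so it cannot be dropped into a Pipeline as-is, because the Pipeline calls `fit`/`fit_transform` with `(X, y)`; a thin wrapper fixes that. The sketch below assumes a DataFrame with `subject`, `body` and `recipients` columns and uses a tiny hypothetical dataset and target; `ColumnSelector`, `RecipientBinarizer` and `column_pipeline` are illustrative names, not library classes.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.svm import LinearSVC


class ColumnSelector(BaseEstimator, TransformerMixin):
    """Pick one DataFrame column and pass it on unchanged."""

    def __init__(self, column):
        self.column = column

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return X[self.column]


class RecipientBinarizer(BaseEstimator, TransformerMixin):
    """MultiLabelBinarizer wrapped so it accepts the (X, y) that Pipeline passes."""

    def fit(self, X, y=None):
        self.mlb_ = MultiLabelBinarizer(sparse_output=True).fit(X)
        return self

    def transform(self, X):
        return self.mlb_.transform(X)


def column_pipeline(column, transformer):
    return Pipeline([("select", ColumnSelector(column)), ("vectorize", transformer)])


pipe = Pipeline([
    ("features", FeatureUnion([
        ("recipients", column_pipeline("recipients", RecipientBinarizer())),
        ("subject", column_pipeline("subject", TfidfVectorizer())),
        ("body", column_pipeline("body", TfidfVectorizer())),
    ])),
    ("clf", OneVsRestClassifier(LinearSVC())),
])

# Hypothetical toy data in the question's shape (recipients are sets)
df = pd.DataFrame({
    "subject": ["budget review", "team lunch", "budget draft", "lunch plans"],
    "body": ["numbers for q3", "pizza on friday", "draft numbers attached", "tacos on friday"],
    "recipients": [{"A", "B"}, {"C"}, {"A"}, {"C", "D"}],
})
# Hypothetical targets: the "missing" recipient(s) of each message, binarized
target_mlb = MultiLabelBinarizer()
y = target_mlb.fit_transform([{"C"}, {"D"}, {"B"}, {"E"}])

pipe.fit(df, y)
pred = pipe.predict(df)
print(target_mlb.inverse_transform(pred))
```

Recipients that never appeared during fitting are ignored (with a warning) by the fitted binarizer at prediction time. scikit-learn 0.20 and later also offer `ColumnTransformer` for per-column preprocessing, but it would need the same wrapper around `MultiLabelBinarizer`.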


nlp python-2.7 scikit-learn multilabel-classification data-science

use*_*612

2017-11-30 · Score: 5 · Answers: 1 · Views: 1804

Multi-label classification Keras metrics

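Which metric is better suited to multi-label classification in Keras: accuracy or categorical_accuracy? Obviously, in this case the final activation function is sigmoid and the loss function is binary_crossentropy.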
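The two metrics measure different things on a multi-hot target. When `metrics=['accuracy']` is passed, Keras picks the concrete accuracy metric from the loss and output shape, and with `binary_crossentropy` this is binary accuracy: every output is thresholded at 0.5 and correctness is averaged over all (sample, label) cells. `categorical_accuracy` only checks whether the argmax of the prediction equals the argmax of the target, so it looks at one label per sample and ignores the others. The NumPy sketch below reimplements these by hand (it is not Keras code) on two made-up samples, plus a stricter exact-match ratio for comparison.

```python
import numpy as np


def binary_accuracy(y_true, y_prob, threshold=0.5):
    # Every (sample, label) cell is scored on its own, then averaged
    return np.mean((y_prob > threshold).astype(int) == y_true)


def categorical_accuracy(y_true, y_prob):
    # Only compares the single top-scoring label with the first 1 in y_true
    return np.mean(np.argmax(y_true, axis=1) == np.argmax(y_prob, axis=1))


def exact_match_ratio(y_true, y_prob, threshold=0.5):
    # Strictest view: a sample counts only if all of its labels are right
    return np.mean(np.all((y_prob > threshold).astype(int) == y_true, axis=1))


y_true = np.array([[1, 0, 1],
                   [0, 1, 0]])
y_prob = np.array([[0.9, 0.2, 0.1],    # misses the third label
                   [0.3, 0.8, 0.1]])   # every label correct

print(binary_accuracy(y_true, y_prob),
      categorical_accuracy(y_true, y_prob),
      exact_match_ratio(y_true, y_prob))
```

On this toy data binary accuracy is 5/6, categorical accuracy is 1.0 even though the first sample misses a label, and the exact-match ratio is 0.5, which is why categorical accuracy is a misleading choice for sigmoid multi-label outputs.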

classification machine-learning multilabel-classification keras

fpi*_*fpi

2018-12-21 · Score: 5 · Answers: 2 · Views: 6015

Correct ranking loss implementation

I have a multi-label problem and I'm trying to implement the ranking loss as a custom loss in TensorFlow (https://arxiv.org/pdf/1312.4894.pdf).

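I built a simple CNN with a final sigmoid activation layer so that each class has an independent distribution.
The mathematical formulation splits the labels into two groups, positive and negative.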

[Formula image: ranking loss]

My question is: what is the correct way to implement it?

import tensorflow as tf

def ranking_loss(y_true, y_pred):
    pos = tf.where(tf.equal(y_true, 1), y_pred, tf.zeros_like(y_pred))
    neg = tf.where(tf.equal(y_true, 0), y_pred, tf.zeros_like(y_pred))

    loss = tf.maximum(1.0 - tf.math.reduce_sum(pos) + tf.math.reduce_sum(neg), 0.0)
    return tf.math.reduce_sum(loss)


The result is that, for each sample, the activation scores of the positive and negative classes are summed independently.

tr = [1, 0, 0, 1]
pr = [0, 0.6, 0.55, 0.9]
t =  tf.constant([tr])
p =  tf.constant([pr])

print(ranking_loss(t, p))

tf.Tensor([[0.  0.  0.  0.9]], shape=(1, 4), dtype=float32) #Pos
tf.Tensor([[0.   0.6  0.55 0.  ]], shape=(1, 4), dtype=float32) #Neg
tf.Tensor(1.2500001, shape=(), dtype=float32) #loss


The CNN's precision, recall, and …
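The code above adds up all positive scores and all negative scores before applying the hinge, so it compares two sums rather than individual labels. Assuming the formula in the linked paper is the pairwise hinge form $J=\sum_{i}\sum_{j\in C_+}\sum_{k\in C_-}\max\bigl(0,\,1-f_j(x_i)+f_k(x_i)\bigr)$ (every positive label $j$ of a sample should outscore every negative label $k$ of that sample by a margin of 1), the hinge has to be applied to each (positive, negative) pair separately. The sketch below is a NumPy reference version of that assumption: broadcasting builds all label pairs and a mask keeps only the positive/negative ones. Averaging over the batch is a choice made here, and the function name is hypothetical. The same broadcasting pattern carries over to `tf.maximum`, `tf.reduce_sum` and `tf.reduce_mean` for a Keras loss.

```python
import numpy as np


def pairwise_ranking_loss(y_true, y_pred, margin=1.0):
    """Pairwise hinge ranking loss for multi-label scores (NumPy reference).

    For every sample, each positive label j should score at least `margin`
    higher than each negative label k; violations are summed per sample and
    then averaged over the batch.
    """
    y_true = np.asarray(y_true, dtype=float)   # (batch, n_labels), 0/1
    y_pred = np.asarray(y_pred, dtype=float)   # (batch, n_labels), scores
    # hinge[b, j, k] = max(0, margin - s_bj + s_bk) for every label pair (j, k)
    hinge = np.maximum(0.0, margin - y_pred[:, :, None] + y_pred[:, None, :])
    # pair_mask[b, j, k] = 1 only if label j is positive and label k is negative
    pair_mask = y_true[:, :, None] * (1.0 - y_true[:, None, :])
    per_sample = np.sum(hinge * pair_mask, axis=(1, 2))
    return per_sample.mean()


tr = [1, 0, 0, 1]
pr = [0, 0.6, 0.55, 0.9]
print(pairwise_ranking_loss([tr], [pr]))
```

For this example the four (positive, negative) pairs contribute 1.6, 1.55, 0.7 and 0.65, about 4.5 in total, instead of the 1.25 produced by comparing sums.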

python machine-learning multilabel-classification tensorflow loss-function

hic*_*sou

2019-08-09 · Score: 5 · Answers: 1 · Views: 675

Multi-label classification: ML-kNN vs. KNN

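This may be a silly question, but I'm wondering what the difference is between ML-KNN as implemented in scikit.ml and scikit-learn's KNeighborsClassifier. According to sklearn's documentation, KNeighborsClassifier supports multi-label classification. However, according to its docs, ML-KNN is a KNN adapted for multi-label classification and built on sklearn's architecture.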

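When searching for example multi-label problems, MLkNN comes up most often, but I don't understand whether using it has any advantage over sklearn's basic implementation if that already supports multi-label data. Is it just that sklearn adapted to multi-label later, or are there further differences in the implementation?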

Any input is appreciated. Thanks!
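The main difference lies in how neighbor votes become labels. With uniform weights, KNeighborsClassifier takes a per-label majority vote among the k neighbors. ML-kNN (Zhang & Zhou, 2007) instead counts, for each label, how many of the k neighbors carry it and feeds that count into a per-label Bayesian decision: a smoothed prior for the label and smoothed likelihoods of observing each count, both estimated from the training set's own neighborhoods. The count needed to switch a label on is therefore learned per label (which matters for rare labels) rather than fixed at k/2. The sketch below is a from-scratch NumPy illustration of that idea on a tiny hypothetical dataset, not scikit-multilearn's MLkNN code; the names `knn_indices` and `MLkNNSketch` and the brute-force distance computation are assumptions for illustration.

```python
import numpy as np


def knn_indices(X_ref, X_query, k, exclude_self=False):
    """Brute-force Euclidean k-NN: (n_query, k) indices into X_ref."""
    d = np.linalg.norm(X_query[:, None, :] - X_ref[None, :, :], axis=2)
    if exclude_self:                      # X_query is X_ref: skip each point itself
        np.fill_diagonal(d, np.inf)
    return np.argsort(d, axis=1)[:, :k]


class MLkNNSketch:
    """Illustrative ML-kNN; not scikit-multilearn's implementation."""

    def __init__(self, k=3, s=1.0):
        self.k, self.s = k, s             # s = Laplace smoothing strength

    def fit(self, X, Y):
        X, Y = np.asarray(X, dtype=float), np.asarray(Y, dtype=int)
        m, n_labels = Y.shape
        k, s = self.k, self.s
        self.X_, self.Y_ = X, Y
        # Smoothed prior P(label l is on)
        self.prior1_ = (s + Y.sum(axis=0)) / (2 * s + m)
        # counts[i, l] = how many of sample i's k neighbours carry label l
        counts = Y[knn_indices(X, X, k, exclude_self=True)].sum(axis=1)
        # c1[j, l] / c0[j, l]: samples with / without label l whose count is j
        c1 = np.zeros((k + 1, n_labels))
        c0 = np.zeros((k + 1, n_labels))
        for l in range(n_labels):
            np.add.at(c1[:, l], counts[Y[:, l] == 1, l], 1)
            np.add.at(c0[:, l], counts[Y[:, l] == 0, l], 1)
        # Smoothed likelihoods P(count = j | label on) and P(count = j | label off)
        self.like1_ = (s + c1) / (s * (k + 1) + c1.sum(axis=0))
        self.like0_ = (s + c0) / (s * (k + 1) + c0.sum(axis=0))
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        counts = self.Y_[knn_indices(self.X_, X, self.k)].sum(axis=1)
        cols = np.arange(counts.shape[1])
        post1 = self.prior1_ * self.like1_[counts, cols]
        post0 = (1 - self.prior1_) * self.like0_[counts, cols]
        return (post1 > post0).astype(int)   # MAP decision per label


# Hypothetical toy data: two tight clusters with different label sets
base = np.array([[0, 0], [0.1, 0], [0, 0.1], [0.1, 0.1], [0.05, 0.05]])
X = np.vstack([base, base + 10])
Y = np.array([[1, 0]] * 5 + [[0, 1]] * 5)

model = MLkNNSketch(k=3).fit(X, Y)
pred = model.predict([[0.02, 0.03], [10.02, 10.03]])
print(pred)
```

On such clean clusters a plain majority vote would give the same answer; the two methods diverge when labels are imbalanced or noisy, because ML-kNN's decision threshold on the neighbor count then shifts per label.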

python machine-learning scikit-learn multilabel-classification scikit-multilearn

iam*_*ody


Score: 5 · Answers: 1 · Views: 1561

Multi-label confusion matrix

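I'm doing multi-label classification and have the actual data and the classifier's predicted data. The actual data consists of three classes (c1, c2, and c3), and likewise the predicted data also consists of three classes (c1, c2, and c3). The data is as follows: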

Actual_data     Predicted_data
c1 c2 c3         c1 c2 c3
1  1  0          1  1  1
1  1  0          1  0  1
1  0  1          0  1  1
0  1  1          1  0  0
1  0  0          1  1  0
1  1  1          1  0  1


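In multi-label classification, a document may belong to more than one class. In the data above, 1 means the document belongs to that class and 0 means it does not.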

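The first row of Actual_data indicates that the document belongs to classes c1 and c2 but not to c3. Similarly, the first row of Predicted_data indicates that the document belongs to classes c1, c2, and c3.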

Initially, I used R to compute the confusion matrix between the actual and predicted data. I saved these data frames in y_actual and y_predict.

y_actual<-as.matrix(Actual_data)
y_predict<-as.matrix(Predicted_data)
xtab<-table(y_actual,y_predict)


The output xtab is:

            y_predict
 y_actual     0 1
            0 1 5
            1 5 …
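R's `table()` flattens both matrices, so `xtab` pools all 6 × 3 = 18 (document, class) cells into a single 2x2 table and loses the per-class picture. For multi-label data a separate 2x2 matrix per class is usually more informative. The sketch below uses scikit-learn's `multilabel_confusion_matrix` on the same data; it returns one matrix per class in the layout `[[TN, FP], [FN, TP]]`, and summing over the classes gives back the pooled counts that `table()` reports.

```python
import numpy as np
from sklearn.metrics import multilabel_confusion_matrix

# Columns are c1, c2, c3 (same rows as in the question)
y_actual = np.array([[1, 1, 0], [1, 1, 0], [1, 0, 1],
                     [0, 1, 1], [1, 0, 0], [1, 1, 1]])
y_predict = np.array([[1, 1, 1], [1, 0, 1], [0, 1, 1],
                      [1, 0, 0], [1, 1, 0], [1, 0, 1]])

# One 2x2 matrix per class, laid out as [[TN, FP], [FN, TP]]
mcm = multilabel_confusion_matrix(y_actual, y_predict)
for name, cm in zip(["c1", "c2", "c3"], mcm):
    print(name, cm.tolist())

# Summing over classes reproduces the pooled table from R's table()
print(mcm.sum(axis=0).tolist())
```

For example, c1 comes out as `[[0, 1], [1, 4]]`: four documents correctly tagged c1, one missed (row 3) and one false alarm (row 4).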


python r weka confusion-matrix multilabel-classification

Ram*_*nda

2019-09-21 · Score: 5 · Answers: 1 · Views: 5095

Is this the correct use of sklearn's classification report for multi-label classification?

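I'm training a neural network with tf-keras. It's a multi-label classification, where each sample can belong to multiple classes [1,0,1,0..etc]. The final model lines (just for clarity) are: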

import tensorflow as tf
import tensorflow_addons as tfa

model.add(tf.keras.layers.Dense(9, activation='sigmoid'))  # final layer

model.compile(loss='binary_crossentropy', optimizer=optimizer,
              metrics=[tf.keras.metrics.BinaryAccuracy(),
                       tfa.metrics.F1Score(num_classes=9, average='macro', threshold=0.5)])


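I need to produce precision, recall, and F1 scores for these (I already get the F1 score reported during training). For this I'm using sklearn's classification report, but I need to confirm that I'm using it correctly in the multi-label setting.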

import numpy as np
from sklearn.metrics import classification_report

pred = model.predict(x_test)
pred_one_hot = np.around(pred)  # rounds each sigmoid output at 0.5, giving a 0/1 matrix

print(classification_report(one_hot_ground_truth, pred_one_hot))


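This works fine: I get a full report for each class, including F1 scores that match the F1Score metric from TensorFlow Addons (for macro F1). Sorry this post is long, but what I'm unsure about is: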

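In the multi-label setting, is it correct that the predictions need to be one-hot encoded? If I pass the plain prediction scores (sigmoid probabilities), an error is thrown...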

Thanks.
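For multi-label targets, `classification_report` expects the predictions in the same 0/1 indicator format as the ground truth. This is "multi-hot" rather than one-hot, since a row may contain several 1s. The sigmoid probabilities therefore have to be thresholded first: passing them raw makes scikit-learn see a mix of multilabel-indicator and continuous targets and raise a `ValueError`. The sketch below uses made-up probabilities for 3 labels and thresholds explicitly with `> 0.5`; `np.around` gives the same result for values in [0, 1] because it rounds 0.5 down to 0 (round-half-to-even). The `target_names` are hypothetical.

```python
import numpy as np
from sklearn.metrics import classification_report

# Made-up ground truth and sigmoid outputs for 3 labels
y_true = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 1, 0]])
probs = np.array([[0.8, 0.3, 0.6],
                  [0.2, 0.7, 0.4],
                  [0.9, 0.4, 0.1]])

# Raw probabilities are "continuous-multioutput", which the report rejects
raised = False
try:
    classification_report(y_true, probs)
except ValueError as err:
    raised = True
    print("ValueError:", err)

# Threshold each label independently -> multi-hot 0/1 matrix
y_pred = (probs > 0.5).astype(int)
print(classification_report(y_true, y_pred, target_names=["c0", "c1", "c2"],
                            zero_division=0))
```

The thresholded report then lists per-label precision, recall and F1 plus micro, macro, weighted and per-sample averages; the macro F1 row is the figure to compare against a macro-averaged F1 metric from training.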

scikit-learn multilabel-classification precision-recall keras

Author


Score: 5 · Answers: 1 · Views: 7265