Tag: multiclass-classification

Which kind of classification should I choose?

I have a large amount of Yelp data, and I have to classify the reviews into 8 different categories.
Categories:

Cleanliness
Customer Service
Parking
Billing
Food Pricing
Food Quality
Waiting time
Unspecified


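A review can belong to several categories, so I used multilabel classification. But I am confused about how to handle positive/negative sentiment. A single review may be positive about Food Quality but negative about Customer Service. Example: "food taste was very good but staff behaviour was very bad", so the review contains positive Food Quality but negative Customer Service. How should I handle this case? Should I run sentiment analysis before classification? Please help.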
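One way to frame the case described in the question, offered only as a sketch: treat every (category, polarity) pair as its own label, so a single review can be positive for Food Quality and negative for Customer Service at the same time. The `encode` helper, the `POLARITIES` list and the label layout are assumptions made for illustration; they do not come from the question.

```python
# Hypothetical label space: every category paired with a polarity.
CATEGORIES = [
    "Cleanliness", "Customer Service", "Parking", "Billing",
    "Food Pricing", "Food Quality", "Waiting time", "Unspecified",
]
POLARITIES = ["positive", "negative"]
LABELS = [(c, p) for c in CATEGORIES for p in POLARITIES]
INDEX = {label: i for i, label in enumerate(LABELS)}


def encode(aspects):
    """Turn a list of (category, polarity) pairs into a multi-hot vector."""
    vector = [0] * len(LABELS)
    for aspect in aspects:
        vector[INDEX[aspect]] = 1
    return vector


# The review from the question: good food, bad staff behaviour.
review_labels = [("Food Quality", "positive"), ("Customer Service", "negative")]
target = encode(review_labels)
```

A multilabel classifier trained on such vectors would predict category and sentiment jointly. Running a separate sentiment model per detected category is an equally plausible design.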

classification machine-learning sentiment-analysis multilabel-classification multiclass-classification

luc*_*ucy

5
votes

1
answer

213
views

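How to calculate overall accuracy, sensitivity and specificity for multiclass?

Can anyone explain how to calculate accuracy, sensitivity and specificity for a multiclass dataset?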
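A common convention, sketched here under assumptions, is to compute sensitivity and specificity one class at a time (that class versus all others) from the confusion matrix, and to take overall accuracy as the trace divided by the total count. The function name `per_class_metrics` and the example matrix are made up for illustration, and the code assumes every class appears at least once so that no denominator is zero.

```python
def per_class_metrics(cm):
    """Return overall accuracy plus per-class sensitivity and specificity.

    cm[i][j] counts samples whose true class is i and predicted class is j.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    accuracy = sum(cm[i][i] for i in range(n)) / total
    sensitivity, specificity = [], []
    for k in range(n):
        tp = cm[k][k]                               # class k predicted as k
        fn = sum(cm[k]) - tp                        # class k predicted as other
        fp = sum(cm[i][k] for i in range(n)) - tp   # other predicted as k
        tn = total - tp - fn - fp                   # everything else
        sensitivity.append(tp / (tp + fn))
        specificity.append(tn / (tn + fp))
    return accuracy, sensitivity, specificity


# Made-up 3-class confusion matrix, for illustration only.
cm = [
    [5, 1, 0],
    [2, 6, 2],
    [0, 1, 3],
]
accuracy, sensitivity, specificity = per_class_metrics(cm)
```

Averaging the per-class lists (macro averaging) gives a single summary number when one is needed.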

machine-learning confusion-matrix multiclass-classification

Rst*_*nbl

2020 12-25

5
votes

1
answer

4849
views

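Model() got multiple values for argument 'nr_class' - SpaCy multi-classification model (BERT integration)

Hi, I am implementing a multi-classification model (5 classes) with the new SpaCy model en_pytt_bertbaseuncased_lg. The code for the new pipeline is here: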

nlp = spacy.load('en_pytt_bertbaseuncased_lg')
textcat = nlp.create_pipe(
    'pytt_textcat',
    config={
        "nr_class":5,
        "exclusive_classes": True,
    }
)
nlp.add_pipe(textcat, last = True)

textcat.add_label("class1")
textcat.add_label("class2")
textcat.add_label("class3")
textcat.add_label("class4")
textcat.add_label("class5")


The training code is as follows and is based on the example here (https://pypi.org/project/spacy-pytorch-transformers/):

def extract_cat(x):
    for key in x.keys():
        if x[key]:
            return key

# get names of other pipes to disable them during training
n_iter = 250 # number of epochs

train_data = list(zip(train_texts, [{"cats": cats} for cats in train_cats]))


dev_cats_single   = [extract_cat(x) for x in dev_cats]
train_cats_single = [extract_cat(x) for x in train_cats] …
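For readers unfamiliar with the error in the title: Python raises "got multiple values for argument" whenever the same parameter is supplied twice, for instance once positionally and once through an unpacked config dictionary. The sketch below reproduces the message with a hypothetical `build_model` function. It is not spaCy code and makes no claim about where the duplicate value comes from inside the library.

```python
def build_model(nr_class, width=64):
    """Hypothetical stand-in for a model factory; not a spaCy function."""
    return {"nr_class": nr_class, "width": width}


config = {"nr_class": 5}

try:
    # nr_class arrives twice: once positionally, once via **config.
    build_model(5, **config)
except TypeError as err:
    message = str(err)

# Supplying the value through a single route works.
model = build_model(**config)
```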


python spacy multiclass-classification pytorch

Hen*_*ski

2019 08-16

5
votes

1
answer

131
views

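SpaCy - ValueError: operands could not be broadcast together with shapes (1,2) (1,5)

This is related to my previous Stack Overflow post, "Model() got multiple values for argument 'nr_class' - SpaCy multi-classification model (BERT integration)", where my problem was partly solved. I would like to share the issue that appeared after implementing the solution.

If I remove the nr_class parameter, I get this error: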

ValueError: operands could not be broadcast together with shapes (1,2) (1,5)


I actually expected this to happen, because I did not specify the nr_class parameter. Is that correct?
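The message itself is NumPy's standard complaint when two arrays with incompatible shapes meet in an element-wise operation. The sketch below reproduces it with plain arrays, on the assumption (not confirmed by the question) that one side was sized for 2 classes and the other for the 5 labels that were added. The variable names are illustrative only.

```python
import numpy as np

scores = np.zeros((1, 2))   # e.g. an output sized for 2 classes (assumed)
labels = np.zeros((1, 5))   # e.g. a target vector sized for 5 labels

try:
    scores - labels
except ValueError as err:
    message = str(err)

# Shapes that agree combine without complaint.
ok = np.zeros((1, 5)) - labels
```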

Once again, my code for the multiclass model:

nlp = spacy.load('en_pytt_bertbaseuncased_lg')
textcat = nlp.create_pipe(
    'pytt_textcat',
    config={
        "nr_class":5,
        "exclusive_classes": True,
    }
)
nlp.add_pipe(textcat, last = True)

textcat.add_label("class1")
textcat.add_label("class2")
textcat.add_label("class3")
textcat.add_label("class4")
textcat.add_label("class5")


The training code is as follows and is based on the example here (https://pypi.org/project/spacy-pytorch-transformers/):

def extract_cat(x):
    for key in x.keys():
        if x[key]:
            return key

# get names of other pipes to disable them during training
n_iter = 250 # number of epochs

train_data = list(zip(train_texts, [{"cats": cats} for …


python spacy multiclass-classification pytorch

Hen*_*ski

5
votes

1
answer

98
views

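Feature importance per class in multiclass classification

I built a decision tree, which also gives me the feature importances for my classification. But how can I tell my program to give me the feature importance for each class? To get the overall feature importances, I have this code: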

import numpy as np
import matplotlib.pyplot as plt

# `tree` is the fitted decision tree; X and feature_cols come from earlier setup
importances = tree.feature_importances_
# A single tree has no estimators_ to take a standard deviation over,
# so the bar chart below is drawn without error bars (no yerr).
indices = np.argsort(importances)[::-1]

# Print the feature ranking
print("Feature ranking:")

for f in range(X.shape[1]):
    print("%d. feature %d (%f)" % (f + 1, indices[f], importances[indices[f]]))

# Plot the feature importances of the tree
plt.figure()
plt.title("Feature importances")
plt.bar(range(X.shape[1]), importances[indices],
       color="r", align="center")
plt.xticks(range(X.shape[1]), [feature_cols[i] for i in indices])
plt.xlim([-1, X.shape[1]])
plt.show()


I have four classes: 0, 1, 2, 3. Does anyone know a solution?
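A single multiclass tree exposes only one `feature_importances_` vector. One possible workaround, sketched here as an assumption rather than an established answer, is to fit one binary "class versus rest" tree per class and read each tree's importances. The synthetic data, `max_depth=3` and the dictionary name `per_class_importances` are all illustrative.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 4))
# Synthetic labels 0..3, determined only by the signs of features 0 and 1.
y = (X[:, 0] > 0).astype(int) + 2 * (X[:, 1] > 0).astype(int)

per_class_importances = {}
for cls in np.unique(y):
    # Binary target: 1 for the current class, 0 for every other class.
    binary_target = (y == cls).astype(int)
    clf = DecisionTreeClassifier(max_depth=3, random_state=0)
    clf.fit(X, binary_target)
    per_class_importances[int(cls)] = clf.feature_importances_
```

These importances describe the one-vs-rest trees, not the original multiclass tree, so the two need not agree.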

python decision-tree multiclass-classification

may*_*our

5
votes

0
answers

1330
views

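Python/ML: which methods to use for multiclass classification of products?

In a pickle...

I have a dataset with >100,000 observations; its columns include CustomerID, VendorID, ProductID and CatNMap. Here is what it looks like:

As you can see, the values in the first 3 columns (CustomerID, VendorID, ProductID) are unique numeric mapping values that would be meaningless if plotted on an x,y plane (which rules out many classification methods); the last column contains the category string assigned by the customer. Now, here is the part I don't understand and am not sure how to handle...

Goal: predict future CatNMap values for customers. It seems to me that the features I have here are not useful; is that true? If so, what method can I use, given that the CatNMap column has >7,000 unique values? Also, suppose different customers assigned 2 or more different categories to the same product: how would any method handle the classification of future items? Do I need to implement an NN for this?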
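Before reaching for a neural network, a frequency baseline can show how much signal the ID columns carry on their own. The sketch below is an illustration under assumptions, not a recommended final method: it predicts the category most often assigned to a product, which also gives one concrete answer to the "conflicting categories" case (majority vote), and it falls back to the overall most frequent category for unseen products. The rows and the `predict` helper are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical rows: (CustomerID, VendorID, ProductID, CatNMap).
rows = [
    (1, 10, 100, "Beverages"),
    (2, 10, 100, "Beverages"),
    (3, 11, 100, "Drinks"),
    (1, 12, 200, "Snacks"),
]

by_product = defaultdict(Counter)
overall = Counter()
for _customer, _vendor, product, category in rows:
    by_product[product][category] += 1
    overall[category] += 1


def predict(product_id):
    """Most frequent category seen for the product, else the global mode."""
    counts = by_product.get(product_id) or overall
    return counts.most_common(1)[0][0]
```

Any learned model would then have to beat this baseline on held-out rows to justify its extra complexity.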

Thanks for all answers!

python classification machine-learning neural-network multiclass-classification

DGo*_*nov

5
votes

1
answer

169
views

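What is the purpose of base_score in xgboost multiclass?

I am trying to explore how Xgboost works for binary classification as well as for multiclass. In the binary case, I observed that base_score is treated as the starting probability, and that it also has a major impact on the computed Gain and Cover.

In the multiclass case, I cannot figure out the significance of the base_score parameter, because I get the same Gain and Cover values for different (any) base_score values.

I also cannot find out why there is a factor of 2 when computing Cover for multiclass, i.e. 2*p*(1-p)

Can someone help me with these two parts?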
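Both observations can be explored by hand, under two stated assumptions: that Cover is accumulated from per-sample second-order terms, p(1-p) in the binary logistic case and the 2*p*(1-p) form quoted in the question for multiclass, and that base_score enters as the same constant offset on every class margin. Under the second assumption, softmax probabilities cannot depend on base_score, because softmax is invariant to adding a constant to all scores. The function names below are illustrative and are not xgboost APIs.

```python
import math


def softmax(margins):
    """Numerically stable softmax over a list of raw scores."""
    peak = max(margins)
    exps = [math.exp(m - peak) for m in margins]
    total = sum(exps)
    return [e / total for e in exps]


def binary_hessian(p):
    """Second derivative of the logistic loss with respect to the margin."""
    return p * (1.0 - p)


def softmax_hessian(p):
    """Per-class second-order term with the factor 2 quoted in the question."""
    return 2.0 * p * (1.0 - p)


margins = [0.3, -1.2, 0.8]
shifted = [m + 0.5 for m in margins]  # same constant added to every class

probs = softmax(margins)
probs_shifted = softmax(shifted)
```

If the probabilities do not move, neither do the gradient and second-order terms derived from them, which would be consistent with identical Gain and Cover for any base_score.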

statistics machine-learning boosting xgboost multiclass-classification

jay*_*hor

5
votes

1
answer

1237
views

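Assertion failed: predictions must be >= 0, Condition x >= y did not hold element-wise

I am running a multiclass model (40 classes in total) for 2000 epochs. The model ran fine until epoch 828, but at epoch 829 it gave me an InvalidArgumentError (see the screenshot below).

Below is the code I used to build the model.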

n_cats = 40
input_bow = tf.keras.Input(shape=(40), name="bow")
hidden_1 = tf.keras.layers.Dense(200, activation="relu")(input_bow)
hidden_2 = tf.keras.layers.Dense(100, activation="relu")(hidden_1)
hidden_3 = tf.keras.layers.Dense(80, activation="relu")(hidden_2)
hidden_4 = tf.keras.layers.Dense(70, activation="relu")(hidden_3)
output = tf.keras.layers.Dense(n_cats, activation="sigmoid")(hidden_4)

model = tf.keras.Model(inputs=[input_bow], outputs=output)

METRICS = [
    tf.keras.metrics.Accuracy(name="Accuracy"),
    tf.keras.metrics.Precision(name="precision"),
    tf.keras.metrics.Recall(name="recall"),
    tf.keras.metrics.AUC(name="auc"),
    tf.keras.metrics.BinaryAccuracy(name="binaryAcc")
]

checkpoint_cb = tf.keras.callbacks.ModelCheckpoint(
    "my_keras_model.h5", save_best_only=True)
lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=1e-2, decay_steps=10000, decay_rate=0.9)
adam_optimizer = tf.keras.optimizers.Adam(learning_rate=lr_schedule)

model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=METRICS)

training_history = model.fit(
    (bow_train), indus_cat_train, epochs=2000, batch_size=128,
    callbacks=[checkpoint_cb],
    validation_data=(bow_test, indus_cat_test)) …
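A sigmoid output is never negative, so one situation in which a check such as "predictions must be >= 0" can still fail is when values have turned into NaN during training. Whether that is what happened at epoch 829 is not established by the question. The plain-Python sketch below only illustrates the mechanism: any ordered comparison involving NaN evaluates to False.

```python
import math

# Made-up prediction values; the second one has diverged to NaN.
predictions = [0.2, float("nan"), 0.9]

# "Every prediction >= 0" fails as soon as one value is NaN,
# because NaN >= 0 is False.
all_non_negative = all(p >= 0 for p in predictions)
has_nan = any(math.isnan(p) for p in predictions)
```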

python-3.x multiclass-classification tensorflow2.0

use*_*244

2020 07-30

5
votes

1
answer

4521
views

How to plot the hyperplanes for SVM One-Versus-All?

I am trying to plot the hyperplanes when SVM-OVA is performed as follows:

import matplotlib.pyplot as plt
import numpy as np
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

x = np.array([[1,1.1],[1,2],[2,1]])
y = np.array([0,100,250])
classifier = OneVsRestClassifier(SVC(kernel='linear'))
Based on the answer to the question "Plot hyperplane Linear SVM python", I wrote the following code:

fig, ax = plt.subplots()
# create a mesh to plot in
x_min, x_max = x[:, 0].min() - 1, x[:, 0].max() + 1
y_min, y_max = x[:, 1].min() - 1, x[:, 1].max() + 1
xx2, yy2 = np.meshgrid(np.arange(x_min, x_max, .2),np.arange(y_min, y_max, .2))
Z = classifier.predict(np.c_[xx2.ravel(), yy2.ravel()])
Z = Z.reshape(xx2.shape)
ax.contourf(xx2, …
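In two dimensions, each "class versus rest" linear model defines a line w[0]*x + w[1]*y + b = 0, which can be drawn by solving for y. After fitting, `OneVsRestClassifier` keeps its per-class models in `estimators_`, and a linear-kernel `SVC` exposes `coef_` and `intercept_`. The sketch below leaves scikit-learn out and uses hypothetical weights to show only the algebra. `boundary_y` is an illustrative helper, not a library function.

```python
def boundary_y(w, b, x_values):
    """Solve w[0]*x + w[1]*y + b = 0 for y at each x (requires w[1] != 0)."""
    return [-(w[0] * x + b) / w[1] for x in x_values]


# Hypothetical weights and intercept for one "class vs rest" linear model.
w = [1.0, 2.0]
b = -4.0
xs = [0.0, 1.0, 2.0]
ys = boundary_y(w, b, xs)
```

Repeating this for each per-class model and passing the resulting points to `ax.plot` would give one line per class on top of the filled contour.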

python machine-learning svm scikit-learn multiclass-classification

Ale*_*dro

2020 11-26

5
votes

1
answer

174
views

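Something is wrong when implementing SVM One-vs-all in python

I am trying to verify that I have correctly understood how SVM - OVA (One-versus-all) works, by comparing the function OneVsRestClassifier with my own implementation.

In the code below, I implement num_classes classifiers in the training phase, then test all of them on the test set and select the one that returns the highest probability value.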

import pandas as pd
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score,classification_report
from sklearn.preprocessing import scale

# Read dataset
df = pd.read_csv('In/winequality-white.csv', delimiter=';')
X = df.loc[:, df.columns != 'quality']
Y = df.loc[:, df.columns == 'quality']

my_classes = np.unique(Y)
num_classes = len(my_classes)

# Train-test split
np.random.seed(42)
msk = np.random.rand(len(df)) <= 0.8
train = df[msk]
test = df[~msk]

# From dataset to features and labels
X_train = train.loc[:, train.columns …
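The selection step described in the question (score every sample with each binary model, then keep the class whose model scored highest) can be sketched independently of the truncated code. The scores below are made up, and `one_vs_all_predict` is an illustrative helper, not the asker's implementation.

```python
def one_vs_all_predict(scores_per_class, classes):
    """Pick, for each sample, the class whose binary model scored highest.

    scores_per_class[k][i] is the score model k gives to sample i.
    """
    n_samples = len(scores_per_class[0])
    predictions = []
    for i in range(n_samples):
        best = max(range(len(classes)), key=lambda k: scores_per_class[k][i])
        predictions.append(classes[best])
    return predictions


# Hypothetical scores from three binary "class vs rest" models on four samples.
classes = [3, 5, 7]
scores = [
    [0.9, 0.1, 0.3, 0.2],   # model for class 3
    [0.2, 0.7, 0.4, 0.1],   # model for class 5
    [0.1, 0.3, 0.6, 0.05],  # model for class 7
]
predicted = one_vs_all_predict(scores, classes)
```

When comparing a hand-written version against `OneVsRestClassifier`, it is worth checking that both sides use the same kind of score (probability or decision value), since the two can rank classes differently.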

python svm scikit-learn multiclass-classification

Ale*_*dro

5
votes

2
answers

192
views

Tag statistics

multiclass-classification ×10

python ×6

machine-learning ×5

classification ×2

pytorch ×2

scikit-learn ×2

spacy ×2

svm ×2

boosting ×1

confusion-matrix ×1

decision-tree ×1

multilabel-classification ×1

neural-network ×1

python-3.x ×1

sentiment-analysis ×1

statistics ×1

tensorflow2.0 ×1

xgboost ×1
