Tag: classification

Understanding the probabilistic interpretation of logistic regression

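I am having trouble building intuition for the probabilistic interpretation of logistic regression. Specifically, why is it valid to treat the output of the logistic regression function as a probability?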
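A minimal sketch that may help with the intuition, assuming the standard formulation in which logistic regression models the log-odds of the positive class as a linear function of the inputs. Under that assumption the sigmoid is simply the inverse of the log-odds (logit) transform, so its output is a number in (0, 1) that maps back to the same odds. The function names are illustrative and not taken from any library.

```python
import math

def sigmoid(z: float) -> float:
    """Map a log-odds value z to a number in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def logit(p: float) -> float:
    """Map a probability p back to log-odds, log(p / (1 - p))."""
    return math.log(p / (1.0 - p))

# A linear score of 0 corresponds to even odds.
print(sigmoid(0.0))
# Round trip: score -> probability -> score recovers the original score.
print(logit(sigmoid(1.7)))
```

Whether that number is a well-calibrated probability still depends on how well the model fits the data.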

classification machine-learning

Vai*_*hta

2020 11-09

2
votes

1
answer

2400
views

Difference between image retrieval and image classification

load fisheriris
xdata = meas(51:end,3:4);
group = species(51:end);
svmStruct = svmtrain(xdata,group,'showplot',true);



species = svmclassify(svmStruct,[5 2],'showplot',true)
hold on;plot(5,2,'ro','MarkerSize',12);hold off


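The code above gives the result: species = 'virginica'

The species is classified as 'virginica'. This is only a single image. Can this classification process be called "image retrieval"?

Or do we have to retrieve many images before it can be called image retrieval?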
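To make the two terms concrete, here is a rough sketch under simplifying assumptions: it uses a nearest-neighbour rule rather than the SVM from the question, and the feature vectors and helper names are made up for illustration. Classification returns one label for the query, while retrieval returns a ranked list of stored items similar to the query.

```python
import math

# Hypothetical 2-D feature vectors (e.g. petal length, petal width) with labels.
database = [
    ((4.5, 1.5), "versicolor"),
    ((4.7, 1.4), "versicolor"),
    ((5.8, 2.2), "virginica"),
    ((5.1, 1.9), "virginica"),
]

def classify(query):
    """Classification: return ONE label (here, the label of the nearest item)."""
    _, label = min(database, key=lambda item: math.dist(item[0], query))
    return label

def retrieve(query, k=3):
    """Retrieval: return a RANKED LIST of the k stored items closest to the query."""
    ranked = sorted(database, key=lambda item: math.dist(item[0], query))
    return ranked[:k]

print(classify((5, 2)))
print(retrieve((5, 2), k=2))
```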

matlab classification image-processing svm

Mrk*_*Mrk

2013 02-28

2
votes

1
answer

2552
views

How to use NLTK BigramAssocMeasures.chi_sq

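I have a list of words, and I want to compute the association between two words based on their co-occurrence. From a paper I found that this can be computed with Pearson's chi-square test. I also found that nltk.BigramAssocMeasures.chi_sq() is used to compute chi-square values.

Can I use it for my needs? How do I find the chi-square value with nltk?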
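As a library-free sketch of the statistic itself, the following computes Pearson's chi-square for a word pair from a 2x2 contingency table of co-occurrence counts. The function name and the example counts are made up for illustration. As far as I know, NLTK's `BigramAssocMeasures.chi_sq` takes the same four counts in the form `chi_sq(n_ii, (n_ix, n_xi), n_xx)`, but check the NLTK documentation before relying on that.

```python
def chi_square_2x2(n_ii, n_ix, n_xi, n_xx):
    """Pearson's chi-square for a word pair from co-occurrence counts.

    n_ii: contexts containing both words
    n_ix: contexts containing the first word
    n_xi: contexts containing the second word
    n_xx: total number of contexts
    """
    a = n_ii                        # both words
    b = n_ix - n_ii                 # first word only
    c = n_xi - n_ii                 # second word only
    d = n_xx - n_ix - n_xi + n_ii   # neither word
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # an empty marginal leaves the statistic undefined; report 0
    return n_xx * (a * d - b * c) ** 2 / denom

print(chi_square_2x2(8, 10, 12, 100))
```

A larger value indicates that the two words co-occur more (or less) often than independence would predict.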

python nlp classification nltk

Roh*_*ith

2017 12-27

2
votes

1
answer

2990
views

Binary semi-supervised classification with positive-only and unlabeled data sets

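My data consists of comments (saved in files), only a few of which are labeled as positive. I want to use semi-supervised and PU (positive-unlabeled) classification to divide these comments into positive and negative classes. Is there a public implementation of semi-supervised and PU learning in Python (scikit-learn)?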

classification scikit-learn

imk*_*han

lucky-day

2
votes

1
answer

3257
views

What exactly does pos_label mean in f1_score?

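I am trying k-fold cross-validation in sklearn and am confused by the pos_label parameter of f1_score. I understand that pos_label has something to do with how the data is treated when the classes are not binary, but I don't have a good conceptual understanding of why it matters. Does anyone have a good explanation of what it means at a conceptual level?

I have read the documentation, but it didn't really help.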
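A small hand-rolled sketch may make the idea concrete. F1 is computed from true positives, false positives and false negatives, so someone has to decide which class counts as "positive"; in scikit-learn's `f1_score` that is the role of `pos_label` when `average='binary'`. The helper below is illustrative only and is not scikit-learn's implementation.

```python
def f1_for_label(y_true, y_pred, pos_label):
    """F1 score when `pos_label` is treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == pos_label and p == pos_label for t, p in pairs)
    fp = sum(t != pos_label and p == pos_label for t, p in pairs)
    fn = sum(t == pos_label and p != pos_label for t, p in pairs)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

y_true = [1, 1, 1, 0, 0]
y_pred = [1, 1, 0, 0, 1]
print(f1_for_label(y_true, y_pred, pos_label=1))
print(f1_for_label(y_true, y_pred, pos_label=0))
```

The same predictions give different scores depending on which class is called positive, which is why the choice matters.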

classification machine-learning scikit-learn

dat*_*Sci

lucky-day

2
votes

1
answer

3234
views

What are training and testing in image processing?

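I am implementing colour quantization based on k-means clustering on some RGB images. I will then evaluate the performance of the algorithm. I found some information about training and testing. As I understand it, I should split the image samples into a training set and a test set.

But I am confused by the terms training and testing. What do they mean, and how can they be implemented using a rank value?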
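In general terms, training data is what an algorithm's parameters are fitted on (here, presumably the k-means colour palette), and test data is held back and used only to measure performance. A minimal sketch of such a split, with hypothetical file names and a helper name invented for illustration:

```python
import random

def train_test_split_items(items, test_fraction=0.3, seed=0):
    """Shuffle items and split them into disjoint training and test lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical image file names.
images = [f"img_{i:02d}.png" for i in range(10)]
train, test = train_test_split_items(images)
print(len(train), len(test))
```

The fixed seed only makes the split reproducible; any image ends up in exactly one of the two lists.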

evaluation classification image-processing k-means training-data

Uyg*_*gar

2016 01-03

2
votes

1
answer

5888
views

Extracting information from the decision rules in the rpart package

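I need to extract information from the rules in a decision tree. I am using the rpart package in R. I use the demo data shipped with the package to illustrate my requirement: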

library(rpart)  # provides rpart() and the stagec demo data
data(stagec)
fit <- rpart(formula = pgstat ~ age + eet + g2 + grade + gleason + ploidy, data = stagec, method = "class", control = rpart.control(cp = 0.05))
fit


Printing fit shows:

n= 146 

node), split, n, loss, yval, (yprob)
      * denotes terminal node

 1) root 146 54 0 (0.6301370 0.3698630)  
   2) grade< 2.5 61  9 0 (0.8524590 0.1475410) *
   3) grade>=2.5 85 40 1 (0.4705882 0.5294118)  
     6) g2< 13.2 40 17 0 (0.5750000 0.4250000)  
      12) ploidy=diploid,tetraploid 31 11 0 (0.6451613 …


r classification machine-learning decision-tree

avi*_*avi

lucky-day

2
votes

1
answer

8904
views

Python text classification error - expected string or bytes-like object

I am trying to do text classification in Python on a large corpus (732,066 tweets).

# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Importing the dataset
#dataset = pd.read_csv('Restaurant_Reviews.tsv', delimiter = '\t', quoting = 3)

# Importing the dataset
cols = ["text","geocoordinates0","geocoordinates1","grid"]
dataset = pd.read_csv('tweets.tsv', delimiter = '\t', usecols=cols, quoting = 3, error_bad_lines=False, low_memory=False)

# Removing Non-ASCII characters
def remove_non_ascii_1(dataset):
    return ''.join([i if ord(i) < 128 else ' ' for i in dataset])

# Cleaning the texts
import re
import nltk
nltk.download('stopwords') …
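The code above is truncated, so where the error is actually raised is not visible. One common way to get "expected string or bytes-like object" is passing a non-string value, such as the NaN that pandas uses for a missing text cell, to a function from `re`. The sketch below reproduces that situation under this assumption; `clean_text` is a hypothetical helper and not part of the original code.

```python
import re

def clean_text(value):
    """Keep letters only; non-string inputs (e.g. NaN) become an empty string."""
    if not isinstance(value, str):
        return ""
    return re.sub(r"[^a-zA-Z]", " ", value)

missing = float("nan")  # how pandas typically represents an empty text cell
try:
    re.sub(r"[^a-zA-Z]", " ", missing)  # re functions reject non-string input
except TypeError as exc:
    print("TypeError:", exc)

print(repr(clean_text("Great food!!")))
print(repr(clean_text(missing)))
```

If this is the cause, guarding or dropping the missing rows before the cleaning loop should avoid the error.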


python twitter text nlp classification

Seu*_*JAO

lucky-day

2
votes

1
answer

8190
views

Negative decision_function values

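I am using the support vector classifier from sklearn on the Iris dataset. When I call decision_function, it returns negative values. Yet after classification, all samples in the test set have the correct class. I thought decision_function should return positive values when a sample is an inlier and negative values when it is an outlier. Where am I wrong?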

from sklearn import datasets
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
X = iris.data[:,:]
y = iris.target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.3, 
random_state=0)

clf = SVC(probability=True)
print(clf.fit(X_train,y_train).decision_function(X_test))
print(clf.predict(X_test))
print(y_test)


This is the output:

[[-0.76231668 -1.03439531 -1.40331645]
 [-1.18273287 -0.64851109  1.50296097]
 [ 1.10803774  1.05572833  0.12956269]
 [-0.47070432 -1.08920859 -1.4647051 ]
 [ 1.18767563  1.12670665  0.21993744]
 [-0.48277866 -0.98796232 -1.83186272]
 [ 1.25020033  1.13721691  0.15514536]
 [-1.07351583 -0.84997114  0.82303659]
 [-1.04709616 -0.85739411  0.64601611]
 [-1.23148923 -0.69072989  1.67459938]
 [-0.77524787 …
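For context, the inlier/outlier reading of the sign applies to one-class models such as OneClassSVM. For a classifier, a decision value is a signed score relative to a separating boundary, and the sign only says on which side (in which class) a sample falls. The sketch below shows this for a plain linear binary classifier; the weights and samples are made up, and it is not how SVC computes its values internally.

```python
def decision_value(w, b, x):
    """Signed score w.x + b of a linear binary classifier."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def predict(w, b, x):
    """Class 1 if the score is positive, otherwise class 0."""
    return 1 if decision_value(w, b, x) > 0 else 0

# Hypothetical weights and samples, chosen only for illustration.
w, b = (1.0, -1.0), 0.0
for x in [(2.0, 0.5), (0.5, 2.0)]:
    print(x, decision_value(w, b, x), predict(w, b, x))
```

A negative score here is a perfectly valid prediction of class 0, not a sign that the sample is an outlier.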


classification machine-learning svm scikit-learn

V. *_*Gai

2017 10-19

2
votes

1
answer

3104
views

Final model from cross-validation with the caret package

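I ran cross-validation on my data using the random forest method from the caret package, and R says the final model was built with mtry = 34. Does this mean that the final random forest (the one resulting from cross-validation) uses only 34 of the variables in my dataset for splitting in the trees?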

> output
Random Forest 

 375 samples
  592 predictors
  2 classes: 'alzheimer', 'control' 

  No pre-processing
  Resampling: Cross-Validated (3 fold) 
  Summary of sample sizes: 250, 250, 250 
  Resampling results across tuning parameters:

  mtry  Accuracy   Kappa
     2  0.6826667  0.3565541
    34  0.7600000  0.5194246
   591  0.7173333  0.4343563

   Accuracy was used to select the optimal model using  the largest value.
   The final value used for the model was mtry = 34.
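In the randomForest implementation that caret wraps, mtry is the number of predictors randomly sampled as candidates at each split, not a fixed set of 34 variables for the whole forest. The toy simulation below illustrates the difference; the number of splits is an arbitrary assumption, and the code only mimics the sampling step rather than growing real trees.

```python
import random

N_PREDICTORS = 592   # from the output above
MTRY = 34            # from the output above
N_SPLITS = 200       # hypothetical number of splits across the forest

rng = random.Random(0)
ever_considered = set()
for _ in range(N_SPLITS):
    # At each split, a fresh random subset of MTRY predictors is drawn as candidates.
    candidates = rng.sample(range(N_PREDICTORS), MTRY)
    ever_considered.update(candidates)

print(len(candidates), len(ever_considered))
```

Each split looks at 34 candidates, but across many splits far more than 34 distinct predictors get a chance to be used.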


r classification random-forest cross-validation r-caret

ch.*_*ahe

lucky-day

2
votes

1
answer

1788
views