Tag: classification

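R: replacing values with bins

I have a df of integer values. For classification purposes, I want to replace it with a simpler df that holds predetermined intervals instead of the integers. How can I do this efficiently? An example follows: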

DF:

   1   2   3
1  5   3   0 
2  1   10  12
3  3   0   10

Converted to:

   1        2        3
1  [3-5]    [3-5]    [0-2]
2  [0-2]    [10-12]  [10-12]
3  [3-5]    [0-2]    [10-12]
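
The question is about R, where `cut` is the usual tool for this. As a rough illustration of the same idea, here is a minimal Python sketch using `pandas.cut`. The bin edges are an assumption: only [0-2], [3-5] and [10-12] appear in the example, so the [6-9] bin is a guess that fills the gap.

```python
import pandas as pd

# The example frame from the question.
df = pd.DataFrame(
    {"1": [5, 1, 3], "2": [3, 10, 0], "3": [0, 12, 10]},
    index=[1, 2, 3],
)

# Assumed bin edges and labels; the [6-9] bin is hypothetical.
edges = [-1, 2, 5, 9, 12]
labels = ["[0-2]", "[3-5]", "[6-9]", "[10-12]"]

# pd.cut uses right-closed intervals by default: (-1, 2], (2, 5], ...
# Applying it column by column keeps the shape of the original frame.
binned = df.apply(lambda col: pd.cut(col, bins=edges, labels=labels).astype(str))
print(binned)
```

Because the edges are passed explicitly, irregular interval widths are handled the same way as regular ones.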

r classification

use*_*419

2018 06-16

1
votes

1
answers

631
views

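How to train a classifier using a video dataset

If I have a video dataset of a specific action, how can I use it to train a classifier that can later be used to recognize that action?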

classification machine-learning

Ahm*_*ato

1
votes

1
answers

701
views

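Alternatives to a support vector machine classifier in Python?

I have to compare 155 image feature vectors. Each feature vector has 5 features. My images are divided into 10 classes. Unfortunately, I would need at least 100 images to use a support vector machine. Are there any alternatives?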

python opencv classification machine-learning scikit-learn

pos*_*res

1
votes

1
answers

751
views

Using Mahout from Java code instead of the CLI

I want to be able to build a model from Java. With the CLI I can do the following:

    ./mahout trainlogistic --input Candy-Crush.twtr.csv \
       --output ./model \
       --target hd_click --categories 2 \
       --predictors click_frequency country_code ctr      device_price_range hd_conversion  time_of_day num_clicks phone_type twitter is_weekend app_entertainment app_wallpaper app_widgets arcade books_and_reference brain business cards casual comics communication education entertainment finance game_wallpaper game_widgets health_and_fitness health_fitness libraries_and_demo libraries_demo lifestyle media_and_video media_video medical music_and_audio news_and_magazines news_magazines personalization photography productivity racing shopping social sports sports_apps sports_games tools transportation travel_and_local weather app_entertainment_percentage app_wallpaper_percentage app_widgets_percentage arcade_percentage books_and_reference_percentage brain_percentage business_percentage cards_percentage casual_percentage comics_percentage communication_percentage education_percentage entertainment_percentage finance_percentage game_wallpaper_percentage …

java classification mahout

Dim*_*ima

2013 09-05

1
votes

1
answers

2848
views

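SVM for gender classification: 100% correct results with a linear kernel, but worse results with RBF

I built a small program for gender classification from face images. I use the Yale face database (175 images of males and the same number of females), convert the images to grayscale and equalize their histograms, so after preprocessing an image looks like this: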

[image]

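I run the following code to test the results (it uses an SVM with a linear kernel):

import numpy as np
from sklearn import metrics, svm
from sklearn.model_selection import train_test_split

def run_gender_classifier():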
    Xm, Ym = mkdataset('gender/male', 1)     # mkdataset just preprocesses images, 
    Xf, Yf = mkdataset('gender/female', 0)   #  flattens them and stacks into a matrix
    X = np.vstack([Xm, Xf])
    Y = np.hstack([Ym, Yf])
    X_train, X_test, Y_train, Y_test = train_test_split(X, Y,
                                                    test_size=0.1,
                                                    random_state=100)
    model = svm.SVC(kernel='linear')
    model.fit(X_train, Y_train)
    print("Results:\n%s\n" % (
        metrics.classification_report(
            Y_test, model.predict(X_test))))

And I get 100% precision!

In [22]: run_gender_classifier()
Results:
             precision    recall  f1-score   support

          0       1.00      1.00      1.00        16
          1 …

classification image-processing svm scikit-learn

ffr*_*end

1
votes

1
answers

2319
views

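Customizing Stanford NER to classify software programming keywords

I am new to NLP. I used the Stanford NER tool to classify some random text in order to extract special keywords used in software programming.

The problem is that I don't know how to change the classifier and the text annotator in Stanford NER so that they recognize software programming keywords. For example: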

today Java used in different operating systems (Windows, Linux, ..)

The classification result should be as follows:

Java "Programming_Language"
Windows "Operating_System"
Linux "Operating_System"

How can I customize the Stanford NER classifier to meet my needs?

java nlp classification stanford-nlp

Tec*_*ech

2014 05-23

1
votes

1
answers

1358
views

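python: how to get the real feature names from feature_importances

I am using Python's sklearn random forest (ensemble.RandomForestClassifier) for classification and use feature_importances_ to find the features that matter most to the classifier. My code so far is: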

for trip in database:
    venue_feature_start.append(Counter(trip['POI']))
# Counter(trip['POI']) is like Counter({'school':1, 'hospital':1, 'bus station':2}),actually key is the feature

feat_loc_vectorizer = DictVectorizer()
feat_loc_vectorizer.fit(venue_feature_start)
feat_loc_orig_mat = feat_loc_vectorizer.transform(venue_feature_start)

orig_tfidf = TfidfTransformer()
orig_ven_feat = orig_tfidf.fit_transform(feat_loc_orig_mat.tocsr())

# so DictVectorizer() and TfidfTransformer() help me to phrase the features and for each instance, the feature dimension is 580, which means that there are 580 venue types 

data = orig_ven_feat.tocsr()

le = LabelEncoder() 
labels = le.fit_transform(labels_raw)
if "Unlabelled" in labels_raw:
    unlabelled_int = …
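
One way to recover the names, sketched under assumptions: a fitted `DictVectorizer` stores its column names in `feature_names_`, and `TfidfTransformer` reweights columns without reordering them, so position `i` in `feature_importances_` should correspond to `feature_names_[i]`. The toy trips and labels below are hypothetical stand-ins for the `database` in the question.

```python
from collections import Counter

from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction import DictVectorizer
from sklearn.feature_extraction.text import TfidfTransformer

# Hypothetical toy data standing in for Counter(trip['POI']) per trip.
trips = [
    Counter({"school": 1, "hospital": 1, "bus station": 2}),
    Counter({"school": 3}),
    Counter({"hospital": 2, "park": 1}),
    Counter({"park": 4, "bus station": 1}),
]
labels = [0, 0, 1, 1]

vectorizer = DictVectorizer()
counts = vectorizer.fit_transform(trips)
features = TfidfTransformer().fit_transform(counts)  # column order is unchanged

forest = RandomForestClassifier(n_estimators=50, random_state=0)
forest.fit(features, labels)

# feature_names_[i] names column i, so it lines up with feature_importances_[i].
ranked = sorted(
    zip(vectorizer.feature_names_, forest.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, importance in ranked:
    print(f"{name}: {importance:.3f}")
```

The pairing only holds if no step between the vectorizer and the classifier drops or reorders columns; a feature-selection step would need its own index mapping.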

python classification feature-selection scikit-learn

gla*_*313

2015 05-21

1
votes

2
answers

5247
views

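Classifying new documents after cross_val_predict

I have a sample of about 10,000 tweets that I want to classify as "relevant" or "not relevant". I am using Python's scikit-learn for this model. I manually coded 1,000 tweets as "relevant" or "not relevant". I then ran an SVM model using 80% of the manually coded data as training data and the rest as test data. I got good results (prediction accuracy ~0.90), but to avoid overfitting I decided to use cross-validation on all 1,000 manually coded tweets.

Below is my code, which assumes the tf-idf matrix for the tweets in the sample has already been computed. "target" is an array listing whether each tweet is labeled "relevant" or "not relevant".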

from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import cross_val_predict

clf = SGDClassifier()
scores = cross_val_score(clf, X_tfidf, target, cv=10)
predicted = cross_val_predict(clf, X_tfidf, target, cv=10)

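With this code I can predict the class of each of the 1,000 tweets and compare it with my manual coding.

I am stuck on what to do next in order to use the model to classify the other ~9,000 tweets that I did not code manually. I was thinking of using cross_val_predict again, but I am not sure what to pass as the third argument, since the class is exactly what I am trying to predict.

Thanks in advance for all your help!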
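
A possible next step, as a hedged sketch: `cross_val_score` and `cross_val_predict` fit clones of the estimator to estimate performance and leave `clf` itself unfitted, so one option is to fit `clf` once on all coded tweets and call `predict` on the rest. The tweets below are hypothetical, and the sketch assumes the new tweets are transformed with the same fitted vectorizer that produced `X_tfidf`.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import cross_val_score

# Hypothetical stand-ins for the hand-coded and the uncoded tweets.
coded_tweets = [
    "flood warning issued for the river", "heavy flood damage downtown",
    "river levels rising fast", "flood shelters are open",
    "great pizza tonight", "my cat is sleeping",
    "new phone arrived today", "watching a movie later",
]
target = ["relevant"] * 4 + ["not relevant"] * 4
new_tweets = ["flood water on main street", "pizza and a movie"]

vectorizer = TfidfVectorizer()
X_tfidf = vectorizer.fit_transform(coded_tweets)

clf = SGDClassifier(random_state=0)
# Cross-validation only estimates generalization; it does not fit `clf`.
scores = cross_val_score(clf, X_tfidf, target, cv=2)

# Fit once on all coded tweets, then reuse the fitted vectorizer
# (transform, not fit_transform) so the columns match.
clf.fit(X_tfidf, target)
X_new = vectorizer.transform(new_tweets)
predicted_new = clf.predict(X_new)
print(scores.mean(), predicted_new)
```

No labels are needed for the new tweets here, because `predict` takes only the feature matrix.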

python twitter classification machine-learning scikit-learn

Eun*_*ice

1
votes

1
answers

524
views

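How to get a score between 0 and 1 for each class from a CNN?

I am currently training a network (a CNN implemented in TensorFlow) to classify over 3 classes. The thing is that I end up with scores like this: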

[ -20145.36, 150069, 578456.3 ].

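I would like the scores to be between 0 and 1 (some kind of probability).

At first I thought of using a sigmoid function, but then I found this discussion, where it is not even mentioned: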

https://www.quora.com/How-do-you-normalize-numeric-scores-to-a-0-1-range-for-comparing-different-machine-learning-techniques

What would you suggest I do to get a score between 0 and 1 for each class?

Thanks
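
The question is tagged softmax, which is the usual way to turn raw class scores into values in [0, 1] that sum to 1. Below is a minimal NumPy sketch of that function applied to the scores from the question. TensorFlow provides the same operation as `tf.nn.softmax`, and the helper here is only an illustration.

```python
import numpy as np

def softmax(logits):
    """Map raw class scores to probabilities that sum to 1."""
    logits = np.asarray(logits, dtype=np.float64)
    shifted = logits - logits.max()  # avoids overflow in exp for large scores
    exps = np.exp(shifted)
    return exps / exps.sum()

scores = [-20145.36, 150069, 578456.3]
probs = softmax(scores)
print(probs)
```

With scores this far apart, nearly all of the probability mass lands on the largest one. Raw scores of this magnitude may also suggest that the inputs or the network weights are not well scaled.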

classification deep-learning conv-neural-network tensorflow softmax

A. *_*iro

2017 04-21

1
votes

1
answers

1245
views

Memory problem when converting an np.array with to_categorical

I have a numpy array like this:

[[0. 1. 1. ... 0. 0. 1.]
 [0. 0. 0. ... 0. 0. 1.]
 [0. 0. 1. ... 0. 0. 0.]
 ...
 [0. 0. 0. ... 0. 0. 1.]
 [0. 0. 0. ... 0. 0. 1.]
 [0. 0. 0. ... 1. 0. 1.]]

I convert it like this to reduce the memory requirement:

x_val = x_val.astype(np.int)

The result is:

[[0 1 1 ... 0 0 1]
 [0 0 0 ... 0 0 1]
 [0 0 1 ... 0 0 0]
 ...
 [0 0 0 ... 0 0 1]
 [0 0 …
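
A side note on the conversion above, as a hedged sketch: `np.int` was only an alias for Python's built-in `int` (removed in NumPy 1.24), which typically maps to a 64-bit integer, so that cast may not shrink a float64 array at all. For an array holding only 0 and 1, a narrower dtype such as `np.int8` is one option. The array below is a hypothetical stand-in for `x_val`.

```python
import numpy as np

# Hypothetical stand-in for x_val: a float64 array holding only 0.0 and 1.0.
rng = np.random.default_rng(0)
x_val = rng.integers(0, 2, size=(1000, 50)).astype(np.float64)

as_int64 = x_val.astype(np.int64)  # 8 bytes per element, same as float64
as_int8 = x_val.astype(np.int8)    # 1 byte per element

# nbytes reports the size of the array's data buffer in bytes.
print(x_val.nbytes, as_int64.nbytes, as_int8.nbytes)
```

This only addresses the dtype of the input array. Whether it resolves the memory problem with to_categorical depends on the part of the question that is cut off here.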

python numpy classification machine-learning keras

cs0*_*815

2018 08-17

1
votes

1
answers

480
views

Tag statistics

classification ×10

machine-learning ×4

python ×4

scikit-learn ×4

java ×2

conv-neural-network ×1

deep-learning ×1

feature-selection ×1

image-processing ×1

keras ×1

mahout ×1

nlp ×1

numpy ×1

opencv ×1

r ×1

softmax ×1

stanford-nlp ×1

svm ×1

tensorflow ×1

twitter ×1
