Posts by Viv*_*mar

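How do I find the accuracy?

I would like to know whether sklearn has a function for accuracy (the difference between the actual and the predicted data), and how to print the result.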

from sklearn import datasets
from sklearn.naive_bayes import GaussianNB

iris = datasets.load_iris()
naive_classifier = GaussianNB()
# Fit on the full dataset, then predict on the same data
y = naive_classifier.fit(iris.data, iris.target).predict(iris.data)
pr = naive_classifier.predict(iris.data)
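
A minimal sketch of one way to get at this, assuming accuracy here means the fraction of predictions that match the true labels. `sklearn.metrics.accuracy_score` computes that fraction, and a fitted classifier's `score()` method reports the same number. The variable names below are illustrative and not from the original post.

```python
from sklearn import datasets
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import GaussianNB

iris = datasets.load_iris()
clf = GaussianNB().fit(iris.data, iris.target)
predicted = clf.predict(iris.data)

# Fraction of predictions that match the true labels.
accuracy = accuracy_score(iris.target, predicted)

# Shortcut: a classifier's score() returns the mean accuracy as well.
same_accuracy = clf.score(iris.data, iris.target)

print(f"accuracy: {accuracy:.3f}")
```

Note that this measures accuracy on the training data itself, as in the question; a held-out split would give a more honest estimate.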


python scikit-learn naivebayes

And*_*868

2018-12-29

6 votes
2 answers
10k views

Which coefficients go with which class in scikit-learn's multiclass logistic regression?

I am using scikit-learn's logistic regression for a multiclass problem.

logit = LogisticRegression(penalty='l1')
logit = logit.fit(X, y)


I am interested in the features that drive the decision.

logit.coef_


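The above gives me a nice array in (n_classes, n_features) format, but all the class and feature names are gone. For the features that is fine, because it seems safe to assume they are indexed in the same order in which I passed them in...

For the classes, however, it is a problem, because I never explicitly passed the classes in any order. So which class does each coefficient set (rows 0, 1, 2 and 3 of the array) belong to?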
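
A short sketch of how the mapping can be inspected, assuming a fitted `LogisticRegression`: the `classes_` attribute holds the sorted unique labels, and row `i` of `coef_` corresponds to `classes_[i]`. The iris data and string labels below are stand-ins for the asker's `X` and `y`, and the default penalty is used instead of `'l1'` to keep the example solver-independent.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

iris = load_iris()
X = iris.data
# String labels make the class-to-row mapping easy to read.
y = iris.target_names[iris.target]

logit = LogisticRegression(max_iter=1000).fit(X, y)

# Row i of coef_ belongs to the label stored in classes_[i].
for label, row in zip(logit.classes_, logit.coef_):
    print(label, np.round(row, 2))
```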

python scikit-learn logistic-regression

Ale*_*ail

2018-08-25

6 votes
1 answer
2343 views

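How to standardize data with sklearn's cross_val_score()

Suppose I want to run k-fold cross-validation on a dataset with LinearSVC. How should I standardize the data?

The best practice I have read about is to fit the standardization model on the training data and then apply that model to the test data.

With a simple train_test_split() this is easy, because we can do the following: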

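from sklearn import svm
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y)

clf = svm.LinearSVC()

# Fit the scaler on the training split only, then reuse it on the test split
scalar = StandardScaler()
X_train = scalar.fit_transform(X_train)
X_test = scalar.transform(X_test)

clf.fit(X_train, y_train)
predicted = clf.predict(X_test)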


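How do I standardize the data when doing k-fold cross-validation? The problem is that every data point is used for both training and testing, so you cannot standardize all of the data before calling cross_val_score(). Does each cross-validation fold need its own standardization?

The documentation does not mention any standardization happening inside the function. Am I SOL?

Edit: this post was super helpful: Python - What exactly is sklearn.pipeline.Pipeline?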
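
A minimal sketch of the pipeline approach the edit points to, assuming the scaler and the classifier are wrapped in one `Pipeline`. `cross_val_score` then refits the whole pipeline on each training fold, so the scaler never sees that fold's test data. The iris dataset and `max_iter=10000` are illustrative choices, not part of the original question.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

X, y = load_iris(return_X_y=True)

# The scaler is fitted inside each fold, on that fold's training part only.
pipeline = make_pipeline(StandardScaler(), LinearSVC(max_iter=10000))

scores = cross_val_score(pipeline, X, y, cv=5)
print(scores.mean())
```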

python standardized svm scikit-learn cross-validation

als*_*5ev

2018-04-16

6 votes
1 answer
2501 views

Why do I get AttributeError: 'KerasClassifier' object has no attribute 'model'?

Here is the code. I only get the error on the last line, y_pred = classifier.predict(X_test). The error is AttributeError: 'KerasClassifier' object has no attribute 'model'

# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn import datasets
from sklearn import preprocessing
from keras.utils import np_utils

# Importing the dataset
dataset = pd.read_csv('Data1.csv',encoding = "cp1252")
X = dataset.iloc[:, 1:-1].values
y = dataset.iloc[:, -1].values

# Encoding categorical data
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
labelencoder_X_0 = LabelEncoder()
X[:, 0] = labelencoder_X_0.fit_transform(X[:, 0])
labelencoder_X_1 = LabelEncoder()
X[:, 1] = …


python machine-learning scikit-learn deep-learning keras

Vij*_*jay

2017-06-19

6 votes
2 answers
9676 views

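The best parameters found by Hyperopt are not suitable

I am using hyperopt to search for the best parameters of an SVM classifier, but Hyperopt says the best 'kernel' is '0'. {'kernel': '0'} is obviously not suitable.

Does anyone know whether this is caused by my mistake or by a bug in hyperopt?

The code is below.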

from hyperopt import fmin, tpe, hp, rand
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn import svm
# sklearn.cross_validation no longer exists; StratifiedKFold lives in model_selection
from sklearn.model_selection import StratifiedKFold

parameter_space_svc = {
   'C':hp.loguniform("C", np.log(1), np.log(100)),
   'kernel':hp.choice('kernel',['rbf','poly']),
   'gamma': hp.loguniform("gamma", np.log(0.001), np.log(0.1)),
}

from sklearn import datasets
iris = datasets.load_digits()

train_data = iris.data
train_target = iris.target

count = 0

def function(args):
  print(args)
  score_avg = 0
  # Current API: configure the splitter, then pass the data to split()
  skf = StratifiedKFold(n_splits=3, shuffle=True, random_state=1)
  for train_idx, test_idx in skf.split(train_data, train_target):
    train_X = iris.data[train_idx]
    train_y = iris.target[train_idx]
    test_X = iris.data[test_idx] …
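
A small library-free sketch of what is probably going on: for a parameter defined with `hp.choice`, `fmin` reports the index of the selected option, not the option itself. The `best` dictionary below is a hypothetical result made up for illustration. As far as I know, hyperopt's `space_eval(parameter_space_svc, best)` performs this index-to-value conversion for a whole search space.

```python
# The same option list that was passed to hp.choice('kernel', ...).
kernel_options = ['rbf', 'poly']

# Hypothetical fmin result: for hp.choice the value is an index into the list.
best = {'kernel': 0}

# Map the reported index back to the actual kernel name.
best_kernel = kernel_options[best['kernel']]
print(best_kernel)
```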


python machine-learning scikit-learn cross-validation hyperopt

横尾修*_*尾修平

2018-12-19

6 votes
1 answer
1401 views

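xgboost.train vs. XGBClassifier

I am using Python to fit an xgboost model incrementally (chunk by chunk). I came across a solution that uses xgboost.train, but I do not know what to do with the Booster object it returns. XGBClassifier, for example, offers fit, predict, predict_proba and so on.

This is what happens inside the for loop in which I read the data piece by piece: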

dtrain=xgb.DMatrix(X_train, label=y)
param = {'max_depth':2, 'eta':1, 'silent':1, 'objective':'binary:logistic'}
modelXG=xgb.train(param,dtrain,xgb_model='xgbmodel')
modelXG.save_model("xgbmodel")


python scikit-learn xgboost

Max*_*Max

2018-05-04

6 votes
1 answer
7445 views

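KeyError 'not in index' during cross-validation

I applied an SVM to my dataset. The dataset is multi-label, meaning that each observation has more than one label.

During KFold cross-validation it raises a 'not in index' error.

It reports that indices 601 through 6007 are not in index (I have data samples 1 ... 6008).

Here is my code: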

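import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import KFold
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

df = pd.read_csv("finalupdatedothers.csv")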
categories = ['ADR','WD','EF','INF','SSI','DI','others']
X= df[['sentences']]
y = df[['ADR','WD','EF','INF','SSI','DI','others']]
kf = KFold(n_splits=10)
kf.get_n_splits(X)
for train_index, test_index in kf.split(X,y):
    print("TRAIN:", train_index, "TEST:", test_index)
    X_train, X_test = X.iloc[train_index], X.iloc[test_index]
    y_train, y_test = y.iloc[train_index], y.iloc[test_index]

SVC_pipeline = Pipeline([
                ('tfidf', TfidfVectorizer(stop_words=stop_words)),
                ('clf', OneVsRestClassifier(LinearSVC(), n_jobs=1)),
            ])

for category in categories:
    print('... Processing {} '.format(category))
    # train the model using X_dtm & y
    SVC_pipeline.fit(X_train['sentences'], y_train[category])

    prediction = …


python scikit-learn cross-validation

sar*_*iii

2019-04-25

6 votes
1 answer
4317 views

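How do the parameters 'c' and 'cmap' behave in a matplotlib scatter plot?

For the pyplot.scatter(x, y, s, c, ...) function,

the matplotlib documentation states:

c : color, sequence, or sequence of color, optional, default: 'b'. The marker color. Possible values:

- A single color format string.
- A sequence of color specifications of length n.
- A sequence of n numbers to be mapped to colors using cmap and norm.
- A 2-D array in which the rows are RGB or RGBA.

Note that c should not be a single numeric RGB or RGBA sequence because that is indistinguishable from an array of values to be colormapped. If you want to specify the same RGB or RGBA value for all points, use a 2-D array with a single row.

But I do not understand how I can change the colors of the data points the way I want.

I have this code: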

import matplotlib.pyplot as plt
import numpy as np
import sklearn
import sklearn.datasets
import sklearn.linear_model
import matplotlib


%matplotlib inline
matplotlib.rcParams['figure.figsize'] = (13.0, 9.0)

# Generate a dataset and plot it
np.random.seed(0)
X, y = sklearn.datasets.make_moons(200, noise=0.55)
print(y)
plt.scatter(X[:,0], X[:,1], c=y)#, cmap=plt.cm.Spectral)


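Output plot

How can I change the colors so that, say, the data points are black and green? Or anything else? Also, please explain what exactly cmap does.

Why do I get shades of magenta and blue every time I use plt.cm.Spectral?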
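
A hedged sketch of one way to get black and green points. When `c` is a sequence of numbers, `scatter` normalizes the values to the range 0 to 1 and looks each one up in `cmap`. With only two distinct labels (0 and 1), only the two ends of the colormap are ever used, which would explain seeing just two shades with `plt.cm.Spectral`. The example below passes a two-color `ListedColormap`; the name `black_green` and the `Agg` backend line are illustrative choices, not part of the original post.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap
from sklearn.datasets import make_moons

X, y = make_moons(n_samples=200, noise=0.55, random_state=0)

# Two-entry colormap: label 0 maps to black, label 1 maps to green.
black_green = ListedColormap(["black", "green"])

points = plt.scatter(X[:, 0], X[:, 1], c=y, cmap=black_green)
```

Inside a notebook the backend line can be dropped in favor of `%matplotlib inline`, as in the question.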

plot colors matplotlib python-3.x scikit-learn

use*_*114

2018-08-31

6 votes
1 answer
9471 views

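Multi-label out-of-core learning for text data: ValueError on partial fit

I am trying to build a multi-label out-of-core text classifier. As described here, the idea is to read (large-scale) text datasets in batches and partially fit the classifier on them. In addition, when you have multi-label instances as described here, the idea is to build as many binary classifiers as there are classes in the dataset, in a one-vs-all manner.

When combining sklearn's MultiLabelBinarizer and OneVsRestClassifier classes with partial fitting, I get the following error:

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

The code is as follows: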

from sklearn.naive_bayes import MultinomialNB
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.multiclass import OneVsRestClassifier

categories = ['a', 'b', 'c']
X = ["This is a test", "This is another attempt", "And this is a test too!"]
Y = [['a', 'b'],['b'],['a','b']]

mlb = MultiLabelBinarizer(classes=categories)
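# non_negative was removed from HashingVectorizer; alternate_sign=False keeps the features non-negative
vectorizer = HashingVectorizer(decode_error='ignore', n_features=2 ** 18, alternate_sign=False)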
clf = OneVsRestClassifier(MultinomialNB(alpha=0.01))

X_train = vectorizer.fit_transform(X)
Y_train = mlb.fit_transform(Y)
clf.partial_fit(X_train, Y_train, classes=categories) …


python machine-learning scikit-learn multilabel-classification

pau*_*iuk

2018-04-17

5 votes
1 answer
1383 views

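RandomForestRegressor and feature_importances_ error

I am struggling to extract the feature importances from my RandomForestRegressor. I get:

AttributeError: 'GridSearchCV' object has no attribute 'feature_importances_'.

Does anyone know why the attribute is missing? According to the documentation, this attribute should exist.

Full code: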

from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

#Running a RandomForestRegressor GridSearchCV to tune the model.
parameter_candidates = {
    'n_estimators' : [650, 700, 750, 800],
    'min_samples_leaf' : [1, 2, 3],
    'max_depth' : [10, 11, 12],
    'min_samples_split' : [2, 3, 4, 5, 6]
}

RFR_regr = RandomForestRegressor()
CV_RFR_regr = GridSearchCV(estimator=RFR_regr, param_grid=parameter_candidates, n_jobs=5, verbose=2)
CV_RFR_regr.fit(X_train, y_train)

#Predict with testing set
y_pred = CV_RFR_regr.predict(X_test)

#Extract feature importances
importances = CV_RFR_regr.feature_importances_
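
A minimal sketch of where the attribute lives, assuming the default `refit=True`: `GridSearchCV` is a wrapper, and the refitted forest is exposed as `best_estimator_`, which carries `feature_importances_`. The synthetic data and the small grid below are illustrative stand-ins for the asker's `X_train`, `y_train` and parameter grid.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

X, y = make_regression(n_samples=100, n_features=5, random_state=0)

search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid={"n_estimators": [10, 20], "max_depth": [3, 5]},
    cv=3,
)
search.fit(X, y)

# The importances belong to the refitted best model, not to the search object.
importances = search.best_estimator_.feature_importances_
print(importances)
```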


python feature-extraction random-forest scikit-learn grid-search

Sva*_*rto

2018-04-16

5 votes
1 answer
3195 views