Posts by Viv*_*mar

ValueError: n_splits=10 cannot be greater than the number of members in each class

I am trying to run the following code:

from sklearn.model_selection import StratifiedKFold 
X = ["hey", "join now", "hello", "join today", "join us now", "not today", "join this trial", " hey hey", " no", "hola", "bye", "join today", "no","join join"]
y = ["n", "r", "n", "r", "r", "n", "n", "n", "n", "r", "n", "n", "n", "r"]

skf = StratifiedKFold(n_splits=10)

for train, test in skf.split(X,y):  
    print("%s %s" % (train,test))

However, I get the following error:

ValueError: n_splits=10 cannot be greater than the number of members in each class.


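I looked at "scikit-learn error: The least populated class in y has only 1 member" here, but I am still not sure what is wrong with my code.

Both of my lists have length 14: print(len(X)) print(len(y)) …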
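
A minimal sketch of one way to avoid the error, assuming the goal is simply to get a working stratified split. Every fold needs at least one member of each class, so the size of the smallest class is a safe upper bound for n_splits. Here class "r" has only 5 members, so 10 splits cannot be stratified. The names class_counts, smallest and folds are illustrative and not part of the original code.

```python
from collections import Counter

from sklearn.model_selection import StratifiedKFold

X = ["hey", "join now", "hello", "join today", "join us now", "not today",
     "join this trial", " hey hey", " no", "hola", "bye", "join today",
     "no", "join join"]
y = ["n", "r", "n", "r", "r", "n", "n", "n", "n", "r", "n", "n", "n", "r"]

# Count how many samples each class has; the smallest class limits n_splits.
class_counts = Counter(y)
smallest = min(class_counts.values())

# Cap the requested number of splits at the size of the smallest class.
n_splits = min(10, smallest)
skf = StratifiedKFold(n_splits=n_splits)

folds = list(skf.split(X, y))
for train, test in folds:
    print("%s %s" % (train, test))
```

The alternative is to collect more samples of the rare class, which keeps the requested 10 folds.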

python scikit-learn cross-validation

SFC*_*SFC

2018 08-25

3
votes

1
answer

7484
views

TypeError: 'ShuffleSplit' object is not iterable

I am using ShuffleSplit to shuffle the data, but I ran into this error:

TypeError                                 Traceback (most recent call last)
<ipython-input-36-192f7c286a58> in <module>()
      1 # Fit the training data to the model using grid search
----> 2 reg = fit_model(X_train, y_train)
      3 
      4 # Produce the value for 'max_depth'
      5 print "Parameter 'max_depth' is {} for the optimal model.".format(reg.get_params()['max_depth'])

<ipython-input-34-18b2799e585c> in fit_model(X, y)
     32 
     33     # Fit the grid search object to the data to compute the optimal model
---> 34     grid = grid.fit(X, y)
     35 
     36     # Return the optimal …
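
The traceback above is truncated, so the cause cannot be confirmed from it. A common trigger for this message is code written for the old sklearn.cross_validation.ShuffleSplit (which was iterable) running against sklearn.model_selection.ShuffleSplit (which is not). The sketch below assumes that is the situation: it passes the splitter object directly as cv and lets the grid search call split itself. The regressor, parameter grid and synthetic data are placeholders, not the original fit_model.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV, ShuffleSplit
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for X_train / y_train from the question.
X_train, y_train = make_regression(n_samples=200, n_features=5, random_state=0)

# model_selection.ShuffleSplit takes n_splits and no sample count;
# it is not iterable itself, but it can be handed to GridSearchCV as cv.
cv_sets = ShuffleSplit(n_splits=10, test_size=0.20, random_state=0)

grid = GridSearchCV(
    DecisionTreeRegressor(random_state=0),
    param_grid={"max_depth": list(range(1, 11))},
    cv=cv_sets,
)
grid = grid.fit(X_train, y_train)

print(grid.best_estimator_.get_params()["max_depth"])
```

If explicit index pairs are needed, cv_sets.split(X_train) yields them.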


python scikit-learn grid-search

Goi*_*Way

2018 08-25

2
votes

1
answer

3240
views

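Python, machine learning: performing a grid search on a custom validation set

I am dealing with an imbalanced classification problem in which my negative class is 1000 times larger than my positive class. My strategy is to train a deep neural network on a balanced (50/50 ratio) training set (I have enough simulated samples), and then use an imbalanced (1/1000 ratio) validation set to select the best model and optimize the hyperparameters.

Since the number of parameters is large, I want to use scikit-learn's RandomizedSearchCV, i.e. a randomized grid search.

As far as I understand, scikit-learn's GridSearch applies a metric on the training set to select the best set of hyperparameters. In my case, however, this means GridSearch will select the model that performs best on the balanced training set rather than on the more realistic imbalanced data.

My question is: is there a way to run a grid search in which performance is estimated on a specific, user-defined validation set?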
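
One possible approach is sklearn.model_selection.PredefinedSplit. The search objects score each candidate on the held-out part of every split defined by cv, so a single predefined split pins down exactly which rows are used for training and which for validation. The sketch below is an illustration under assumptions: LogisticRegression stands in for the deep network, the data is synthetic with a 90/10 imbalance rather than 1/1000, and the scoring choice is arbitrary.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import PredefinedSplit, RandomizedSearchCV

# Synthetic imbalanced pool (assumption: stands in for the real data).
X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=0)

# Imbalanced validation set: the first 500 rows, left as they are.
X_val, y_val = X[:500], y[:500]

# Balanced training set: all positives plus as many negatives from the rest.
X_rest, y_rest = X[500:], y[500:]
pos = np.flatnonzero(y_rest == 1)
neg = np.flatnonzero(y_rest == 0)[: len(pos)]
train_idx = np.concatenate([pos, neg])
X_train, y_train = X_rest[train_idx], y_rest[train_idx]

# -1 marks rows that are always in training; 0 marks the single validation fold.
X_all = np.vstack([X_train, X_val])
y_all = np.concatenate([y_train, y_val])
test_fold = np.concatenate([
    np.full(len(X_train), -1),
    np.zeros(len(X_val), dtype=int),
])
cv = PredefinedSplit(test_fold)

search = RandomizedSearchCV(
    LogisticRegression(max_iter=1000),
    param_distributions={"C": [0.01, 0.1, 1.0, 10.0]},
    n_iter=4,
    scoring="average_precision",
    cv=cv,
    refit=False,  # avoid refitting on training + validation rows combined
    random_state=0,
)
search.fit(X_all, y_all)
print(search.best_params_)
```

Setting refit=False keeps the search from training a final model on the concatenated data; the chosen parameters can then be used to fit on the balanced training set alone.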

python validation machine-learning scikit-learn grid-search

Mil*_*ros

2017 05-04

2
votes

1
answer

1900
views

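Choosing an appropriate tolerance value in logistic regression (scikit-learn)

I am using a logistic regression model in scikit-learn (specifically LogisticRegressionCV). When I use the default tol value (1e-4) and test the model with different random_state values, the feature coefficients do not fluctuate much. At the very least, I can see which features are important.

However, when I set a higher tol value (e.g. 2.3), the feature coefficients fluctuate widely every time I run the model. A feature A that has a coefficient of -0.9 in one trial may have 0.4 in the next.

This makes me think that the correct (or preferable) tol value is the one that gives more consistent results.

Here is the relevant part of my code: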

classifier = LogisticRegressionCV(penalty='l1', class_weight='balanced', 
                                #tol=2.2,
                                solver='liblinear')

I would like to know whether there are guidelines for determining an appropriate tol value.

machine-learning scikit-learn logistic-regression

ren*_*kre

2018 06-01

2
votes

1
answer

8004
views

Scikit-learn multithreading

Do you know whether the models in scikit-learn use multithreading automatically, or only sequential instructions?

Thanks
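
As a rough illustration of the usual knob: many scikit-learn estimators expose an n_jobs parameter, and leaving it at its default of None generally means a single job. The sketch below uses RandomForestClassifier on synthetic data as an example. The underlying numerical libraries may use their own threads regardless of n_jobs, which this sketch does not cover.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, random_state=0)

# Default n_jobs=None: trees are built one after another.
sequential = RandomForestClassifier(n_estimators=50, random_state=0)
sequential.fit(X, y)

# n_jobs=-1 asks for all available cores when building the trees.
parallel = RandomForestClassifier(n_estimators=50, n_jobs=-1, random_state=0)
parallel.fit(X, y)

# With the same random_state, both forests should give the same predictions.
same = (sequential.predict(X) == parallel.predict(X)).all()
print(same)
```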

python multithreading python-3.x scikit-learn

Enz*_*oli

2019 04-25

2
votes

1
answer

2495
views

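This RandomForestClassifier instance is not fitted yet. Call 'fit' with appropriate arguments before using this method

I am trying to train a decision tree model, save it, and then reload it later when needed. However, I keep getting the following error:

This DecisionTreeClassifier instance is not fitted yet. Call 'fit' with appropriate arguments before using this method.

Here is my code: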

X_train, X_test, y_train, y_test = train_test_split(data, label, test_size=0.20, random_state=4)

names = ["Decision Tree", "Random Forest", "Neural Net"]

classifiers = [
    DecisionTreeClassifier(),
    RandomForestClassifier(),
    MLPClassifier()
    ]

score = 0
for name, clf in zip(names, classifiers):
    if name == "Decision Tree":
        clf = DecisionTreeClassifier(random_state=0)
        grid_search = GridSearchCV(clf, param_grid=param_grid_DT)
        grid_search.fit(X_train, y_train_TF)
        if grid_search.best_score_ > score:
            score = grid_search.best_score_
            best_clf = clf
    elif name == "Random Forest":
        clf = RandomForestClassifier(random_state=0)
        grid_search = GridSearchCV(clf, param_grid_RF)
        grid_search.fit(X_train, y_train_TF)
        if grid_search.best_score_ > score:
            score …
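
A likely cause in the visible part of the code is the line best_clf = clf. GridSearchCV fits clones of the estimator it is given, so clf itself stays unfitted. The fitted model is available as grid_search.best_estimator_. The sketch below demonstrates this on synthetic data with a made-up parameter grid, since param_grid_DT and the rest of the original code are not shown.

```python
import pickle

from sklearn.datasets import make_classification
from sklearn.exceptions import NotFittedError
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, random_state=0)

clf = DecisionTreeClassifier(random_state=0)
grid_search = GridSearchCV(clf, param_grid={"max_depth": [2, 4, 6]})
grid_search.fit(X, y)

# The estimator passed in was only cloned, so it is still unfitted.
try:
    clf.predict(X)
    original_is_fitted = True
except NotFittedError:
    original_is_fitted = False

# Keep the refitted best model instead of the original object.
best_clf = grid_search.best_estimator_

# Save and reload (in memory here) and confirm the reloaded model predicts.
restored = pickle.loads(pickle.dumps(best_clf))
predictions = restored.predict(X)
print(original_is_fitted, len(predictions))
```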


python machine-learning scikit-learn cross-validation grid-search

Wan*_*rer

2018 08-25

2
votes

1
answer

5760
views

TSNE in sklearn with the Mahalanobis metric

Using sklearn's TSNE with the Mahalanobis metric, I get the following error:

from sklearn.manifold import TSNE      
tsne = TSNE( verbose=1, perplexity=40, n_iter=250,learning_rate=50, random_state=0,metric='mahalanobis')
pt=data.sample(frac=0.1).values
tsne_results = tsne.fit_transform(pt)


ValueError: Must provide either V or VI for Mahalanobis distance

How can I provide the method_parameters for the Mahalanobis distance?
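
One workaround, sketched under assumptions, is to compute the Mahalanobis distance matrix up front with SciPy (where VI can be passed explicitly) and give it to TSNE with metric='precomputed'. The random array pt stands in for data.sample(frac=0.1).values, and VI is taken as the inverse of the sample covariance, which is an assumption about what is wanted. Newer scikit-learn releases document a metric_params argument on TSNE that may make this unnecessary; the sketch avoids it so that it does not depend on the installed version.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from sklearn.manifold import TSNE

# Placeholder data (assumption): 100 samples with 5 features.
rng = np.random.RandomState(0)
pt = rng.normal(size=(100, 5))

# Inverse covariance matrix required by the Mahalanobis distance.
VI = np.linalg.inv(np.cov(pt, rowvar=False))

# Full pairwise distance matrix, shape (n_samples, n_samples).
dist = squareform(pdist(pt, metric="mahalanobis", VI=VI))

# init="random" because a PCA initialisation cannot use precomputed distances.
tsne = TSNE(metric="precomputed", init="random", perplexity=30, random_state=0)
tsne_results = tsne.fit_transform(dist)
print(tsne_results.shape)
```

The full distance matrix needs memory quadratic in the number of samples, so this is only practical for moderate sample sizes.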

python python-3.x scikit-learn

Arm*_*man

2019 04-25

2
votes

1
answer

1546
views

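Does sklearn LogisticRegressionCV use all of the data for the final model?

I would like to know how the final model (i.e. the decision boundary) of LogisticRegressionCV in sklearn is computed. Say I have some Xdata and ylabels such that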

Xdata # shape of this is (n_samples,n_features)
ylabels # shape of this is (n_samples,), and it is binary

Now I run

from sklearn.linear_model import LogisticRegressionCV
clf = LogisticRegressionCV(Cs=[1.0],cv=5)
clf.fit(Xdata,ylabels)

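This looks at just one regularization parameter and 5 folds in the CV. So clf.scores_ will be a dictionary with one key, whose value is an array of shape (n_folds, 1). With these five folds you get a better idea of how the model performs.

However, I am confused about what you get from clf.coef_ (I assume the parameters in clf.coef_ are the ones used by clf.predict). I think there are a few possible options:

The parameters in clf.coef_ come from training the model on all of the data
The parameters in clf.coef_ come from the best-scoring fold
The parameters in clf.coef_ are somehow averaged across the folds.

I imagine this is a duplicate question, but for the life of me I cannot find a straightforward answer online, in the sklearn documentation, or in the source code of LogisticRegressionCV. Some related posts I found are: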
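
The options above can be probed empirically. The documentation describes a refit parameter: with refit=True (the default) the final coefficients come from refitting on all of the data with the best C, while refit=False averages across folds. The sketch below, on synthetic data, compares clf.coef_ with a plain LogisticRegression trained on everything with the same C. Small differences from solver convergence are expected, so it reports the largest gap rather than testing for exact equality.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression, LogisticRegressionCV

# Synthetic stand-in for Xdata / ylabels (binary labels).
Xdata, ylabels = make_classification(
    n_samples=300, n_features=5, n_informative=3, n_redundant=0, random_state=0
)

clf = LogisticRegressionCV(Cs=[1.0], cv=5, max_iter=1000)
clf.fit(Xdata, ylabels)

# Reference model: same C, trained once on all of the data.
full = LogisticRegression(C=1.0, max_iter=1000)
full.fit(Xdata, ylabels)

# A small gap is consistent with the first option (trained on all data).
max_diff = np.max(np.abs(clf.coef_ - full.coef_))
print(clf.refit, max_diff)
```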

python machine-learning scikit-learn cross-validation

Chr*_*uso

2018 08-25

2
votes

2
answers

1247
views

Meaning of the n_layers_ attribute in sklearn.neural_network.MLPClassifier

I have trained a model using sklearn.neural_network.MLPClassifier, and I want to know how many layers my classifier has. The result shows:

>>> from sklearn.neural_network import MLPClassifier
>>> clf = MLPClassifier()
>>> clf = clf.fit(train_matrix, train_label)
>>> clf.n_layers_
3

The documentation says the n_layers_ attribute means:

Number of layers

Does this mean there is one hidden layer, or three hidden layers?
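
A quick way to check is to vary hidden_layer_sizes and watch n_layers_. In this sketch on synthetic data, n_layers_ counts the input and output layers as well, so the default single hidden layer gives 3. The short max_iter only keeps the example fast and may trigger a convergence warning.

```python
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=200, random_state=0)

# One hidden layer (the default layout) versus two hidden layers.
one_hidden = MLPClassifier(hidden_layer_sizes=(100,), max_iter=50, random_state=0).fit(X, y)
two_hidden = MLPClassifier(hidden_layer_sizes=(50, 20), max_iter=50, random_state=0).fit(X, y)

# n_layers_ = input layer + hidden layers + output layer;
# coefs_ holds one weight matrix per pair of consecutive layers.
print(one_hidden.n_layers_, len(one_hidden.coefs_))  # 3 2
print(two_hidden.n_layers_, len(two_hidden.coefs_))  # 4 3
```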

python neural-network scikit-learn

Lee*_*ack

2019 04-25

2
votes

1
answer

1981
views

Cannot import cross_validation from sklearn version > 0.20

When I import cross_validation from sklearn as follows:

from sklearn import cross_validation

I get the following error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: cannot import name 'cross_validation' from 'sklearn' (/root/anaconda3/lib/python3.7/site-packages/sklearn/__init__.py)
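
The sklearn.cross_validation module was deprecated in 0.18 and removed in 0.20. Its helpers live in sklearn.model_selection. A minimal sketch of the replacement import follows, using the iris dataset and a logistic regression purely as placeholders.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split

X, y = load_iris(return_X_y=True)

# Formerly cross_validation.train_test_split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Formerly cross_validation.cross_val_score
scores = cross_val_score(LogisticRegression(max_iter=1000), X_train, y_train, cv=5)
print(scores.mean())
```

Splitter classes such as KFold and ShuffleSplit also changed their constructor signatures in model_selection, so code that builds them may need more than a changed import.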


python scikit-learn cross-validation

mac*_*hoe

2018 11-28

2
votes

1
answer

20k
views