Posts by Viv*_*mar

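Singleton array array(<function train at 0x7f3a311320d0>, dtype=object) cannot be considered a valid collection

I don't know how to fix this. Any help is much appreciated. I saw "Vectorization: Not a valid collection", but I'm not sure I understood it.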

    train = df1.iloc[:,[4,6]]
    target = df1.iloc[:,[0]]

    def train(classifier, X, y):
        X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=33)

        classifier.fit(X_train, y_train)
        print ("Accuracy: %s" % classifier.score(X_test, y_test))
        return classifier

    trial1 = Pipeline([
        ('vectorizer', TfidfVectorizer()),
        ('classifier', MultinomialNB()),
    ])

    train(trial1, train, target)


The error is as follows:

    ----> 6 train(trial1, train, target)

    <ipython-input-140-ac0e8d32795e> in train(classifier, X, y)
          1 def train(classifier, X, y):
    ----> 2     X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=33)
          3 
          4     classifier.fit(X_train, y_train)
          5     print ("Accuracy: %s" % classifier.score(X_test, y_test)) …
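
One plausible reading of the error, sketched below: the name `train` is first bound to a DataFrame and then rebound to the function, so the call passes the function itself as `X`. The sketch uses distinct names and a single text column. The `df1` contents and the names `features` and `train_and_score` are hypothetical stand-ins, not taken from the post.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Hypothetical stand-in for df1: one label column and one text column.
df1 = pd.DataFrame({
    "label": [0, 1] * 6,
    "text": ["good film", "bad film"] * 6,
})

features = df1["text"]   # 1-D Series of strings, the input TfidfVectorizer expects
target = df1["label"]    # 1-D Series of labels

def train_and_score(classifier, X, y):
    # The function name no longer collides with the data variable.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=33)
    classifier.fit(X_train, y_train)
    print("Accuracy: %s" % classifier.score(X_test, y_test))
    return classifier

trial1 = Pipeline([
    ("vectorizer", TfidfVectorizer()),
    ("classifier", MultinomialNB()),
])

fitted = train_and_score(trial1, features, target)
```

If the real data has two text columns, they would probably need to be combined into one string column (or vectorized separately) before reaching the vectorizer.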


python pipeline pandas scikit-learn train-test-split

man*_*sha

2018-07-12

9 votes · 3 answers · 20k views

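Adding a scikit-learn (sklearn) prediction to a pandas DataFrame

I'm trying to add an sklearn prediction to a pandas DataFrame so that I can evaluate the prediction thoroughly. The relevant code snippet is as follows: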

import pandas as pd
from sklearn import linear_model

clf = linear_model.LinearRegression()
clf.fit(Xtrain,ytrain)
ypred = pd.DataFrame({'pred_lin_regr': pd.Series(clf.predict(Xtest))})


The DataFrames look like this:

Xtest

       axial_MET  cos_theta_r1  deltaE_abs  lep1_eta   lep1_pT  lep2_eta  
8000   1.383026      0.332365    1.061852  0.184027  0.621598 -0.316297   
8001  -1.054412      0.046317    1.461788 -1.141486  0.488133  1.011445   
8002   0.259077      0.429920    0.769219  0.631206  0.353469  1.027781   
8003  -0.096647      0.066200    0.411222 -0.867441  0.856115 -1.357888   
8004   0.145412      0.371409    1.111035  1.374081  0.485231  0.900024


ytest


ypred

        pred_lin_regr
0       0.461636
1       0.314448
2       0.363751
3       0.291858
4       0.416056


Concatenating Xtest and ytest works fine:

df_total = pd.concat([Xtest, ytest], …
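
The printed frames suggest an index mismatch: Xtest is indexed from 8000, while ypred starts at 0. A minimal sketch of one way to line them up is to build the prediction Series on Xtest's index before concatenating. The data below is synthetic, and the column names `a`, `b`, and `target` are assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn import linear_model

# Synthetic data whose index starts at 8000, mimicking the frames shown above.
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(10, 2)), columns=["a", "b"],
                 index=range(8000, 8010))
y = (2 * X["a"] + 1).rename("target")

Xtrain, ytrain = X.iloc[:6], y.iloc[:6]
Xtest, ytest = X.iloc[6:], y.iloc[6:]

clf = linear_model.LinearRegression()
clf.fit(Xtrain, ytrain)

# Reuse Xtest's index so that concat aligns the rows instead of padding with NaN.
ypred = pd.Series(clf.predict(Xtest), index=Xtest.index, name="pred_lin_regr")
df_total = pd.concat([Xtest, ytest, ypred], axis=1)
print(df_total)
```

An alternative under the same assumption is to call `reset_index(drop=True)` on every frame before concatenating, at the cost of losing the original row labels.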


python numpy pandas scikit-learn

bol*_*lla

2018-11-16

8 votes · 1 answer · 10k views

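Sklearn univariate selection: Features are constant

When trying to use feature selection with f_classif (the ANOVA test) on some data in sklearn, I get the following warning message:

C:\Users\Alexander\Anaconda3\lib\site-packages\sklearn\feature_selection\univariate_selection.py:113: UserWarning: Features ... are constant. UserWarning)

The features that the warning message flags as constant evidently have p-values of 0. I could not find any information about what causes this warning. The GitHub file for this particular function is at: https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/feature_selection/univariate_selection.py

Any help would be appreciated, thanks.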
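
One plausible reading is that the flagged columns have zero variance, so the ANOVA F statistic is undefined for them. The sketch below, on synthetic data, drops such columns with `VarianceThreshold` before calling `f_classif`. The array shapes and the constant column are assumptions made for illustration.

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold, f_classif

# Synthetic data: column 1 holds the same value in every sample.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
X[:, 1] = 5.0
y = np.array([0, 1] * 10)

# Remove zero-variance columns first, then run the ANOVA F-test.
selector = VarianceThreshold(threshold=0.0)
X_reduced = selector.fit_transform(X)
kept = selector.get_support(indices=True)

F, p = f_classif(X_reduced, y)
print("kept columns:", kept)
print("F statistics:", F)
print("p-values:", p)
```

Whether dropping the columns is appropriate depends on the data; a column that is constant only within each class would need a different treatment.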

python feature-selection scikit-learn

Ale*_*lex

2017-01-16

8 votes · 1 answer · 7152 views

PredefinedSplit function in sklearn

I'm trying to run sklearn's cross_val_score with splits that I provide. The sklearn documentation gives the following example:

>>> from sklearn.model_selection import PredefinedSplit
>>> X = np.array([[1, 2], [3, 4], [1, 2], [3, 4]])
>>> y = np.array([0, 0, 1, 1])
>>> test_fold = [0, 1, -1, 1]
>>> ps = PredefinedSplit(test_fold)
>>> ps.get_n_splits()
2
>>> print(ps)       
PredefinedSplit(test_fold=array([ 0,  1, -1,  1]))
>>> for train_index, test_index in ps.split():
...    print("TRAIN:", train_index, "TEST:", test_index)
...    X_train, X_test = X[train_index], X[test_index]
...    y_train, y_test = y[train_index], y[test_index]
TRAIN: [1 2 3] …
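
A `PredefinedSplit` object can be passed as the `cv` argument of `cross_val_score`. The sketch below assumes a small synthetic dataset and a `LogisticRegression` estimator, neither of which comes from the post.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import PredefinedSplit, cross_val_score

# Synthetic data: 8 samples, alternating class labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 2))
y = np.array([0, 1, 0, 1, 0, 1, 0, 1])

# Samples 0-1 form test fold 0, samples 2-3 form test fold 1,
# and -1 marks samples that are only ever used for training.
test_fold = [0, 0, 1, 1, -1, -1, -1, -1]
ps = PredefinedSplit(test_fold)

scores = cross_val_score(LogisticRegression(), X, y, cv=ps)
print("number of splits:", ps.get_n_splits())
print("scores per fold:", scores)
```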


python scikit-learn

clo*_*g14

2017-05-14

8 votes · 1 answer · 6392 views

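Loss, metrics, and scoring in Keras

What is the difference between loss, metrics, and scoring when building a keras model? Should they be different or the same? In a typical model, we use all three for GridSearchCV.

Here is a snapshot of a typical regression model that uses all three.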

def create_model():
    model = Sequential()
    model.add(Dense(12, input_dim=1587, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))

    model.compile(loss='mean_squared_error', optimizer='adam', metrics=['mean_squared_error'])
    return model

model = KerasRegressor(build_fn=create_model, verbose=0)
batch_size = [10, 20, 40, 60, 80, 100]
epochs = [10, 50, 100]
param_grid = dict(batch_size=batch_size, epochs=epochs)
grid = GridSearchCV(estimator=model, param_grid=param_grid, scoring='r2', n_jobs=-1)
grid_result = grid.fit(X, Y)
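
To illustrate the separation of roles, the sketch below swaps the Keras model for scikit-learn's `SGDRegressor`, purely so that it runs without Keras. The estimator minimizes its own training loss (squared error by default), while `GridSearchCV` ranks hyperparameters by the separate `scoring` value. The data and the `alpha` grid are assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic regression data.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=200)

# The loss is what the optimizer minimizes while fitting each candidate;
# SGDRegressor's default loss is the squared error.
model = SGDRegressor(random_state=0)

# The scoring value is only used to compare fitted candidates across folds.
grid = GridSearchCV(model, {"alpha": [1e-4, 1e-2]}, scoring="r2", cv=3)
grid.fit(X, y)

print("best params:", grid.best_params_)
print("best cross-validated R^2:", grid.best_score_)
```

In Keras terms, the `metrics` list would be a third, purely informational quantity logged during training; it influences neither the optimizer nor the grid search ranking.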


machine-learning scikit-learn keras grid-search tensorflow

Abd*_*han

2018-07-10

8 votes · 1 answer · 4929 views

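Determining feature importance for each class in an RF model using scikit-learn

I have a dataset that follows a one-hot encoding pattern, and my dependent variable is also binary. The first part of my code lists the important variables for the entire dataset. I used the method mentioned in this Stack Overflow post: "Using scikit to determine contributions of each feature to a specific class prediction". I'm not sure what output to expect.

In my case, the most important feature for the overall model is "Delay Related DMS". I interpret this to mean that the variable should be important in either Class 0 or Class 1, but in the output I get it is important in neither class. The code in the Stack Overflow post I shared above also shows that when the DV is binary, the output for Class 0 is the exact opposite (in sign, +/-) of the output for Class 1. In my case, the values differ between the two classes.

Here is what the plots look like:

[Plot: feature importance, overall model]

[Plot: feature importance, Class 0]

[Plot: feature importance, Class 1]

The second part of my code shows the cumulative feature importance, but looking at the plot suggests that none of the variables is important. Is my formula wrong, or is my interpretation wrong?

[Plot]

Here is my code: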

import pandas as pd
import numpy as np
import json
import matplotlib.pyplot as plt
%matplotlib inline

from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.metrics import confusion_matrix
from sklearn.preprocessing import scale
from sklearn.ensemble import ExtraTreesClassifier


##get_ipython().run_line_magic('matplotlib', 'inline')

file = r'RCM_Binary.csv'
data = pd.read_csv(file)
print("data loaded successfully ...")

# Define features and target
X = data.iloc[:,:-1]
y = data.iloc[:,-1]

#split to training and testing
X_train, X_test, y_train, y_test …
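
For context, a fitted `ExtraTreesClassifier` exposes a single `feature_importances_` vector for the whole model, normalized to sum to 1, and not one vector per class. The sketch below shows this on synthetic binary features. The data, in which the target simply copies feature 0, is an assumption made for illustration.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier

# Synthetic one-hot-style data: five binary features, target equals feature 0.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(300, 5))
y = X[:, 0]

forest = ExtraTreesClassifier(n_estimators=100, random_state=0).fit(X, y)

# One global importance value per feature, shared by both classes.
importances = forest.feature_importances_
ranking = np.argsort(importances)[::-1]

for idx in ranking:
    print("feature %d: %.3f" % (idx, importances[idx]))
```

Per-class contributions therefore have to come from a separate decomposition, such as the one in the linked post, and need not mirror the global ranking.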


python machine-learning binary-data random-forest scikit-learn

Jam*_*ond

2018-05-07

7 votes · 1 answer · 3901 views

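Method for determining the rotation of a polygonal surface from a top-down camera

I have a webcam looking down on a surface that rotates about a single axis. I'd like to be able to measure the rotation angle of the surface.

The camera position and the rotation axis of the surface are both fixed. The surface is currently a distinct solid color, but I have the option of drawing features on the surface if that would help.

Here is an animation of the surface moving through its full range, showing the different apparent shapes:

My approach so far:

1. Record a series of "calibration" images, where the surface is at a known angle in each image.
2. Threshold each image to isolate the surface.
3. Find the four corners with cv2.approxPolyDP(). I iterate over various epsilon values until I find one that yields exactly 4 points.
4. Order the points consistently (top-left, top-right, bottom-right, bottom-left).
5. Compute the angle between each pair of points with atan2.
6. Fit an sklearn linear_model.LinearRegression() using the angles.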
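
Steps 4 to 6 above can be sketched as follows. The corner coordinates come from a hypothetical `synthetic_corners` helper that stands in for the thresholding and cv2.approxPolyDP() steps, so the numbers say nothing about a real camera; `order_corners` and `edge_angles` are also illustrative names.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def order_corners(pts):
    """Return four points ordered top-left, top-right, bottom-right, bottom-left.

    Assumes image coordinates (y grows downward) and a roughly upright quad.
    """
    pts = np.asarray(pts, dtype=float)
    s = pts.sum(axis=1)          # x + y: smallest at top-left, largest at bottom-right
    d = pts[:, 1] - pts[:, 0]    # y - x: smallest at top-right, largest at bottom-left
    return np.array([pts[np.argmin(s)], pts[np.argmin(d)],
                     pts[np.argmax(s)], pts[np.argmax(d)]])

def edge_angles(corners):
    """Angle in radians of each edge, from one ordered corner to the next."""
    delta = np.roll(corners, -1, axis=0) - corners
    return np.arctan2(delta[:, 1], delta[:, 0])

def synthetic_corners(angle_deg):
    """Hypothetical stand-in for detection: a trapezoid that skews with the angle."""
    top, bottom = 100.0 - angle_deg, 100.0 + angle_deg   # half-widths in pixels
    # Returned in arbitrary order, as a contour approximation might be.
    return [(bottom, 100.0), (-top, 0.0), (-bottom, 100.0), (top, 0.0)]

# Three calibration "images" at known angles, as in the post.
calibration_angles = [-30.0, 0.0, 30.0]
features = np.array([edge_angles(order_corners(synthetic_corners(a)))
                     for a in calibration_angles])
model = LinearRegression().fit(features, calibration_angles)

query = edge_angles(order_corners(synthetic_corners(15.0))).reshape(1, -1)
predicted = model.predict(query)[0]
print("predicted angle for a true angle of 15 degrees:", predicted)
```

Even in this idealized setup the edge angles are not exactly linear in the rotation angle, so a linear fit on three calibration points leaves a systematic error between them.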

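This approach gets my predictions to within about 10% of the actual value with only 3 training images (covering the full positive, full negative, and middle positions). I'm new to both opencv and sklearn; is there anything I should consider doing differently to improve the accuracy of my predictions? (Increasing the number of training images is probably a big one?)

I did try using cv2.moments directly as my model features, and then some values derived from the moments, but these did not perform as well as the angles. I also tried a RidgeCV model, but it seemed to perform about the same as the linear model.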

python opencv scikit-learn scikit-image

Ste*_*rne

2018-08-03

7 votes · 1 answer · 205 views

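Problem setting up a qloguniform search space in Hyperopt

I'm using hyperopt to tune my ML model, but I'm having trouble using qloguniform as the search space. I took the example from the official wiki and changed the search space.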

import pickle
import time
#utf8
import pandas as pd
import numpy as np
from hyperopt import fmin, tpe, hp, STATUS_OK, Trials

def objective(x):
    return {
        'loss': x ** 2,
        'status': STATUS_OK,
        # -- store other results like this
        'eval_time': time.time(),
        'other_stuff': {'type': None, 'value': [0, 1, 2]},
        # -- attachments are handled differently
        'attachments':
            {'time_module': pickle.dumps(time.time)}
        }
trials = Trials()
best = fmin(objective,
    space=hp.qloguniform('x', np.log(0.001), np.log(0.1), np.log(0.001)),
    algo=tpe.suggest,
    max_evals=100,
    trials=trials)
pd.DataFrame(trials.trials) …
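
The hyperopt documentation describes `hp.qloguniform(label, low, high, q)` as returning a value like `round(exp(uniform(low, high)) / q) * q`. The sketch below emulates that formula with NumPy rather than calling hyperopt, to compare a `q` given in log space with a `q` given on the original scale. The helper name `qloguniform_samples` is hypothetical.

```python
import numpy as np

def qloguniform_samples(low, high, q, size, seed=0):
    """Emulate the documented formula round(exp(uniform(low, high)) / q) * q."""
    rng = np.random.default_rng(seed)
    raw = np.exp(rng.uniform(low, high, size=size))
    return np.round(raw / q) * q

low, high = np.log(0.001), np.log(0.1)

# q in log space, as in the snippet above: np.log(0.001) is about -6.9.
with_log_q = qloguniform_samples(low, high, q=np.log(0.001), size=1000)

# q on the original scale: the step between allowed values.
with_plain_q = qloguniform_samples(low, high, q=0.001, size=1000)

print("distinct values with q = log(0.001):", np.unique(with_log_q))
print("range with q = 0.001:", with_plain_q.min(), "to", with_plain_q.max())
```

Under this formula only `low` and `high` are in log space, which suggests passing `q=0.001` instead of `np.log(0.001)`.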


python machine-learning hyperparameters hyperopt

Ven*_*lam

2018-12-18

7 votes · 1 answer · 1940 views

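Cross-validation with an Imblearn pipeline and GridSearchCV

I'm trying to use the Pipeline class from imblearn together with GridSearchCV to find the best parameters for classifying an imbalanced dataset. As per the answer mentioned here, I want to leave the validation set out of the resampling and resample only the training set, which imblearn's Pipeline seems to do. However, I get an error when implementing the accepted solution. Please let me know what I'm doing wrong. Below is my implementation: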

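from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline
from sklearn.model_selection import GridSearchCV

def imb_pipeline(clf, X, y, params):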

    model = Pipeline([
        ('sampling', SMOTE()),
        ('classification', clf)
    ])

    score={'AUC':'roc_auc', 
           'RECALL':'recall',
           'PRECISION':'precision',
           'F1':'f1'}

    gcv = GridSearchCV(estimator=model, param_grid=params, cv=5, scoring=score, n_jobs=12, refit='F1',
                       return_train_score=True)
    gcv.fit(X, y)

    return gcv

for param, classifier in zip(params, classifiers):
    print("Working on {}...".format(classifier[0]))
    clf = imb_pipeline(classifier[1], X_scaled, y, param) 
    print("Best parameter for {} is {}".format(classifier[0], clf.best_params_))
    print("Best `F1` for {} is {}".format(classifier[0], clf.best_score_))
    print('-'*50)
    print('\n')


Parameters:

[{'penalty': ('l1', 'l2'), 'C': (0.01, …
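
The error text is not shown in this excerpt, so the following is only a guess at the cause. When the estimator is a pipeline, `GridSearchCV` expects parameter names prefixed with the step name and two underscores, for example `classification__C` instead of `C`. The sketch below shows the naming with a plain scikit-learn `Pipeline` and a `StandardScaler` step in place of SMOTE, so that it runs without imblearn. The data and the grid values are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic imbalanced data (roughly 80% / 20%).
X, y = make_classification(n_samples=200, weights=[0.8, 0.2], random_state=0)

model = Pipeline([
    ("scaling", StandardScaler()),            # stand-in for the SMOTE step
    ("classification", LogisticRegression()),
])

# Keys are "<step name>__<parameter name>".
params = {"classification__C": [0.01, 0.1, 1.0]}

gcv = GridSearchCV(estimator=model, param_grid=params, cv=5, scoring="f1")
gcv.fit(X, y)
print("best parameters:", gcv.best_params_)
```

imblearn's `Pipeline` follows the same naming convention, so the same prefix would apply with `('sampling', SMOTE())` as the first step.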


pipeline python-3.x scikit-learn imblearn

Kri*_*lal

2019-11-12

7 votes · 1 answer · 5362 views

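How to set different learning rates for different layers in pytorch

I'm fine-tuning resnet50 with pytorch and want to set the learning rate of the last fully connected layer to 10^-3 and the learning rate of the other layers to 10^-6. I know I can follow the method in its documentation: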

optim.SGD([{'params': model.base.parameters()},
           {'params': model.classifier.parameters(), 'lr': 1e-3}], 
          lr=1e-2, momentum=0.9)


But is there any way to do this without setting the parameters layer by layer?
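
One possible approach, sketched under assumptions, is to split `model.named_parameters()` by name prefix so that only two parameter groups are needed, however many layers the backbone has. The `TinyNet` model below is a hypothetical stand-in for resnet50; the only property it shares with resnet50 is a final layer named `fc`.

```python
from torch import nn, optim

class TinyNet(nn.Module):
    """Hypothetical stand-in for resnet50: a backbone plus a final `fc` layer."""

    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Linear(8, 16), nn.ReLU(),
            nn.Linear(16, 16), nn.ReLU(),
        )
        self.fc = nn.Linear(16, 2)

    def forward(self, x):
        return self.fc(self.features(x))

model = TinyNet()

# Split the parameters by name: everything under "fc." is the head.
head_params = [p for name, p in model.named_parameters() if name.startswith("fc.")]
base_params = [p for name, p in model.named_parameters() if not name.startswith("fc.")]

optimizer = optim.SGD(
    [
        {"params": base_params},               # uses the default lr below
        {"params": head_params, "lr": 1e-3},   # overrides lr for the head
    ],
    lr=1e-6,
    momentum=0.9,
)

for group in optimizer.param_groups:
    print(len(group["params"]), "tensors at lr", group["lr"])
```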

machine-learning python-2.7 pytorch

w.w*_*wei

2017-05-07

6 votes · 1 answer · 1589 views

Tag statistics

scikit-learn ×8

python ×7

machine-learning ×4

pandas ×2

pipeline ×2

binary-data ×1

feature-selection ×1

grid-search ×1

hyperopt ×1

hyperparameters ×1

imblearn ×1

keras ×1

numpy ×1

opencv ×1

python-2.7 ×1

python-3.x ×1

pytorch ×1

random-forest ×1

scikit-image ×1

tensorflow ×1

train-test-split ×1
