Posts by far*_*mer

Comparing methods for tuning hyperparameters in scikit-learn

This post is about the difference between LogisticRegressionCV, GridSearchCV and cross_val_score. Consider the following setup:

import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression, LogisticRegressionCV
from sklearn.model_selection import train_test_split, GridSearchCV, \
     StratifiedKFold, cross_val_score
from sklearn.metrics import confusion_matrix

read = load_digits()
X, y = read.data, read.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.3)

In penalized logistic regression, we need to set the parameter C that controls regularization. scikit-learn offers three ways to find the best C via cross-validation.

LogisticRegressionCV

clf = LogisticRegressionCV(Cs=10, penalty="l1",
                           solver="saga", scoring="f1_macro")
clf.fit(X_train, y_train)
confusion_matrix(y_test, clf.predict(X_test))
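As an aside that is not in the original post: after fitting, LogisticRegressionCV stores the selected regularization strength per class in its C_ attribute and the per-fold scores in scores_, which is a quick way to inspect what its built-in cross-validation chose.

# Not from the original post: inspect what LogisticRegressionCV selected.
print(clf.C_)                                          # chosen C per class
print({k: v.mean() for k, v in clf.scores_.items()})   # mean CV score per class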

Side note: the documentation states that SAGA and LIBLINEAR are the only optimizers for the L1 penalty, and that SAGA is faster for large datasets. Unfortunately, warm starting is only available for Newton-CG and LBFGS.

GridSearchCV

clf = LogisticRegression(penalty="l1", solver="saga", warm_start=True)
clf …
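The excerpt is cut off above. A minimal sketch of how the GridSearchCV route could be completed, reusing the imports from the setup block; the C grid and the fold count below are assumptions (the grid mirrors the default produced by Cs=10), not values from the original post:

# A sketch, not the original code: search over C with the same scorer as above.
grid = GridSearchCV(
    LogisticRegression(penalty="l1", solver="saga", warm_start=True),
    param_grid={"C": np.logspace(-4, 4, 10)},   # assumed grid
    scoring="f1_macro",
    cv=StratifiedKFold(n_splits=5),             # assumed fold count
)
grid.fit(X_train, y_train)
confusion_matrix(y_test, grid.predict(X_test))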

python machine-learning scikit-learn cross-validation hyperparameters

9 votes · 1 answer · 615 views

Finding half of each group using Pandas GroupBy

I need to select half of a dataframe using groupby, where the size of each group is unknown and may differ from group to group. For example (one possible approach is sketched after the example data):

       index  summary  participant_id
0     130599     17.0              13
1     130601     18.0              13
2     130603     16.0              13
3     130605     15.0              13
4     130607     15.0              13
5     130609     16.0              13
6     130611     17.0              13
7     130613     15.0              13
8     130615     17.0              13
9     130617     17.0              13
10     86789     12.0              14
11     86791      8.0              14
12     86793     21.0              14
13     86795     19.0              14
14     86797     20.0              14
15     86799      9.0              14
16     86801     10.0              14
20    107370      1.0              15
21 …
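The question is cut off above. Not part of the original post, but one way to keep the first half of each group regardless of its size is sketched below; the frame here is a hypothetical stand-in for the data shown above:

import pandas as pd

# Hypothetical frame mirroring the structure shown above.
df = pd.DataFrame({
    "summary": [17.0, 18.0, 16.0, 12.0, 8.0, 21.0, 19.0],
    "participant_id": [13, 13, 13, 14, 14, 14, 14],
})

# cumcount() numbers rows within each group; transform("size") attaches each
# group's total size to its rows, so the mask keeps the first half (rounded down).
g = df.groupby("participant_id")
first_half = df[g.cumcount() < g["summary"].transform("size") // 2]
print(first_half)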

python pandas split-apply-combine pandas-groupby

6 votes · 1 answer · 796 views

Can virtual functions be replaced with auto parameters?

This question continues the discussion at stackoverflow.com/q/2391679

A typical example of virtual functions:

#include <iostream>
#include <string>
using namespace std;

class Shape
{
public:
    virtual string draw() = 0;
};

class Circle : public Shape
{
public:
    string draw() { return "Round"; }
};

class Rectangle : public Shape
{
public:
    string draw() { return "Flat"; }
};

void print (Shape& obj)
{
    cout << obj.draw();
}

However, in C++14 we can pass an auto parameter instead:

class Circle
{
public:
    string draw() { return "Round"; }
};

class Rectangle
{
public:
    string draw() { return "Flat"; }
};

void print …

polymorphism templates virtual-functions auto c++14

5 votes · 1 answer · 169 views

Python (sklearn) - Why do I get the same prediction for every test tuple in SVR?

Answers to similar questions on Stack Overflow suggest changing the parameter values in the SVR() instance, but I don't know what to do with them.

Here is the code I am using:

import json
import numpy as np
from sklearn.svm import SVR

f = open('training_data.txt', 'r')
data = json.loads(f.read())
f.close()

f = open('predict_py.txt', 'r')
data1 = json.loads(f.read())
f.close()

features = []
response = []
predict = []

for row in data:
    a = [
        row['star_power'],
        row['view_count'],
        row['like_count'],
        row['dislike_count'],
        row['sentiment_score'],
        row['holidays'],
        row['clashes'],
    ]
    features.append(a)
    response.append(row['collection'])

for row in data1:
    a = [
        row['star_power'],
        row['view_count'],
        row['like_count'],
        row['dislike_count'],
        row['sentiment_score'],
        row['holidays'],
        row['clashes'],
    ]
    predict.append(a)

X = np.array(features).astype(float)
Y = np.array(response).astype(float)
predict = np.array(predict).astype(float)

svm …
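The excerpt is cut off above. Not part of the original post, but the kind of parameter change those answers usually point at is scaling the features and setting C, gamma and epsilon explicitly, since identical SVR predictions are often a symptom of unscaled inputs combined with the defaults. A minimal sketch under that assumption, using the X, Y and predict arrays built above (the values below are starting points to tune, not values from the original post):

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# A sketch: scale the features, then fit an SVR with explicit parameters.
model = make_pipeline(
    StandardScaler(),
    SVR(kernel="rbf", C=100.0, gamma="scale", epsilon=0.1),  # assumed values
)
model.fit(X, Y)
print(model.predict(predict))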

python machine-learning standardized svm scikit-learn

4 votes · 1 answer · 1913 views