Posts by Syc*_*ica

How to nest LabelKFold?

I have a dataset with about 300 points and 32 distinct labels, and I want to evaluate a LinearSVR model by plotting its learning curve using grid search and LabelKFold validation.

My code looks like this:

import numpy as np
from sklearn import preprocessing
from sklearn.svm import LinearSVR
from sklearn.pipeline import Pipeline
from sklearn.cross_validation import LabelKFold
from sklearn.grid_search import GridSearchCV
from sklearn.learning_curve import learning_curve
    ...
#get data (x, y, labels)
    ...
C_space = np.logspace(-3, 3, 10)
epsilon_space = np.logspace(-3, 3, 10)  

svr_estimator = Pipeline([
    ("scale", preprocessing.StandardScaler()),
    ("svr", LinearSVR),
])

search_params = dict(
    svr__C = C_space,
    svr__epsilon = epsilon_space
)

kfold = LabelKFold(labels, 5)

svr_search = GridSearchCV(svr_estimator, param_grid = search_params, cv = ???)

train_space = …
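
For what it's worth: in scikit-learn 0.17, where sklearn.cross_validation and sklearn.grid_search live, a LabelKFold instance is just an iterable of (train, test) index pairs, so cv=kfold is enough for a plain grid search over the full data. Nesting it under learning_curve is the hard part, because an inner splitter built on the full labels array no longer lines up with an outer training subset. A minimal sketch of manual nesting, assuming x, y, and labels are NumPy arrays as above:

outer_cv = LabelKFold(labels, n_folds=5)

test_scores = []
for train_idx, test_idx in outer_cv:
    # Rebuild the inner splitter on the label *subset* so its indices
    # refer to the subsetted arrays rather than the full data.
    inner_cv = LabelKFold(labels[train_idx], n_folds=5)
    search = GridSearchCV(svr_estimator, param_grid=search_params, cv=inner_cv)
    search.fit(x[train_idx], y[train_idx])
    test_scores.append(search.score(x[test_idx], y[test_idx]))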

python scikit-learn cross-validation

13 votes · 1 answer · 430 views

sklearn GridSearchCV does not use sample_weight in the score function

I have data with a different weight for each sample. In my application, it is important that these weights are accounted for both when estimating models and when comparing alternative models.

I use sklearn to estimate models and to compare alternative hyperparameter choices. But this unit test shows that GridSearchCV does not apply sample_weights when estimating the score.

Is there a way to have sklearn use sample_weight when scoring models?

The unit test:

from __future__ import division

import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import log_loss
from sklearn.model_selection import GridSearchCV, RepeatedKFold


def grid_cv(X_in, y_in, w_in, cv, max_features_grid, use_weighting):
  out_results = dict()

  for k in max_features_grid:
    clf = RandomForestClassifier(n_estimators=256,
                                 criterion="entropy",
                                 warm_start=False,
                                 n_jobs=-1,
                                 random_state=RANDOM_STATE,
                                 max_features=k)
    for train_ndx, test_ndx in cv.split(X=X_in, y=y_in):
      X_train = X_in[train_ndx, :]
      y_train = y_in[train_ndx]
      w_train = w_in[train_ndx]
      y_test = y_in[test_ndx] …
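
For context on the mechanism the test exercises: sklearn.metrics.log_loss itself accepts a sample_weight argument, so a weighted score can be computed by hand even though GridSearchCV's scorer does not forward the weights. A self-contained sketch of the weighted vs. unweighted call, on illustrative data rather than the question's:

import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import log_loss

iris = load_iris()
X, y = iris.data, iris.target
w = np.random.RandomState(0).uniform(0.1, 1.0, size=len(y))  # illustrative weights

clf = RandomForestClassifier(n_estimators=64, random_state=0)
clf.fit(X, y, sample_weight=w)               # weights respected when fitting
proba = clf.predict_proba(X)
print(log_loss(y, proba, sample_weight=w))   # weighted score
print(log_loss(y, proba))                    # unweighted score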

python machine-learning scikit-learn

7 votes · 1 answer · 1477 views

Strange error in setkeyv with data.table

I consider myself quite familiar with data.table, but I've hit a strange error in the setkeyv function that I can't figure out.

The error is very simple to reproduce:

library(data.table)

keycols <- c("A", "B")
DT <- data.table(A=1:10, B=91:100)
setkeyv(DT, keycols)
# Error in setkeyv(x, cols, verbose = verbose, physical = physical) : 
#   4 arguments passed to .Internal(nchar) which requires 3

For reference, here is my sessionInfo():

R version 3.2.0 (2015-04-16)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                           LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached …

r data.table

6 votes · 1 answer · 4465 views

kernlab kraziness: inconsistent results for identical problems

I've discovered some puzzling behavior in the kernlab package: estimating SVMs that are mathematically identical produces different results in software.

To keep things simple, this code chunk just takes the iris data and makes it a binary classification problem. As you can see, I'm using linear kernels in both SVMs.

library(kernlab)
library(e1071)

data(iris)
x <- as.matrix(iris[, 1:4])
y <- as.factor(ifelse(iris[, 5] == 'versicolor', 1, -1))
C <- 5.278031643091578

svm1 <- ksvm(x = x, y = y, scaled = FALSE, kernel = 'vanilladot', C = C)

K <- kernelMatrix(vanilladot(), x)
svm2 <- ksvm(x = K, y = y, C = C, kernel = 'matrix')

svm3 <- svm(x = x, y = y, scale = FALSE, kernel = 'linear', cost = C)

However, svm1 and …

r machine-learning svm kernlab

5 votes · 1 answer · 476 views

Check whether a string is a comma-separated list of numbers

In my table1 I have a varchar field in which I store a comma-separated list of ids from another table, table2 (id - INT UNSIGNED AUTO_INCREMENT).

For example: 1,3,5,12,90

Also, the ids should not repeat.

I need to check that a string (coming from outside) conforms to this rule. For example, I need to validate $_POST['id_list'].

Data consistency is not important right now (e.g., the value may be inserted without checking that the ids actually exist in table2).

Any suggestions would be helpful.
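
The question is tagged php, but the rule is just a regular expression plus a uniqueness check. A sketch in Python, where the pattern ^\d+(,\d+)*$ and the length comparison should carry over directly to PHP's preg_match and array_unique (the helper name is mine):

import re

ID_LIST_RE = re.compile(r'^\d+(,\d+)*$')  # digits separated by single commas

def is_valid_id_list(s):
    if not ID_LIST_RE.match(s):
        return False                      # wrong shape: empties, spaces, stray commas
    ids = [int(t) for t in s.split(',')]  # int() so "01" and "1" count as the same id
    return len(ids) == len(set(ids))      # reject repeated ids

assert is_valid_id_list('1,3,5,12,90')
assert not is_valid_id_list('1,3,5,')     # trailing comma
assert not is_valid_id_list('1,3,3')      # duplicate id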

php mysql string

2 votes · 1 answer · 1127 views