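When I run n-fold cross-validation on data with m classes, are the training and test sets balanced in each fold? By balanced, I mean: do the training and test sets contain (almost) the same proportion of samples from each class?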
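Whether the folds are balanced generally depends on whether the split is stratified. As a rough illustration only, the sketch below counts the classes in each fold of a stratified split. It uses scikit-learn's `StratifiedKFold` rather than the tool from the question, and the toy labels (30, 20 and 10 samples for three classes) are hypothetical values chosen for the example.

```python
from collections import Counter

import numpy as np
from sklearn.model_selection import StratifiedKFold

# Hypothetical toy data: three classes with 30, 20 and 10 samples.
y = np.array([0] * 30 + [1] * 20 + [2] * 10)
X = np.arange(len(y)).reshape(-1, 1)  # placeholder features

# A stratified split keeps the class proportions roughly equal in every fold.
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

fold_counts = []
for fold, (train_idx, test_idx) in enumerate(skf.split(X, y)):
    train_counts = Counter(y[train_idx].tolist())
    test_counts = Counter(y[test_idx].tolist())
    fold_counts.append((train_counts, test_counts))
    print(fold, dict(train_counts), dict(test_counts))
```

With a plain (non-stratified) split the per-class counts can drift between folds, so comparing the printed counts is a quick way to see which behaviour you are getting.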
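How do I build a classification model with 10-fold cross-validation using the Weka API? I ask because every cross-validation run creates a new classification model. Which of these models should I use on my test data?
Thanks!!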
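A common convention, sketched here in Python with scikit-learn rather than the Weka API (so every name below is an assumption, not Weka code), is to treat cross-validation purely as a performance estimate, discard the per-fold models, and then train one final model on all of the training data. The iris data and the decision tree are placeholders for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Placeholder data: hold out a test set that cross-validation never sees.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Step 1: 10-fold CV on the training data, used only to estimate performance.
# The ten models fitted internally are thrown away.
clf = DecisionTreeClassifier(random_state=0)
cv_scores = cross_val_score(clf, X_train, y_train, cv=10)
print("mean CV accuracy:", cv_scores.mean())

# Step 2: fit one final model on the whole training set and apply it to the test data.
final_model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
test_accuracy = final_model.score(X_test, y_test)
print("test accuracy:", test_accuracy)
```

Under this convention there is no need to pick one of the fold models: the model applied to the test data is the one trained after cross-validation has finished.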
I am trying to use xgboost in Python. Here is my code. xgb.train works, but I get an error from xgb.cv, even though I seem to be using it correctly.
The following works for me:
###### XGBOOST ######
import datetime
import numpy as np  # needed for the np.array / np.ones calls below
import xgboost as xgb
startTime = datetime.datetime.now()
data_train = np.array(traindata.drop('Category',axis=1))
labels_train = np.array(traindata['Category'].cat.codes)
data_valid = np.array(validdata.drop('Category',axis=1))
labels_valid = np.array(validdata['Category'].astype('category').cat.codes)
weights_train = np.ones(len(labels_train))
weights_valid = np.ones(len(labels_valid ))
dtrain = xgb.DMatrix( data_train, label=labels_train,weight = weights_train)
dvalid = xgb.DMatrix( data_valid , label=labels_valid ,weight = weights_valid )
param = {'bst:max_depth':5, 'bst:eta':0.05, # eta [default=0.3]
    #'min_child_weight':1,'gamma':0,'subsample':1,'colsample_bytree':1,'scale_pos_weight':0, # default
    # max_delta_step:0 # default
    'min_child_weight':5,'scale_pos_weight':0, 'max_delta_step':2,
    'subsample':0.8,'colsample_bytree':0.8,
    'silent':1, 'objective':'multi:softprob' }
param['nthread'] = 4 …
I am going to try the code from this link:
I get an error from the line that calls StratifiedKFold(n_splits=6). Can anyone tell me how to fix this error?
Here is the code:
import numpy as np
from scipy import interp
import matplotlib.pyplot as plt
from itertools import cycle
from sklearn import svm, datasets
from sklearn.metrics import roc_curve, auc
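from sklearn.cross_validation import StratifiedKFold  # legacy module: this StratifiedKFold takes (y, n_folds); the n_splits keyword belongs to sklearn.model_selection.StratifiedKFold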
iris = datasets.load_iris()
X = iris.data
y = iris.target
X, y = X[y != 2], y[y != 2]  # drop class 2 from both X and y so their lengths match
X, y
cv = StratifiedKFold(n_splits=6)
classifier = svm.SVC(kernel='linear', probability=True,
random_state=random_state)
mean_tpr = 0.0
mean_fpr = np.linspace(0, 1, 100)
Here is the error:
TypeError Traceback (most recent call last)
<ipython-input-227-2af2773f4987> in …
I have a dataframe like this:
    Col1  Col2
10     1     6
11     3     8
12     9     4
13     7     2
14     4     3
15     2     9
16     6     7
17     8     1
18     5     5
I want to use KFold cross-validation to fit my model and make predictions.
for train_index, test_index in kf.split(X_train, y_train):
    model.fit(X[train_index], y[train_index])
    y_pred = model.predict(X[test_index])
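This code produces the following error:
'[1 2 4 7] not in index'
I see that after KFold.split(), train_index and test_index do not use the real index labels of the dataframe.
So I cannot fit my model.
Does anyone have an idea?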
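One possible way around this, sketched under assumptions: `KFold.split()` yields integer positions (0, 1, 2, ...), not index labels, so selecting rows by position with `.iloc` avoids the label lookup that fails. The dataframe below reproduces the one in the question; treating Col1 as the feature and Col2 as the target, and using `LinearRegression`, are illustrative choices that the question does not specify.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold

# Dataframe from the question, with a non-zero-based index (10..18).
df = pd.DataFrame(
    {"Col1": [1, 3, 9, 7, 4, 2, 6, 8, 5], "Col2": [6, 8, 4, 2, 3, 9, 7, 1, 5]},
    index=range(10, 19),
)
X = df[["Col1"]]  # assumed feature column
y = df["Col2"]    # assumed target column

kf = KFold(n_splits=3)
model = LinearRegression()  # placeholder model

predictions = []
for train_pos, test_pos in kf.split(X):
    # split() returns positions, so index by position with .iloc
    model.fit(X.iloc[train_pos], y.iloc[train_pos])
    y_pred = model.predict(X.iloc[test_pos])
    # Re-attach the original index labels to the predictions.
    predictions.append(pd.Series(y_pred, index=X.index[test_pos]))

all_pred = pd.concat(predictions)
print(all_pred)
```

Converting the data to NumPy arrays first (for example with `X.to_numpy()`) would also work, at the cost of losing the index labels.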
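I am trying to use cross-fold resampling and fit a random forest from the ranger package. The fit without resampling works fine, but as soon as I try the resampled fit, it fails with the error below.
Consider the following df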
df<-structure(list(a = c(1379405931, 732812609, 18614430, 1961678341,
2362202769, 55687714, 72044715, 236503454, 61988734, 2524712675,
98081131, 1366513385, 48203585, 697397991, 28132854), b = structure(c(1L,
6L, 2L, 5L, 7L, 8L, 8L, 1L, 3L, 4L, 3L, 5L, 7L, 2L, 2L), .Label = c("CA",
"IA", "IL", "LA", "MA", "MN", "TX", "WI"), class = "factor"),
c = structure(c(2L, 2L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 2L, 1L,
2L, 2L, 2L, 1L), .Label = c("R", "U"), class = "factor"),
    d = structure(c(3L, 3L, …
Is there a way to do pruning with CatBoost and Optuna? (In LightGBM it is easy, but for CatBoost I can't find any hints.) My code looks like this:
def objective(trial):
    param = {
        'iterations':trial.suggest_int('iterations', 100,1500, step=100),
        'learning_rate':trial.suggest_uniform("learning_rate", 0.001, 0.3),
        'random_strength':trial.suggest_int("random_strength", 1,10),
        'max_bin':trial.suggest_categorical('max_bin', [2,3,4,5,6,8,10,20,30]),
        'grow_policy':trial.suggest_categorical('grow_policy', ['SymmetricTree', 'Depthwise', 'Lossguide']),
        "colsample_bylevel": trial.suggest_uniform("colsample_bylevel", 0.1, 1),
        'od_type' : "Iter",
        'od_wait' : 30,
        "depth": trial.suggest_int("max_depth", 1,12),
        "l2_leaf_reg": trial.suggest_loguniform("l2_leaf_reg", 1e-8, 100),
        'custom_metric' : ['AUC'],
        "loss_function": "Logloss",
    }
    if param['grow_policy'] == "SymmetricTree":
        param["boosting_type"]= trial.suggest_categorical("boosting_type", ["Ordered", "Plain"])
    else:
        param["boosting_type"] = "Plain"
    # Added subsample manually
    param["subsample"] = trial.suggest_float("subsample", 0.1, 1)
    ### CV ###
    # How to add a …