kwo*_*sin 6 python arrays numpy machine-learning cross-validation
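I have a training set in matrix form with dimensions 5000 x 3027 (the CIFAR-10 dataset). Using array_split in numpy, I split it into 5 parts, and I want to pick just one of them as the cross-validation fold. My problem comes up when I use something like XTrain[[Indexes]], where Indexes is an array like [0,1,2,3]: that gives me a 3D tensor of shape 4 x 1000 x 3027 rather than a matrix. How can I collapse the "4 x 1000" into 4000 rows to get a 4000 x 3027 matrix?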
    for fold in range(len(X_train_folds)):
        indexes = np.delete(np.arange(len(X_train_folds)), fold)
        XTrain = X_train_folds[indexes]  # this is what comes out as 4 x 1000 x 3027
        X_cv = X_train_folds[fold]
        yTrain = y_train_folds[indexes]
        y_cv = y_train_folds[fold]

        classifier.train(XTrain, yTrain)
        dists = classifier.compute_distances_no_loops(X_cv)
        y_test_pred = classifier.predict_labels(dists, k)

        # Score against the held-out fold's labels, not the separate test set
        num_correct = np.sum(y_test_pred == y_cv)
        accuracy = float(num_correct) / len(y_cv)
        k_to_accuracy[k] = accuracy
Maybe you could try this instead (I'm new to numpy, so if I'm doing something inefficient or wrong, I'd be glad to be corrected):
    import numpy as np

    X_train_folds = np.array_split(X_train, num_folds)  # list of num_folds arrays
    y_train_folds = np.array_split(y_train, num_folds)
    k_to_accuracies = {}

    for k in k_choices:
        k_to_accuracies[k] = []
        for i in range(num_folds):
            # Fold i is held out; the other folds are joined back into one matrix.
            training_data = np.concatenate(X_train_folds[:i] + X_train_folds[i+1:])
            test_data = X_train_folds[i]
            training_labels = np.concatenate(y_train_folds[:i] + y_train_folds[i+1:])
            test_labels = y_train_folds[i]
            classifier.train(training_data, training_labels)
            predicted_labels = classifier.predict(test_data, k)
            k_to_accuracies[k].append(np.mean(predicted_labels == test_labels))
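To see the shapes involved without loading CIFAR-10, here is a minimal sketch on a toy matrix. The sizes (10 x 3, 5 folds) are made up for illustration. It shows the concatenate approach next to a reshape-based alternative, which I'm assuming only makes sense when all folds have the same size so they can be stacked into one 3-D array.

```python
import numpy as np

# Toy stand-in for the training matrix: 10 samples x 3 features (made-up sizes).
X_train = np.arange(30).reshape(10, 3)
num_folds = 5

# np.array_split returns a plain Python list of 2-D arrays.
X_train_folds = np.array_split(X_train, num_folds)

fold = 1  # index of the held-out validation fold

# Option 1: join the remaining folds along the row axis.
rest = [f for i, f in enumerate(X_train_folds) if i != fold]
X_tr_concat = np.concatenate(rest, axis=0)

# Option 2: stack the folds into a 3-D array (equal-sized folds only).
# Fancy indexing then yields (num_folds - 1, fold_size, features),
# and reshape merges the first two axes into rows.
stacked = np.stack(X_train_folds)
indexes = np.delete(np.arange(num_folds), fold)
X_tr_reshape = stacked[indexes].reshape(-1, stacked.shape[-1])

print(X_tr_concat.shape)
print(np.array_equal(X_tr_concat, X_tr_reshape))
```

Both options give the same 2-D matrix here. concatenate is probably the safer default, since array_split can produce folds of unequal size, in which case np.stack would raise an error.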
I'd suggest using the scikit-learn package. It already ships with many common machine-learning tools, such as a K-fold cross-validation generator:

    >>> from sklearn.model_selection import KFold  # sklearn.cross_validation in releases before 0.18
    >>> X = ...  # your data [samples x features]
    >>> y = ...  # ground-truth labels
    >>> kf = KFold(n_splits=5)

Then iterate over the splits:

    >>> for train_index, test_index in kf.split(X):
    ...     X_train, X_test = X[train_index], X[test_index]
    ...     y_train, y_test = y[train_index], y[test_index]
    ...     # do something

The loop above runs n_splits times, each time with different training and test indices.
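For a complete picture, here is a rough end-to-end sketch of choosing k with K-fold cross-validation. The data is synthetic, and scikit-learn's KNeighborsClassifier stands in for the custom classifier from the question. The sizes, seed, and k_choices values are all assumptions made for illustration.

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.neighbors import KNeighborsClassifier

# Synthetic data (assumption): 100 samples, 8 features, binary labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))
y = (X[:, 0] > 0).astype(int)

k_choices = [1, 3, 5]  # hypothetical candidate values for k
k_to_accuracies = {k: [] for k in k_choices}
kf = KFold(n_splits=5)

for k in k_choices:
    for train_index, test_index in kf.split(X):
        # Index arrays select rows directly, so the result stays 2-D.
        clf = KNeighborsClassifier(n_neighbors=k)
        clf.fit(X[train_index], y[train_index])
        k_to_accuracies[k].append(clf.score(X[test_index], y[test_index]))

# Mean validation accuracy per k across the 5 folds.
for k in k_choices:
    print(k, np.mean(k_to_accuracies[k]))
```

Because train_index and test_index are flat arrays of row numbers, X[train_index] is always a matrix, so the 3D-tensor problem from the question never comes up.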