Isb*_*ter 3 scikit-learn keras tensorflow
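The goal is to run cross-validation on a Keras model with multiple inputs. This works fine for an ordinary Sequential model with a single input. However, when using the functional API and extending the model to two inputs, sklearn's cross_val_score does not seem to work as expected.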
def create_model():
    input_text = Input(shape=(1,), dtype=tf.string)
    embedding = Lambda(UniversalEmbedding, output_shape=(512, ))(input_text)
    dense = Dense(256, activation='relu')(embedding)
    input_title = Input(shape=(1,), dtype=tf.string)
    embedding_title = Lambda(UniversalEmbedding, output_shape=(512, ))(input_title)
    dense_title = Dense(256, activation='relu')(embedding_title)
    out = Concatenate()([dense, dense_title])
    pred = Dense(2, activation='softmax')(out)
    model = Model(inputs=[input_text, input_title], outputs=pred)
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model
keras_classifier = KerasClassifier(build_fn=create_model, epochs=10, batch_size=10, verbose=1)
cv = StratifiedKFold(n_splits=10, random_state=0)
results = cross_val_score(keras_classifier, [X1, X2], y, cv=cv, scoring='f1_weighted')
Traceback (most recent call last):
  File "func.py", line 73, in <module>
    results = cross_val_score(keras_classifier, [X1, X2], y, cv=cv, scoring='f1_weighted')
  File "/home/timisb/.local/lib/python3.6/site-packages/sklearn/model_selection/_validation.py", line 402, in cross_val_score
    error_score=error_score)
  File "/home/timisb/.local/lib/python3.6/site-packages/sklearn/model_selection/_validation.py", line 225, in cross_validate
    X, y, groups = indexable(X, y, groups)
  File "/home/timisb/.local/lib/python3.6/site-packages/sklearn/utils/validation.py", line 260, in indexable
    check_consistent_length(*result)
  File "/home/timisb/.local/lib/python3.6/site-packages/sklearn/utils/validation.py", line 235, in check_consistent_length
    " samples: %r" % [int(l) for l in lengths])
ValueError: Found input variables with inconsistent numbers of samples: [2, 643]
Does anyone have a suggestion for an alternative approach or a fix? Thanks!
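The numbers in the error message point at the cause: scikit-learn treats the list [X1, X2] as a single array-like whose length is 2 (one "sample" per input), and compares that with the 643 labels in y. The minimal sketch below reproduces the same length check in isolation. The zero-filled arrays and their shapes are placeholders assumed for illustration; only the count 643 comes from the traceback.

```python
import numpy as np
from sklearn.utils import check_consistent_length

# Placeholder data (assumption): two inputs and one label vector, 643 rows each
X1 = np.zeros((643, 1))
X2 = np.zeros((643, 1))
y = np.zeros(643)

# The list [X1, X2] has length 2, so it is counted as 2 samples, not 643
try:
    check_consistent_length([X1, X2], y)
    message = None
except ValueError as err:
    message = str(err)

print(message)
```

Because cross_val_score indexes X by sample along its first axis, a list of separate input arrays cannot be passed to it directly.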
You can run your own cross-validation implementation. A sample CV implementation might look like this:
import numpy as np
from sklearn.model_selection import StratifiedKFold

input_1 = [[1], [2], [3], [4], [5], [6], [7], [8], [9], [10]]
input_2 = [[11], [12], [13], [14], [15], [16], [17], [18], [19], [20]]
Y = [[0], [0], [0], [2], [2], [0], [1], [1], [2], [0]]


# Split a two-input dataset into k stratified folds
def cross_validation_split(X1, X2, Y, folds=4):
    # Convert once, up front, so the inputs can be indexed with index arrays
    X1 = np.array(X1)
    X2 = np.array(X2)
    Y = np.array(Y)

    # Use the requested number of folds
    skf = StratifiedKFold(n_splits=folds, shuffle=True)
    dataset_split = []
    # Labels are flattened to 1-D for stratification; the same indices are
    # applied to both inputs so the samples stay aligned
    for i, (train_index, test_index) in enumerate(skf.split(X1, Y.ravel())):
        print("TRAIN:", train_index, "TEST:", test_index)
        X_1_train, X_1_test = X1[train_index], X1[test_index]
        X_2_train, X_2_test = X2[train_index], X2[test_index]
        y_train, y_test = Y[train_index], Y[test_index]
        k_fold_set = {
            'k_fold': i,
            'train': {'X_1': X_1_train, 'X_2': X_2_train, 'Y': y_train},
            'test': {'X_1': X_1_test, 'X_2': X_2_test, 'Y': y_test}
        }
        dataset_split.append(k_fold_set)
    return dataset_split


result = cross_validation_split(input_1, input_2, Y, folds=4)
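Then simply loop over the resulting result list, run your training/validation logic on each fold, and save each fold's score to a list; that list gives you the k-fold cross-validation results.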
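The per-fold loop described above could look roughly like the sketch below. To stay small and runnable without TensorFlow, it uses a hypothetical TwoInputStandIn class (a LogisticRegression over the concatenated inputs) in place of the two-input Keras model, and random demo data; both are assumptions for illustration, not part of the original code. The weighted F1 score mirrors the scoring='f1_weighted' setting from the question.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import StratifiedKFold


class TwoInputStandIn:
    """Hypothetical stand-in for the two-input Keras model (takes [X1, X2])."""

    def __init__(self):
        self.clf = LogisticRegression(max_iter=1000)

    def fit(self, inputs, y):
        # Concatenate the two inputs column-wise before fitting
        self.clf.fit(np.hstack(inputs), y)
        return self

    def predict(self, inputs):
        return self.clf.predict(np.hstack(inputs))


def evaluate_folds(X1, X2, y, n_splits=4, seed=0):
    X1, X2, y = np.asarray(X1), np.asarray(X2), np.asarray(y).ravel()
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = []
    for train_idx, test_idx in skf.split(X1, y):
        # Build a fresh model for every fold so no weights leak between folds
        model = TwoInputStandIn()
        model.fit([X1[train_idx], X2[train_idx]], y[train_idx])
        y_pred = model.predict([X1[test_idx], X2[test_idx]])
        scores.append(f1_score(y[test_idx], y_pred, average="weighted"))
    return np.array(scores)


# Random demo data (assumption): 40 samples, two inputs, two balanced classes
rng = np.random.default_rng(0)
X1_demo = rng.normal(size=(40, 3))
X2_demo = rng.normal(size=(40, 2))
y_demo = np.array([0, 1] * 20)

fold_scores = evaluate_folds(X1_demo, X2_demo, y_demo)
print("per-fold weighted F1:", fold_scores, "mean:", fold_scores.mean())
```

With the actual Keras model, each fold would presumably call create_model() instead, fit on one-hot labels (the model is compiled with categorical_crossentropy), and take the argmax of the softmax output before computing the F1 score.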