I am working through this link to learn about a multichannel CNN model for text classification.
The code is based on this tutorial.
I have understood most of it, but I don't understand how Keras determines the output shapes of some of the layers.
Here is the code; it defines a model with three input channels, for processing 4-grams, 6-grams, and 8-grams of movie review text:
# the original skipped the keras imports; these are the ones the functions below need
from pickle import load
from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences

# load a clean dataset
def load_dataset(filename):
    return load(open(filename, 'rb'))

# fit a tokenizer
def create_tokenizer(lines):
    tokenizer = Tokenizer()
    tokenizer.fit_on_texts(lines)
    return tokenizer

# calculate the maximum document length
def max_length(lines):
    return max([len(s.split()) for s in lines])

# encode a list of lines
def encode_text(tokenizer, lines, length):
    # integer encode
    encoded = tokenizer.texts_to_sequences(lines)
    # pad encoded sequences
    padded = pad_sequences(encoded, maxlen=length, padding='post')
    return padded

# define the model
def define_model(length, vocab_size):
    # …
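Regarding the output shapes: Keras derives each layer's output shape from the layer's own arithmetic, and model.summary() prints the shape of every layer. Below is a minimal single-channel sketch with that arithmetic spelled out in comments; the embedding dimension, filter count, and kernel size are illustrative values, not taken from the excerpt above.

from keras.models import Model
from keras.layers import Input, Embedding, Conv1D, Dropout, MaxPooling1D, Flatten

length, vocab_size = 100, 5000                       # example values only
inputs1 = Input(shape=(length,))                     # (None, 100)
embedding1 = Embedding(vocab_size, 100)(inputs1)     # (None, 100, 100): one 100-dim vector per token
conv1 = Conv1D(filters=32, kernel_size=4, activation='relu')(embedding1)
# 'valid' padding: output length = input_length - kernel_size + 1 = 97, so (None, 97, 32)
drop1 = Dropout(0.5)(conv1)                          # Dropout never changes the shape
pool1 = MaxPooling1D(pool_size=2)(drop1)             # floor(97 / 2) = 48, so (None, 48, 32)
flat1 = Flatten()(pool1)                             # 48 * 32 = 1536, so (None, 1536)
model = Model(inputs=inputs1, outputs=flat1)
model.summary()                                      # prints the output shape of every layer

If the three channels differ only in kernel_size (4, 6, and 8 to match the n-gram sizes), their Conv1D outputs have different lengths, which is why each channel is flattened before the three are concatenated.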
I am trying to run GradientBoostingClassifier() with the help of GridSearchCV. For each parameter combination I also need "precision", "recall", and accuracy in tabular format.
Here is the code:
scoring = ['accuracy', 'precision', 'recall']
parameters = {
    #'nthread':[3,4], #when use hyperthread, xgboost may become slower
    "criterion": ["friedman_mse", "mae"],
    "loss": ["deviance", "exponential"],
    "max_features": ["log2", "sqrt"],
    'learning_rate': [0.01, 0.05, 0.1, 1, 0.5],  # so called `eta` value
    'max_depth': [3, 4, 5],
    'min_samples_leaf': [4, 5, 6],
    'subsample': [0.6, 0.7, 0.8],
    'n_estimators': [5, 10, 15, 20],  # number of trees, change it to 1000 for better results
    'scoring': scoring
}
# sorted(sklearn.metrics.SCORERS.keys()) # To see different loss functions
#clf_xgb = GridSearchCV(xgb_model, parameters, n_jobs=5,verbose=2, refit=True,cv = 8)
clf_gbm = GridSearchCV(gbm_model, parameters, n_jobs=5,cv = 8)
clf_gbm.fit(X_train,y_train)
print(clf_gbm.best_params_)
print(clf_gbm.best_score_)
feature_importances = pd.DataFrame(clf_gbm.best_estimator_.feature_importances_,
                                   index = X_train.columns, …
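For per-combination precision, recall, and accuracy, the scoring argument belongs to GridSearchCV itself rather than inside the parameter grid, and when several metrics are requested, refit must name the one used to pick best_estimator_. The per-combination scores then land in cv_results_, which converts straight into a DataFrame. A minimal sketch (the tiny synthetic binary dataset and the reduced grid are only there so the example runs end to end; with a multi-class target the plain 'precision'/'recall' scorers would need e.g. 'precision_macro'/'recall_macro'):

import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

# small synthetic binary dataset so the sketch is self-contained
X_train, y_train = make_classification(n_samples=200, random_state=0)

scoring = ['accuracy', 'precision', 'recall']
param_grid = {                      # reduced grid just for illustration
    'learning_rate': [0.05, 0.1],
    'max_depth': [3, 4],
    'n_estimators': [10, 20],
}
clf_gbm = GridSearchCV(GradientBoostingClassifier(), param_grid,
                       scoring=scoring,      # scoring goes here, not in the parameter grid
                       refit='accuracy',     # required when several metrics are requested
                       cv=8, n_jobs=5)
clf_gbm.fit(X_train, y_train)

# cv_results_ has one row per parameter combination, with a mean_test_<metric> column per metric
results = pd.DataFrame(clf_gbm.cv_results_)
print(results[['params', 'mean_test_accuracy', 'mean_test_precision', 'mean_test_recall']])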
I am using sklearn.tree.DecisionTreeClassifier to train a three-class classification problem.
The number of records in the three classes is as follows:
A: 122038
B: 43626
C: 6678
When I train the classifier model, it fails to learn class C: although accuracy is 65-70%, class C is completely ignored.
Then I came across the class_weight parameter, but I am not sure how to use it in a multi-class setting.
Here is my code (I have tried 'balanced', but it gives worse accuracy):
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)
clf = tree.DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=1,class_weight='balanced')
clf = clf.fit(X_train,y_train)
y_pred = clf.predict(X_test)
How do I use weights proportional to the class distribution?
Second, is there a better way to address this class-imbalance problem and improve accuracy?
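For reference, class_weight also accepts an explicit {class_label: weight} dict in the multi-class case, so inverse-frequency weights can be computed directly from the counts above (this mirrors what 'balanced' does internally). A minimal sketch, assuming the labels are the strings 'A', 'B', 'C':

from sklearn import tree

# record counts taken from the question
counts = {'A': 122038, 'B': 43626, 'C': 6678}
n_samples, n_classes = sum(counts.values()), len(counts)

# inverse-frequency weights: n_samples / (n_classes * count), the same formula 'balanced' uses
class_weights = {label: n_samples / (n_classes * count) for label, count in counts.items()}
print(class_weights)   # roughly {'A': 0.47, 'B': 1.32, 'C': 8.60}

clf = tree.DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=1,
                                  class_weight=class_weights)

Note that with class A making up roughly 70% of the records, plain accuracy mostly reflects class A, so a per-class metric (for example recall on class C, or a full classification report) is usually a better gauge of whether the weighting actually helps.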
I am training a multi-label classification problem with a Hugging Face model, using PyTorch Lightning to train it.
Here is the code:
Early stopping is triggered when the loss has stopped improving:
early_stopping_callback = EarlyStopping(monitor='val_loss', patience=2)
We can start the training process:
checkpoint_callback = ModelCheckpoint(
    dirpath="checkpoints",
    filename="best-checkpoint",
    save_top_k=1,
    verbose=True,
    monitor="val_loss",
    mode="min"
)

trainer = pl.Trainer(
    logger=logger,
    callbacks=[early_stopping_callback],
    max_epochs=N_EPOCHS,
    checkpoint_callback=checkpoint_callback,
    gpus=1,
    progress_bar_refresh_rate=30
)
# checkpoint_callback=checkpoint_callback,
As soon as I run this, I get this error:
~/.local/lib/python3.6/site-packages/pytorch_lightning/trainer/connectors/callback_connector.py in _configure_checkpoint_callbacks(self, checkpoint_callback)
75 if isinstance(checkpoint_callback, Callback):
76 error_msg += " Pass callback instances to the `callbacks` argument in the Trainer constructor instead."
---> 77 raise MisconfigurationException(error_msg)
78 if self._trainer_has_checkpoint_callbacks() and checkpoint_callback is False:
79 raise MisconfigurationException(
MisconfigurationException: Invalid type provided for checkpoint_callback: Expected bool …
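The error message itself points at the fix: in the pytorch_lightning version used here, Trainer's checkpoint_callback argument is expected to be a bool, and callback instances have to be passed through the callbacks list instead. A minimal sketch of the Trainer construction with that change (logger, N_EPOCHS, and the two callbacks are assumed to be defined as above):

trainer = pl.Trainer(
    logger=logger,
    callbacks=[early_stopping_callback, checkpoint_callback],  # pass the ModelCheckpoint instance here
    max_epochs=N_EPOCHS,
    gpus=1,
    progress_bar_refresh_rate=30
)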