Huggingface TFBertForSequenceClassification always predicts the same label

alx*_*gal 4 python tensorflow bert-language-model huggingface-transformers

TL;DR: My model always predicts the same label and I don't know why. Below is my fine-tuning code, in the hope that someone can spot where I'm going wrong.


I am using Huggingface's TFBertForSequenceClassification for a sequence classification task, predicting one of 4 labels for sentences in German-language text.


I use the bert-base-german-cased model because I do not work exclusively with lowercased text (casing carries more meaning in German than in English).
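A quick sanity check of that choice (illustrative only, not part of my training pipeline): the cased checkpoint should not lowercase its input.

```python
from transformers import BertTokenizer

# Illustration: the cased German tokenizer is expected to keep the
# original capitalization rather than lowercasing everything.
tokenizer = BertTokenizer.from_pretrained('bert-base-german-cased')
print(tokenizer.do_lower_case)        # expected: False for the cased checkpoint
print(tokenizer.tokenize("Die App"))  # tokens keep their original casing
```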


My input comes from a csv file that I built from an annotated corpus I was given. Here is a sample:

```
0       Hier kommen wir ins Spiel Die App Cognitive At...
1       Doch wenn Athlet Lebron James jede einzelne Mu...
2       Wie kann ein Gehirn auf Hochleistung getrimmt ...
3       Wie schafft es Warren Buffett knapp 1000 Wörte...
4       Entfalte dein mentales Potenzial und werde ein...
Name: sentence_clean, Length: 3094, dtype: object
```

These are my labels, from the same csv file:

```
0       e_1
1       e_4
2       e_4
3       e_4
4       e_4
```

The distinct labels are: e_1, e_2, e_3 and e_4.


This is the code I use to fine-tune the model:

```python
import pandas as pd
import numpy as np
import os

# read in data
# sentences_df = pd.read_csv('path/file.csv')

X = sentences_df.sentence_clean
Y = sentences_df.classId

# =============================================================================
# One hot encode labels
# =============================================================================

# integer encode labels
from numpy import array
from numpy import argmax
from sklearn.preprocessing import LabelEncoder

label_encoder = LabelEncoder()
Y_integer_encoded = label_encoder.fit_transform(list(Y))

# one hot encode labels
from sklearn.preprocessing import OneHotEncoder

onehot_encoder = OneHotEncoder(sparse=False)
Y_integer_encoded_reshaped = Y_integer_encoded.reshape(len(Y_integer_encoded), 1)
Y_one_hot_encoded = onehot_encoder.fit_transform(Y_integer_encoded_reshaped)

# train test split
from sklearn.model_selection import train_test_split

X_train_raw, X_test_raw, y_train, y_test = train_test_split(X, Y_one_hot_encoded, test_size=0.20, random_state=42)

# =============================================================================
# Prepare datasets for finetuning
# =============================================================================
import tensorflow as tf
physical_devices = tf.config.list_physical_devices('GPU')
tf.config.experimental.set_memory_growth(physical_devices[0], True)

from transformers import BertTokenizer, TFBertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained('bert-base-german-cased')  # initialize tokenizer

# tokenize train and test sets
max_seq_length = 128

X_train_tokens = tokenizer(list(X_train_raw),
                           truncation=True,
                           padding=True)

X_test_tokens = tokenizer(list(X_test_raw),
                          truncation=True,
                          padding=True)

# create TF datasets as input for BERT model
bert_train_ds = tf.data.Dataset.from_tensor_slices((
    dict(X_train_tokens),
    y_train
))

bert_test_ds = tf.data.Dataset.from_tensor_slices((
    dict(X_test_tokens),
    y_test
))

# =============================================================================
# setup model and finetune
# =============================================================================

# define hyperparams
num_labels = 4
learning_rate = 2e-5
epochs = 3
batch_size = 16

# create BERT model
bert_categorical_partial = TFBertForSequenceClassification.from_pretrained('bert-base-german-cased', num_labels=num_labels)

optimizer = tf.keras.optimizers.Adam(learning_rate=learning_rate)
bert_categorical_partial.compile(optimizer=optimizer, loss='categorical_crossentropy', metrics=['accuracy'])

history = bert_categorical_partial.fit(bert_train_ds.shuffle(100).batch(batch_size),
                                       epochs=epochs,
                                       # batch_size=batch_size,
                                       validation_data=bert_test_ds.shuffle(100).batch(batch_size))
```

This is the output of the fine-tuning:

```
Epoch 1/3
155/155 [==============================] - 31s 198ms/step - loss: 8.3038 - accuracy: 0.2990 - val_loss: 8.7751 - val_accuracy: 0.2811
Epoch 2/3
155/155 [==============================] - 30s 196ms/step - loss: 8.2451 - accuracy: 0.2913 - val_loss: 8.9314 - val_accuracy: 0.2779
Epoch 3/3
155/155 [==============================] - 30s 196ms/step - loss: 8.3101 - accuracy: 0.2913 - val_loss: 9.0355 - val_accuracy: 0.2746
```

Finally, I try to predict the labels for the test set and check the results with a confusion matrix:

```python
from sklearn.metrics import confusion_matrix

# rebuild the tokenizer output as plain numpy arrays for predict()
X_test_tokens_new = {'input_ids': np.asarray(X_test_tokens['input_ids']),
                     'token_type_ids': np.asarray(X_test_tokens['token_type_ids']),
                     'attention_mask': np.asarray(X_test_tokens['attention_mask']),
                     }

pred_raw = bert_categorical_partial.predict(X_test_tokens_new)
pred_proba = tf.nn.softmax(pred_raw).numpy()
pred = pred_proba[0].argmax(axis=1)
y_true = y_test.argmax(axis=1)

cm = confusion_matrix(y_true, pred)
```

Printing cm gives:

```
array([[  0,   0,   0,  41],
       [  2,   0,   0, 253],
       [  2,   0,   0, 219],
       [  6,   0,   0,  96]], dtype=int64)
```

As you can see, my accuracy is very poor, and the confusion matrix shows that the model almost exclusively predicts a single label. I have tried everything and run the model several times, but I always get the same result. I know the data I'm working with is not great, and I'm only training on roughly 2k labeled sentences. But I have a feeling the accuracy should be higher and, more importantly, the model should not predict a single label 98% of the time, right?


I have posted everything I used to run the model, in the hope that someone can point out where I went wrong. Thanks a lot in advance for your help!


And*_*rey 5

You trained for only a few minutes. That is not enough, even for a pretrained BERT.

Try lowering the learning rate so that accuracy improves after each epoch (over the first 10 epochs), and train for more epochs (until you see validation accuracy dropping for 10 epochs in a row).
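A minimal sketch of that advice, reusing the model, datasets and loss setup from the question (the concrete learning rate and epoch cap are illustrative assumptions, not tuned values):

```python
import tensorflow as tf

# Illustrative values only: a learning rate lower than the original 2e-5
# and many more epochs, stopping once val_accuracy has stopped improving.
optimizer = tf.keras.optimizers.Adam(learning_rate=5e-6)
bert_categorical_partial.compile(optimizer=optimizer,
                                 loss='categorical_crossentropy',
                                 metrics=['accuracy'])

# Stop when validation accuracy has not improved for 10 consecutive epochs
# and roll back to the weights from the best epoch.
early_stopping = tf.keras.callbacks.EarlyStopping(monitor='val_accuracy',
                                                  patience=10,
                                                  restore_best_weights=True)

history = bert_categorical_partial.fit(
    bert_train_ds.shuffle(100).batch(batch_size),
    validation_data=bert_test_ds.shuffle(100).batch(batch_size),
    epochs=50,
    callbacks=[early_stopping])
```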