Suppose we have:
test = [['word.II', 123, 234],
['word.IV', 321, 123],
['word.XX', 345, 345],
['word.XIV', 345, 432]
]
How can I split the first element of each row in test so that the result is:
test = [['word', 'II', 123, 234],
['word', 'IV', 321, 123],
['word', 'XX', 345, 345],
['word', 'XIV', 345, 432]
]
Among other things, I have tried:
test = [[row[0].split('.'), row[1], row[2]] for row in test]
but that results in:
[['word', 'II'], 123, 234]
[['word', 'IV'], 321, 123]
[['word', 'XX'], 345, 345]
[['word', 'XIV'], 345, 432]
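One way to keep the rows flat (a sketch, not from the original post) is to concatenate the list returned by split() with the remaining elements of the row, instead of nesting it as a sublist:

```python
test = [['word.II', 123, 234],
        ['word.IV', 321, 123],
        ['word.XX', 345, 345],
        ['word.XIV', 345, 432]]

# split() returns a list, so adding row[1:] to it produces one flat row
# instead of a nested sublist.
test = [row[0].split('.') + row[1:] for row in test]
print(test)
# [['word', 'II', 123, 234], ['word', 'IV', 321, 123],
#  ['word', 'XX', 345, 345], ['word', 'XIV', 345, 432]]
```

Slicing with row[1:] also keeps this working if the rows later gain extra columns.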
I trained/fine-tuned a Spanish RoBERTa model that was recently pre-trained for a variety of NLP tasks other than text classification.
Since the baseline model looks promising, I want to fine-tune it for a different task: text classification, more precisely, sentiment analysis of Spanish tweets, and then use it to predict labels for tweets I have scraped.
Preprocessing and training seem to work fine. However, I don't know how to use the model for prediction afterwards.
I will omit the preprocessing part, since I don't think that is where the problem lies.
# Training with native TensorFlow
import tensorflow as tf
from transformers import TFAutoModelForSequenceClassification

## Model Definition
model = TFAutoModelForSequenceClassification.from_pretrained("BSC-TeMU/roberta-base-bne", from_pt=True, num_labels=3)

## Model Compilation
optimizer = tf.keras.optimizers.Adam(learning_rate=5e-5)
loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
metric = tf.metrics.SparseCategoricalAccuracy()
model.compile(optimizer=optimizer,
              loss=loss,
              metrics=metric)

## Fitting the data
history = model.fit(train_dataset.shuffle(1000).batch(64), epochs=3, batch_size=64)
This produces the warning:

/usr/local/lib/python3.7/dist-packages/transformers/configuration_utils.py:337: UserWarning: Passing `gradient_checkpointing` to a config initialization is deprecated and will be removed in v5 Transformers. Using `model.gradient_checkpointing_enable()` instead, or if you are using the `Trainer` API, …
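For the prediction step the post is asking about, a common approach with TFAutoModelForSequenceClassification is to tokenize the new tweets with the tokenizer matching the base checkpoint, run them through the fine-tuned model, and take the argmax of the softmaxed logits. A sketch, assuming `model` is the fine-tuned instance from above and the example tweets are placeholders:

```python
import tensorflow as tf
from transformers import AutoTokenizer, TFAutoModelForSequenceClassification

# The tokenizer must match the checkpoint the model was fine-tuned from.
tokenizer = AutoTokenizer.from_pretrained("BSC-TeMU/roberta-base-bne")
model = TFAutoModelForSequenceClassification.from_pretrained(
    "BSC-TeMU/roberta-base-bne", from_pt=True, num_labels=3)

texts = ["Me encanta este producto", "No me gusta nada"]  # placeholder tweets
inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="tf")

logits = model(**inputs).logits               # raw scores, shape (batch, num_labels)
probs = tf.nn.softmax(logits, axis=-1)        # per-class probabilities
pred_ids = tf.argmax(probs, axis=-1).numpy()  # predicted label ids
```

Note that a model reloaded with from_pretrained starts from the base weights; to predict with the fine-tuned weights, either reuse the `model` object trained above or save it with model.save_pretrained(...) and reload from that directory.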