With spaCy v3, which config parameter should I change to resolve CUDA out of memory? batch_size vs max_length vs batcher.size

Question

Mar*_*ien 5 machine-learning spacy-transformers huggingface-transformers spacy-3

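Using spaCy v3, I am trying to train a classifier with CamemBERT, but I ran into a CUDA out of memory error. To resolve it, I read that I should reduce the batch size, but I am confused about which parameter to change: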

[nlp] batch_size
[components.transformer] max_batch_items
[corpora.train or dev] max_length
[training.batcher] size
[training.batcher] buffer

I tried to understand the difference between these parameters:

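[nlp] batch_size

Default batch size for pipe and evaluate. Defaults to 1000.

Are these functions used during training/evaluation?
In the quickstart widget (https://spacy.io/usage/training#quickstart), why does the value differ by hardware? It is 1000 for CPU and 128 for GPU.
During training, will evaluation be slower if this value is low?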

[components.transformer] max_batch_items

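Maximum size of a padded batch. Defaults to 4096.

According to the warning message "Token indices sequence length is longer than the specified maximum sequence length for this model (556 > 512). Running this sequence through the model will result in indexing errors", which is explained here (https://github.com/explosion/spaCy/issues/6939), the CamemBERT model has a specified maximum sequence length of 512.

Is the max_batch_items parameter overridden by this limit? Should I change its value to 512?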
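To make the arithmetic behind this question concrete, here is a minimal sketch. It assumes that the size of a padded batch is counted as the number of sequences multiplied by the length of the longest sequence. The helpers `padded_size` and `max_sequences_per_batch` are hypothetical names for illustration only, not part of spaCy or spacy-transformers. Under that assumption, a budget of 4096 limits how many sequences share a batch, which is a different quantity from the model's 512-token limit on a single sequence.

```python
def padded_size(lengths):
    """Size of a batch once every sequence is padded to the longest one."""
    return len(lengths) * max(lengths) if lengths else 0


def max_sequences_per_batch(max_batch_items, seq_len):
    """How many sequences of length seq_len fit under a padded-size budget."""
    return max_batch_items // seq_len


# Values from the question: a budget of 4096 and sequences of 512 tokens.
print(max_sequences_per_batch(4096, 512))  # 8

# A single long sequence inflates the padded size of the whole batch.
print(padded_size([512, 40, 35]))  # 1536
```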

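[corpora.train or dev] max_length

As I understand it, this value should be equal to or lower than the maximum sequence length. In the quickstart widget, it is set to 500 for the training set and 0 for the dev set. If it is set to 0, does it fall back to the transformer model's maximum sequence length?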
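For context, the spaCy documentation for `spacy.Corpus.v1` describes `max_length` as a maximum document length that defaults to 0, meaning no limit. The sketch below illustrates that reading with plain token lists. It is a simplification under stated assumptions: `filter_by_max_length` is a hypothetical helper, it simply drops long documents (spaCy may instead split them into sentences when sentence boundaries are available), and the exact boundary handling is assumed. The length here counts spaCy tokens, not the transformer's wordpieces.

```python
def filter_by_max_length(docs, max_length=0):
    """Keep token lists shorter than max_length; 0 disables the limit."""
    if max_length == 0:
        return list(docs)
    # Assumption: documents at or above the limit are dropped.
    return [doc for doc in docs if len(doc) < max_length]


docs = [["Bonjour", "!"], ["mot"] * 600]

# With a limit of 500 tokens, only the short document is kept.
print(len(filter_by_max_length(docs, max_length=500)))  # 1

# With max_length = 0, nothing is filtered out.
print(len(filter_by_max_length(docs, max_length=0)))  # 2
```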

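[training.batcher] size for spacy.batch_by_padded.v1

The largest padded size to batch sequences into. Can also be a block referencing a schedule, e.g. compounding.

If I don't use compounding, how does this parameter differ from max_length?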
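As a rough illustration of what batching by padded size means, the sketch below groups sequences so that the number of sequences times the longest length in a batch stays within `size`. This is a simplified stand-in written for this question, not spaCy's implementation: the function `batch_by_padded_size` is hypothetical, it omits the shuffling of batches, and it only mimics the `size`, `buffer`, and `discard_oversize` settings from the config. Read this way, `size` caps a whole batch, while `max_length` caps each individual document.

```python
def batch_by_padded_size(seqs, size, buffer=128, discard_oversize=True):
    """Yield batches where len(batch) * longest sequence <= size."""
    for start in range(0, len(seqs), buffer):
        # Sort one buffer's worth of sequences by length, shortest first.
        window = sorted(seqs[start:start + buffer], key=len)
        batch = []
        for seq in window:
            if discard_oversize and len(seq) > size:
                continue  # longer than the budget even when alone
            # seq is the longest so far, so it sets the padded width.
            if batch and len(seq) * (len(batch) + 1) > size:
                yield batch
                batch = []
            batch.append(seq)
        if batch:
            yield batch


seqs = [[0] * n for n in (100, 200, 250, 600, 50)]
batches = list(batch_by_padded_size(seqs, size=500))
print([[len(s) for s in b] for b in batches])  # [[50, 100], [200, 250]]
```

With `size = 500`, the 600-token sequence is discarded, and the remaining sequences are packed so that no padded batch exceeds 500 items.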

Here are some parts of my config file:

[nlp]
lang = "fr"
pipeline = ["transformer","textcat"]
# Default batch size to use with nlp.pipe and nlp.evaluate
batch_size = 256
...

[components.transformer]
factory = "transformer"
# Maximum size of a padded batch. Defaults to 4096.
max_batch_items = 4096
...

[corpora.dev]
@readers = "spacy.Corpus.v1"
path = ${paths.dev}
# Limitations on training document length
max_length = 512
...

[training.batcher]
@batchers = "spacy.batch_by_padded.v1"
discard_oversize = true
# The largest padded size to batch sequences into. Can also be a block referencing a schedule, e.g. compounding.
size = 500
# The number of sequences to accumulate before sorting by length. A larger buffer will result in more even sizing, but if the buffer is very large, the iteration order will be less random, which can result in suboptimal training.
buffer = 128
get_length = null
...


Answer 1

mbr*_*cky 1

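How much memory does your GPU have?

Under spaCy 2.x I was able to use a 6 GB GPU. But (if I remember correctly) the spaCy 3 documentation recommends 10-12 GB. I tried various parameters, but most of my GPU's 6 GB was used up just by loading PyTorch, so I ran out of GPU memory very quickly regardless of how I adjusted the batch size. This applies not only to the transformer but also to the plain NER EntityRecognizer: spaCy 3 simply loads more "stuff" onto the GPU than spaCy 2 did.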

Archived:	4 years, 8 months ago
Views:	1579
Last activity:	4 years, 1 month ago