wtw*_*twt 5 translation nlp machine-translation huggingface-transformers huggingface-tokenizers
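I want to use HuggingFace Transformers with the pretrained "xlm-mlm-xnli15-1024" model to translate Chinese into English. The tutorial shows how to translate from English to German.
I tried following the tutorial, but it doesn't explain in detail how to change the language manually or how to decode the result. I don't know where to start. Sorry that this question can't be more specific.

Here is what I tried: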

    from transformers import AutoModelWithLMHead, AutoTokenizer
    base_model = "xlm-mlm-xnli15-1024"
    model = AutoModelWithLMHead.from_pretrained(base_model)
    tokenizer = AutoTokenizer.from_pretrained(base_model)

    inputs = tokenizer.encode("translate English to Chinese: Hugging Face is a technology company based in New York and Paris", return_tensors="pt")
    outputs = model.generate(inputs, max_length=40, num_beams=4, early_stopping=True)

    print(tokenizer.decode(outputs.tolist()[0]))

Output:

    '<s>translate english to chinese : hugging face is a technology company based in new york and paris </s>china hug ™ ™ ™ ™ ™ ™ ™ ™ ™ ™ ™ ™ ™ ™ ™ ™ ™'

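This may help: https://huggingface.co/Helsinki-NLP/opus-mt-zh-en
"xlm-mlm-xnli15-1024" is a masked language model trained on the XNLI languages, not a translation model, so a "translate ... to ..." prompt has no special meaning for it. The Helsinki-NLP/opus-mt-zh-en checkpoint is a MarianMT model trained specifically for Chinese-to-English translation.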

    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    model_name = "Helsinki-NLP/opus-mt-zh-en"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

    text = '央视春晚,没有最烂,只有更烂'
    # Calling the tokenizer directly replaces the deprecated prepare_seq2seq_batch
    tokenized_text = tokenizer([text], return_tensors='pt')
    translation = model.generate(**tokenized_text)
    # skip_special_tokens=True drops markers such as <pad> from the decoded string
    translated_text = tokenizer.batch_decode(translation, skip_special_tokens=True)[0]
    print(translated_text)

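If you have more than one sentence to translate, the same tokenize / generate / decode steps can be wrapped in a small helper that works in batches. The following is only a sketch: `chunked`, `translate_batch`, `load_zh_en` and the `batch_size` default are hypothetical names made up for illustration, not part of the transformers API. It assumes the tokenizer is callable with `padding=True` and `return_tensors="pt"`, and that the model exposes `generate`, as the Marian checkpoint above does.

```python
from typing import Iterable, List


def chunked(items: List[str], size: int) -> Iterable[List[str]]:
    """Yield consecutive slices of `items` holding at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def translate_batch(texts: List[str], tokenizer, model, batch_size: int = 8) -> List[str]:
    """Translate `texts` in small batches (hypothetical helper, not a library API)."""
    results: List[str] = []
    for batch in chunked(texts, batch_size):
        # padding=True pads the sentences in one batch to a common length
        encoded = tokenizer(batch, return_tensors="pt", padding=True)
        generated = model.generate(**encoded)
        # skip_special_tokens=True removes markers such as <pad> from the text
        results.extend(tokenizer.batch_decode(generated, skip_special_tokens=True))
    return results


def load_zh_en(name: str = "Helsinki-NLP/opus-mt-zh-en"):
    """Load tokenizer and model; the import is lazy so the helpers above
    can be used (or tested with stand-ins) without transformers installed."""
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
    return AutoTokenizer.from_pretrained(name), AutoModelForSeq2SeqLM.from_pretrained(name)


# Example usage (downloads the checkpoint on first run):
#   tokenizer, model = load_zh_en()
#   for line in translate_batch(['央视春晚,没有最烂,只有更烂'], tokenizer, model):
#       print(line)
```

Passing the tokenizer and model in as arguments keeps the batching logic separate from model loading, so the same helper should work with another opus-mt language pair by changing only the name given to `load_zh_en`.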