I want to use BertForMaskedLM or BertModel to compute the perplexity of a sentence, so I wrote code like this:
```python
import numpy as np
import torch
import torch.nn as nn
from transformers import BertTokenizer, BertForMaskedLM

# Load pre-trained model (weights)
with torch.no_grad():
    model = BertForMaskedLM.from_pretrained('hfl/chinese-bert-wwm-ext')
    model.eval()
    # Load pre-trained model tokenizer (vocabulary)
    tokenizer = BertTokenizer.from_pretrained('hfl/chinese-bert-wwm-ext')
    sentence = "我不会忘记和你一起奋斗的时光。"
    tokenize_input = tokenizer.tokenize(sentence)
    tensor_input = torch.tensor([tokenizer.convert_tokens_to_ids(tokenize_input)])
    sen_len = len(tokenize_input)
    sentence_loss = 0.

    for i, word in enumerate(tokenize_input):
        # add mask to i-th character of the sentence
        tokenize_input[i] = '[MASK]'
        mask_input = torch.tensor([tokenizer.convert_tokens_to_ids(tokenize_input)])

        output = model(mask_input)

        prediction_scores = output[0]
        softmax = nn.Softmax(dim=0)
        …
```

nlp transformer-model pytorch bert-language-model huggingface-transformers
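For context, below is a minimal self-contained sketch of the mask-one-token-at-a-time "pseudo-perplexity" computation the snippet appears to be building toward; the loop body after the truncated `softmax` line is my assumption, not the original poster's code. Two things differ from the snippet above: the softmax is taken over the vocabulary dimension (`dim=-1`) rather than `dim=0`, and the input includes the `[CLS]`/`[SEP]` special tokens BERT was trained with.

```python
# Hedged sketch: per-token masked-LM pseudo-perplexity with BERT.
# The aggregation into a perplexity value is an assumption about the
# intended completion, not code from the original question.
import math
import torch
from transformers import BertTokenizer, BertForMaskedLM

model = BertForMaskedLM.from_pretrained('hfl/chinese-bert-wwm-ext')
model.eval()
tokenizer = BertTokenizer.from_pretrained('hfl/chinese-bert-wwm-ext')

sentence = "我不会忘记和你一起奋斗的时光。"
# encode() adds [CLS] and [SEP]; shape is (1, seq_len)
token_ids = tokenizer.encode(sentence, return_tensors='pt')
sen_len = token_ids.size(1) - 2  # number of real (non-special) tokens
sentence_loss = 0.0

with torch.no_grad():
    for i in range(1, sen_len + 1):  # skip [CLS] at 0 and trailing [SEP]
        masked = token_ids.clone()
        true_id = masked[0, i].item()
        masked[0, i] = tokenizer.mask_token_id  # mask the i-th token only

        logits = model(masked)[0]  # (1, seq_len, vocab_size)
        # log-softmax over the vocabulary for the masked position
        log_probs = torch.log_softmax(logits[0, i], dim=-1)
        sentence_loss += -log_probs[true_id].item()  # NLL of the true token

# exponentiated average negative log-likelihood over all masked positions
ppl = math.exp(sentence_loss / sen_len)
print(f"pseudo-perplexity: {ppl:.2f}")
```

Note this is not a true autoregressive perplexity: each token is predicted with full bidirectional context, so the scores from this procedure are only comparable to other scores computed the same way.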