BLEU score: can I use nltk.translate.bleu_score.sentence_bleu to calculate BLEU scores for Chinese?

Question

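If I have Chinese word lists, e.g. reference = ['?', '?', '?', '?'] and hypothesis = ['?', '?', '???', '?'], can I use nltk.translate.bleu_score.sentence_bleu(references, hypothesis) for Chinese translation? Does it work the same way as for English? What about Japanese? I mean, assuming I have word lists for Chinese and Japanese just as I would for English. Thanks!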

Answer 1

alv*_*vas 8

TL;DR

Yes.

In Long

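The BLEU score measures n-gram overlap and is agnostic to language, but it depends on the sentences of that language being split into tokens. So yes, it can compare Chinese/Japanese sentences, provided they have been tokenized first.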
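To make the tokenization point concrete, here is a minimal sketch of the clipped n-gram precision that BLEU builds on, written in plain Python. The helper names (`char_tokens`, `ngrams`, `modified_precision`) and the two sentences are made up for illustration, and splitting Chinese into single characters is only one possible assumption; a word segmenter would give different tokens and therefore different scores.

```python
from collections import Counter

def char_tokens(sentence):
    """Split a string into single-character tokens, dropping whitespace.
    Character-level splitting is an assumption made for this sketch."""
    return [ch for ch in sentence if not ch.isspace()]

def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def modified_precision(reference, hypothesis, n):
    """Clipped n-gram precision of the hypothesis against one reference."""
    hyp_counts = Counter(ngrams(hypothesis, n))
    ref_counts = Counter(ngrams(reference, n))
    if not hyp_counts:
        return 0.0
    # Each hypothesis n-gram is credited at most as often as it occurs in the reference.
    clipped = sum(min(count, ref_counts[gram]) for gram, count in hyp_counts.items())
    return clipped / sum(hyp_counts.values())

# Illustrative sentences chosen for this sketch.
ref = char_tokens("今天天气很好")
hyp = char_tokens("今天天气不错")

p1 = modified_precision(ref, hyp, 1)
p2 = modified_precision(ref, hyp, 2)
print(p1, p2)
```

Nothing in the computation looks at which language the tokens come from; only the choice of tokenizer is language-specific.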

Do note the caveats of using BLEU at the sentence level. BLEU was never designed with sentence-level comparison in mind; there is a good discussion here: https://github.com/nltk/nltk/issues/1838

Most probably, you will see a warning when you have really short sentences, e.g.

>>> from nltk.translate import bleu
>>> ref = '? ? ? ?'.split()
>>> hyp = '? ? ??? ?'.split()
>>> bleu([ref], hyp)
/usr/local/lib/python2.7/site-packages/nltk/translate/bleu_score.py:490: UserWarning: 
Corpus/Sentence contains 0 counts of 3-gram overlaps.
BLEU scores might be undesirable; use SmoothingFunction().
  warnings.warn(_msg)
0.7071067811865475

You can use the smoothing functions in https://github.com/alvations/nltk/blob/develop/nltk/translate/bleu_score.py#L425 to overcome the short-sentence problem.

>>> from nltk.translate.bleu_score import SmoothingFunction
>>> smoothie = SmoothingFunction().method4
>>> bleu([ref], hyp, smoothing_function=smoothie)
0.2866227639866161
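Since BLEU was designed as a corpus-level metric, another option is to pool the n-gram counts over many sentences with `corpus_bleu` from the same module. The sketch below assumes character-level tokens and uses two made-up sentence pairs; `char_tokens` is a hypothetical helper, not part of NLTK.

```python
from nltk.translate.bleu_score import corpus_bleu

def char_tokens(sentence):
    """Hypothetical helper: one token per non-whitespace character."""
    return [ch for ch in sentence if not ch.isspace()]

# Illustrative data, not taken from the question.
references = ["今天天气很好", "我喜欢读书"]
hypotheses = ["今天天气不错", "我喜欢看书"]

# corpus_bleu expects, for each hypothesis, a list of tokenized references.
list_of_references = [[char_tokens(r)] for r in references]
tokenized_hypotheses = [char_tokens(h) for h in hypotheses]

score = corpus_bleu(list_of_references, tokenized_hypotheses)
print(score)
```

The second pair has no 4-gram overlap on its own, which is the situation that triggers the warning at the sentence level, but the pooled counts across both pairs are all non-zero, so no smoothing is needed here.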
