Lon*_*inh 11 python nlp machine-learning nltk bleu
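I've imported nltk in Python to calculate BLEU scores on Ubuntu. I understand how the sentence-level BLEU score works, but I don't understand how the corpus-level BLEU score works.
Below is my code for the corpus-level BLEU score: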
import nltk
hypothesis = ['This', 'is', 'cat']
reference = ['This', 'is', 'a', 'cat']
BLEUscore = nltk.translate.bleu_score.corpus_bleu([reference], [hypothesis], weights = [1])
print(BLEUscore)
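For some reason, the BLEU score is 0 for the above code. I was expecting a corpus-level BLEU score of at least 0.5.
Here is my code for the sentence-level BLEU score: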
import nltk
hypothesis = ['This', 'is', 'cat']
reference = ['This', 'is', 'a', 'cat']
BLEUscore = nltk.translate.bleu_score.sentence_bleu([reference], hypothesis, weights = [1])
print(BLEUscore)
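Here the sentence-level BLEU score is 0.71, which is what I expect once the brevity penalty and the missing word "a" are taken into account. However, I don't understand how the corpus-level BLEU score works.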
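As a sanity check on the 0.71 figure, here is a minimal hand computation of unigram BLEU for this one sentence pair. It is a sketch that assumes the usual definitions (clipped unigram precision multiplied by the brevity penalty exp(1 - r/c) when the hypothesis is shorter than the reference); it is not NLTK's own code.

```python
import math
from collections import Counter

hypothesis = ['This', 'is', 'cat']
reference = ['This', 'is', 'a', 'cat']

# Clipped unigram precision: each hypothesis token is credited at most
# as many times as it occurs in the reference.
hyp_counts = Counter(hypothesis)
ref_counts = Counter(reference)
clipped = sum(min(count, ref_counts[token]) for token, count in hyp_counts.items())
precision = clipped / len(hypothesis)  # 3 matched out of 3 tokens

# Brevity penalty: 1 if the hypothesis is longer than the reference,
# otherwise exp(1 - r/c) with r = reference length, c = hypothesis length.
c, r = len(hypothesis), len(reference)
brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)

score = brevity_penalty * precision
print(round(score, 4))  # 0.7165
```

All three hypothesis tokens appear in the reference, so the precision is 1 and the whole deduction comes from the brevity penalty for the missing "a".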
Any help would be appreciated.
alv*_*vas 19
TL;DR:
>>> import nltk
>>> hypothesis = ['This', 'is', 'cat']
>>> reference = ['This', 'is', 'a', 'cat']
>>> references = [reference] # list of references for 1 sentence.
>>> list_of_references = [references] # list of references for all sentences in corpus.
>>> list_of_hypotheses = [hypothesis] # list of hypotheses that corresponds to list of references.
>>> nltk.translate.bleu_score.corpus_bleu(list_of_references, list_of_hypotheses)
0.6025286104785453
>>> nltk.translate.bleu_score.sentence_bleu(references, hypothesis)
0.6025286104785453
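(Note: you have to pull the latest version of NLTK from the develop branch to get a stable version of the BLEU score implementation.)
In long:
Actually, if there is only one reference and one hypothesis in your whole corpus, both corpus_bleu() and sentence_bleu() should return the same value, as shown in the example above.
In the code, we see that sentence_bleu is actually a duck-type of corpus_bleu: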
def sentence_bleu(references, hypothesis, weights=(0.25, 0.25, 0.25, 0.25),
                  smoothing_function=None):
    return corpus_bleu([references], [hypothesis], weights, smoothing_function)
And if we look at the parameters of sentence_bleu:
def sentence_bleu(references, hypothesis, weights=(0.25, 0.25, 0.25, 0.25),
                  smoothing_function=None):
    """
    :param references: reference sentences
    :type references: list(list(str))
    :param hypothesis: a hypothesis sentence
    :type hypothesis: list(str)
    :param weights: weights for unigrams, bigrams, trigrams and so on
    :type weights: list(float)
    :return: The sentence-level BLEU score.
    :rtype: float
    """
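The input for sentence_bleu's references is a list(list(str)).
So if you have a sentence string, e.g. "This is a cat", you have to tokenize it to get a list of strings, ["This", "is", "a", "cat"]. Since multiple references are allowed, the argument has to be a list of lists of strings. For example, if you have a second reference, "This is a feline", your input to sentence_bleu() would be: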
references = [ ["This", "is", "a", "cat"], ["This", "is", "a", "feline"] ]
hypothesis = ["This", "is", "cat"]
sentence_bleu(references, hypothesis)
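If you are starting from raw strings, the tokenization step could look like the sketch below. Splitting on whitespace is an assumption made here to keep the example dependency-free; a real tokenizer (for instance nltk.word_tokenize) will usually handle punctuation better.

```python
# Raw sentence strings (same examples as above).
reference_strings = ["This is a cat", "This is a feline"]
hypothesis_string = "This is cat"

# Whitespace tokenization: an assumption for illustration only.
references = [sentence.split() for sentence in reference_strings]  # list(list(str))
hypothesis = hypothesis_string.split()                             # list(str)

print(references)  # [['This', 'is', 'a', 'cat'], ['This', 'is', 'a', 'feline']]
print(hypothesis)  # ['This', 'is', 'cat']
```

The resulting references and hypothesis have the shapes that sentence_bleu(references, hypothesis) expects.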
When it comes to the list_of_references parameter of corpus_bleu(), it is basically a list of whatever sentence_bleu() takes as references:
def corpus_bleu(list_of_references, hypotheses, weights=(0.25, 0.25, 0.25, 0.25),
                smoothing_function=None):
    """
    :param references: a corpus of lists of reference sentences, w.r.t. hypotheses
    :type references: list(list(list(str)))
    :param hypotheses: a list of hypothesis sentences
    :type hypotheses: list(list(str))
    :param weights: weights for unigrams, bigrams, trigrams and so on
    :type weights: list(float)
    :return: The corpus-level BLEU score.
    :rtype: float
    """
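This nesting is the likely reason the code in the question scores 0: with [reference] passed as list_of_references, each token string ('This', 'is', ...) is treated as a reference "sentence" and gets iterated character by character. The sketch below imitates clipped unigram matching with a hand-rolled helper (clipped_unigram_matches is a hypothetical name, not part of NLTK) to show the effect of one missing level of nesting.

```python
from collections import Counter

hypothesis = ['This', 'is', 'cat']
reference = ['This', 'is', 'a', 'cat']


def clipped_unigram_matches(references, hypothesis):
    """Count hypothesis tokens that occur in the references, clipped by the
    maximum count of each token in any single reference."""
    max_ref_counts = Counter()
    for ref in references:
        # Counter(ref) iterates ref: over tokens if ref is a list,
        # over characters if ref is a plain string.
        for token, count in Counter(ref).items():
            max_ref_counts[token] = max(max_ref_counts[token], count)
    hyp_counts = Counter(hypothesis)
    return sum(min(count, max_ref_counts[token]) for token, count in hyp_counts.items())


wrong_list_of_references = [reference]    # nesting used in the question
right_list_of_references = [[reference]]  # list(list(list(str)))

# References seen for the first (and only) hypothesis in each case.
wrong = clipped_unigram_matches(wrong_list_of_references[0], hypothesis)
right = clipped_unigram_matches(right_list_of_references[0], hypothesis)
print(wrong, right)  # 0 3
```

With the wrong nesting the "references" are single characters, so no hypothesis token matches; with the extra level of nesting all three tokens match.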
Other than looking at the doctests in nltk/translate/bleu_score.py, you can also take a look at the unit tests in nltk/test/unit/translate/test_bleu_score.py to see how to use each of the components in bleu_score.py.
By the way, since sentence_bleu is imported as bleu in nltk.translate.__init__.py (https://github.com/nltk/nltk/blob/develop/nltk/translate/__init__.py#L21), using
from nltk.translate import bleu
would be the same as:
from nltk.translate.bleu_score import sentence_bleu
and in code:
>>> from nltk.translate import bleu
>>> from nltk.translate.bleu_score import sentence_bleu
>>> from nltk.translate.bleu_score import corpus_bleu
>>> bleu == sentence_bleu
True
>>> bleu == corpus_bleu
False
Let's take a look:
>>> help(nltk.translate.bleu_score.corpus_bleu)
Help on function corpus_bleu in module nltk.translate.bleu_score:

corpus_bleu(list_of_references, hypotheses, weights=(0.25, 0.25, 0.25, 0.25), smoothing_function=None)
    Calculate a single corpus-level BLEU score (aka. system-level BLEU) for all
    the hypotheses and their respective references.

    Instead of averaging the sentence level BLEU scores (i.e. macro-average
    precision), the original BLEU metric (Papineni et al. 2002) accounts for
    the micro-average precision (i.e. summing the numerators and denominators
    for each hypothesis-reference(s) pairs before the division).
    ...
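To make the macro/micro distinction in that docstring concrete, here is a small sketch with made-up per-sentence counts (the numbers are hypothetical, chosen only for illustration). Each pair is (matched n-grams, total n-grams in the hypothesis).

```python
# Hypothetical (matched, total) n-gram counts for two hypotheses.
per_sentence = [(1, 2), (9, 10)]

# Macro-average: compute each sentence's precision, then average them.
macro = sum(matched / total for matched, total in per_sentence) / len(per_sentence)

# Micro-average (what the docstring describes for corpus-level BLEU):
# sum numerators and denominators first, then divide once.
micro = sum(matched for matched, _ in per_sentence) / sum(total for _, total in per_sentence)

print(f"{macro:.3f}")  # 0.700
print(f"{micro:.3f}")  # 0.833
```

The micro-average weights longer hypotheses more heavily, which is why a corpus-level score is generally not the mean of the sentence-level scores.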
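You're in a better position than me to understand the description of the algorithm, so I won't attempt to "explain" it to you. If the docstring isn't enough to clear things up, take a look at the source itself. Or find it locally: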
>>> nltk.translate.bleu_score.__file__
'.../lib/python3.4/site-packages/nltk/translate/bleu_score.py'