How to Calculate the BLEU Score in Python

2 years ago

文, 翔

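The BLEU (Bilingual Evaluation Understudy) score is a metric for evaluating the quality of machine translation output. It was originally designed for translation models, but it is now used in other natural language processing applications as well.

BLEU compares a candidate sentence against one or more reference sentences and reports how closely the candidate matches them, as a score between 0 and 1.

A BLEU score of 1 means the candidate sentence exactly matches one of the reference sentences.

The score is also a common evaluation metric for image captioning models.

In this tutorial, we will use the sentence_bleu() function from the nltk library. Let's get started.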

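Calculating the BLEU score in Python

To calculate the BLEU score, we need to provide the reference sentences and the candidate sentence as lists of tokens.

In this section, we will learn how to do that and compute the score. Let's start by importing the necessary module.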

from nltk.translate.bleu_score import sentence_bleu

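Now we can enter the reference sentences as a list. Each sentence must be converted to tokens before it is passed to sentence_bleu().

1. Input and split the sentences

The sentences in our reference list are:

    'this is a dog'
    'it is dog'
    'dog it is'
    'a dog, it is'

We can use the split() method to break them into tokens.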

reference = [
    'this is a dog'.split(),
    'it is dog'.split(),
    'dog it is'.split(),
    'a dog, it is'.split() 
]
print(reference)

Output:

[['this', 'is', 'a', 'dog'], ['it', 'is', 'dog'], ['dog', 'it', 'is'], ['a', 'dog,', 'it', 'is']]

This is what the sentences look like as tokens. Now we can call sentence_bleu() to calculate the score.
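One thing to watch in the token list above: split() only splits on whitespace, so the comma in the last reference stays attached and produces the token 'dog,', which will not match 'dog'. Below is a minimal sketch of one way to normalize tokens first, assuming that lowercasing and dropping punctuation is acceptable for your data. The helper name simple_tokenize is made up for this example and is not part of NLTK.

```python
import re

def simple_tokenize(sentence):
    """Lowercase the sentence and keep only runs of letters, digits, and apostrophes."""
    return re.findall(r"[a-z0-9']+", sentence.lower())

print('a dog, it is'.split())           # ['a', 'dog,', 'it', 'is']
print(simple_tokenize('a dog, it is'))  # ['a', 'dog', 'it', 'is']
```

The rest of this tutorial keeps using plain split(), so the outputs shown below are based on the unnormalized tokens.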

2. Calculate the BLEU score

Use the code below to calculate the score.

candidate = 'it is dog'.split()
print('BLEU score -> {}'.format(sentence_bleu(reference, candidate)))

Output:

BLEU score -> 1.0

The candidate gets a perfect score of 1 because it appears in the reference set. Let's try another one.

candidate = 'it is a dog'.split()
print('BLEU score -> {}'.format(sentence_bleu(reference, candidate)))

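Output:

BLEU score -> 0.8408964152537145

This candidate is close to the references but is not an exact match for any of them. Every 1-gram and 2-gram in it appears in some reference, but only one of its two 3-grams does, and that pulls the score down to 0.84. Note that these outputs appear to come from an older NLTK release: neither candidate has a matching 4-gram, and recent releases warn about that case and may return a value close to 0 instead.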
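To see where 0.84 comes from, here is a rough sketch that counts matching n-grams by hand. It is a simplified illustration, not NLTK's implementation. The helper names ngram_counts and clipped_precision are made up for this example, and the last step assumes that n-gram orders with zero matches are simply left out of the weighted geometric mean, which is consistent with the output above.

```python
import math
from collections import Counter

def ngram_counts(tokens, n):
    """Count every contiguous n-token window in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def clipped_precision(references, candidate, n):
    """Share of candidate n-grams found in the references, with each count
    clipped to the largest count seen in any single reference."""
    cand = ngram_counts(candidate, n)
    if not cand:
        return 0.0
    max_ref = Counter()
    for ref in references:
        for gram, count in ngram_counts(ref, n).items():
            max_ref[gram] = max(max_ref[gram], count)
    matched = sum(min(count, max_ref[gram]) for gram, count in cand.items())
    return matched / sum(cand.values())

reference = [s.split() for s in ('this is a dog', 'it is dog', 'dog it is', 'a dog, it is')]
candidate = 'it is a dog'.split()

precisions = [clipped_precision(reference, candidate, n) for n in range(1, 5)]
print(precisions)  # [1.0, 1.0, 0.5, 0.0]

# Assumption: skip zero-match orders, then combine the rest with weight 0.25 each
nonzero = [p for p in precisions if p > 0]
score = math.exp(sum(0.25 * math.log(p) for p in nonzero))
print(round(score, 4))  # 0.8409
```

Both references and candidate here have a closest length match of 4 tokens, so no length penalty is involved in this particular example.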

3. Complete Python code for computing the BLEU score

Here is the complete code for this section.

from nltk.translate.bleu_score import sentence_bleu
reference = [
    'this is a dog'.split(),
    'it is dog'.split(),
    'dog it is'.split(),
    'a dog, it is'.split() 
]
candidate = 'it is dog'.split()
print('BLEU score -> {}'.format(sentence_bleu(reference, candidate)))

candidate = 'it is a dog'.split()
print('BLEU score -> {}'.format(sentence_bleu(reference, candidate)))

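4. Calculating n-gram scores

When matching sentences, you can choose how many words are matched at a time. For example, you can match one word at a time (1-gram), or you can match pairs of words (2-gram) or groups of three words (3-gram).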
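As a quick illustration of what these n-grams look like, the sketch below slides a window of n tokens across our candidate sentence. The helper extract_ngrams is a made-up name for this example; it is not the function NLTK uses internally.

```python
def extract_ngrams(tokens, n):
    """Return the contiguous n-token windows of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = 'it is a dog'.split()
print(extract_ngrams(tokens, 1))  # [('it',), ('is',), ('a',), ('dog',)]
print(extract_ngrams(tokens, 2))  # [('it', 'is'), ('is', 'a'), ('a', 'dog')]
print(extract_ngrams(tokens, 3))  # [('it', 'is', 'a'), ('is', 'a', 'dog')]
```

A sentence of 4 tokens has only one 4-gram, the whole sentence, which is why higher orders are much harder to match.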

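In this section, we will learn how to calculate these n-gram scores.

sentence_bleu() accepts a weights argument that holds one weight for each n-gram order.

For example, to calculate the individual n-gram scores, you can use the following weights.

Individual 1-gram: (1, 0, 0, 0)
Individual 2-gram: (0, 1, 0, 0)
Individual 3-gram: (0, 0, 1, 0)
Individual 4-gram: (0, 0, 0, 1)

Here is the corresponding Python code: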

from nltk.translate.bleu_score import sentence_bleu
reference = [
    'this is a dog'.split(),
    'it is dog'.split(),
    'dog it is'.split(),
    'a dog, it is'.split() 
]
candidate = 'it is a dog'.split()

print('Individual 1-gram: %f' % sentence_bleu(reference, candidate, weights=(1, 0, 0, 0)))
print('Individual 2-gram: %f' % sentence_bleu(reference, candidate, weights=(0, 1, 0, 0)))
print('Individual 3-gram: %f' % sentence_bleu(reference, candidate, weights=(0, 0, 1, 0)))
print('Individual 4-gram: %f' % sentence_bleu(reference, candidate, weights=(0, 0, 0, 1)))

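Output:

Individual 1-gram: 1.000000
Individual 2-gram: 1.000000
Individual 3-gram: 0.500000
Individual 4-gram: 1.000000

The candidate has no 4-gram that appears in any reference, so its 4-gram precision is really 0 out of 1. The 1.000000 shown here appears to come from an older NLTK release that skipped n-gram orders with no matches; recent releases warn and print a value close to 0 for this line.

By default, sentence_bleu() calculates the cumulative 4-gram BLEU score, also called BLEU-4. The weights for BLEU-4 are as follows.

(0.25, 0.25, 0.25, 0.25)

Let's look at the BLEU-4 code: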

score = sentence_bleu(reference, candidate, weights=(0.25, 0.25, 0.25, 0.25))
print(score)

Output:

0.8408964152537145

This is exactly the score we got earlier without passing any n-gram weights.
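If you run this on a recent NLTK release, you may see a warning and a score close to 0 for this candidate, because none of its 4-grams match a reference. One option is to pass a smoothing function. The sketch below uses method1 from NLTK's SmoothingFunction class purely as an illustration; the choice of method is an assumption made for this example, not something the tutorial above prescribes, and the smoothed score will not equal the 0.84 shown earlier.

```python
from nltk.translate.bleu_score import SmoothingFunction, sentence_bleu

reference = [
    'this is a dog'.split(),
    'it is dog'.split(),
    'dog it is'.split(),
    'a dog, it is'.split(),
]
candidate = 'it is a dog'.split()

# method1 replaces a zero n-gram match count with a small epsilon,
# so a single missing order no longer collapses the whole score
smoother = SmoothingFunction().method1
smoothed = sentence_bleu(reference, candidate, smoothing_function=smoother)
print('Smoothed BLEU-4: %f' % smoothed)
```

Scores computed with different smoothing methods are not directly comparable, so it helps to state which one you used when reporting results.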

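Conclusion

This tutorial covered calculating the BLEU score in Python. We learned what it is and how to calculate individual and cumulative n-gram BLEU scores. We hope you enjoyed learning with us!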