jus*_*rld 5 information-retrieval precision-recall
I am trying to compute the average precision (and mean average precision) on the Oxford Buildings image dataset.
Below is the code they provide for computing average precision. Note that pos_set is the union of the "good" and "ok" images from the ground truth, and junk_set is a set of irrelevant images.
float OxfordTest::computeAp(std::vector<std::string> &ranked_list){
    float old_recall = 0.0;
    float old_precision = 1.0;
    float ap = 0.0;
    size_t intersect_size = 0;  // number of positives seen so far
    size_t i = 0;
    size_t j = 0;               // number of non-junk results seen so far
    for ( ; i < ranked_list.size(); ++i) {
        // junk images are skipped outright: they advance neither j nor recall
        if (junk_set.count(ranked_list[i])) continue;
        if (pos_set.count(ranked_list[i])) intersect_size++;
        float recall = intersect_size / (float)pos_set.size();
        float precision = intersect_size / (j + 1.0);
        ap += (recall - old_recall)*((old_precision + precision)/2.0);
        old_recall = recall;
        old_precision = precision;
        j++;
    }
    return ap;
}
This is completely different from the notion given on the linked Wikipedia page. How are these notions related?
I am pretty sure the Wikipedia notion is correct, since it matches the one given in this answer and in this article.
I don't understand why AP is computed the way the code above does it:
The original paper states:
(3) Junk – less than 25% of the object
is visible, or there is a very high level of occlusion or distortion.
(4) Absent – the object is not present
I.e. junk images are not negatives. There are positives (Ok+Good), ignores (Junk) and negatives (Absent). Note that all of these are per-query, i.e. some images are junk for query 1 but not for query 15. If you look at the images marked 'junk' you'll see ambiguous examples: some have extreme zoom or blurring which makes you wonder whether the image contains the queried landmark at all, and in others only a tiny part of the object is visible, so the image is too hard.
In computing the average precision, we use the Good and
Ok images as positive examples of the landmark in question,
Absent images as negative examples and Junk images
as null examples. These null examples are treated as though
they are not present in the database – our score is unaffected
whether they are returned or not.
So the authors defined the junk set to be neither positives nor negatives - the images most likely depict the queried object, but for some of them we are not sure, or it would be too harsh to treat them as positives and ask the system to retrieve these examples (and therefore penalise it if it doesn't). At the same time, it would also be harsh to treat them as negatives, as the system shouldn't be penalised if it does retrieve them.

So all that needs to be done is that, on a per-query basis, you ignore the junks and treat them as if they don't exist: take the retrieved list, filter out all junk images for this query, then run the normal AP computation on this filtered list. That is effectively what the code does - when an example is in amb (= junk), it is simply skipped. If the example is not in amb, then if it is in pos(itives), intersect_size (the number of positives up to the current position) is incremented. The quantity j is the number of non-skipped elements processed so far (it gets incremented only if the current element is not junk), so the current element is the (j+1)-th non-junk result, which is why the precision divides by j + 1.
You certainly need the recall in your AP computation, as explained by shiri in the previous answer, and as described in your article: p(r) is the precision at a particular recall. The best way to think of AP is not to examine a random formula but to understand the intuition and then see how the formula captures it, i.e. what Wikipedia says at the start: you can plot precision as a function of recall, and AP is then simply the area under that curve. You want precision to be high at all recalls, so the ideal curve is p(r) = 1, which maximises the AP.
So what is the code doing? It is computing the area under the precision-recall curve using the trapezoidal rule; look at that equation on Wikipedia and you'll see it is identical to the code. The discrete-case AP formula from your Wikipedia article is a (commonly used) coarser approximation to the same area: the rectangle method.
Views: 798