Understanding Elasticsearch query explain

use*_*291 1 explain elasticsearch

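I am trying to understand the Explain API scoring in the Elastic docs: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-explain.html

When I couldn't work out the result on my own simple index with only a few documents, I tried to reproduce the calculation shown on the docs page above.

In the example, it shows a "value" of 1.3862944 with the description "idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5))". Under "details" it gives the following field values: docFreq: 1.0, docCount: 5.0

Using the provided docFreq and docCount values, I calculate this as: log(1 + (5.0 - 1.0 + 0.5) / (1.0 + 0.5)) = 0.602, which differs from the 1.3862944 in the example.

I can't get any of the values to match.

Am I reading it wrong?

Here is the full example from that page: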

GET /twitter/_doc/0/_explain   
{ 
  "query" : {
    "match" : { "message" : "elasticsearch" }
  }
}

This yields the following result:

{
   "_index": "twitter",
   "_type": "_doc",
   "_id": "0",
   "matched": true,
   "explanation": {
       "value": 1.6943599,
       "description": "weight(message:elasticsearch in 0) [PerFieldSimilarity], result of:",
       "details": [
       {
        "value": 1.6943599,
        "description": "score(doc=0,freq=1.0 = termFreq=1.0\n), product of:",
        "details": [
           {
              "value": 1.3862944,  <== This is the one I am trying
              "description": "idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:",
              "details": [
                 {
                    "value": 1.0,
                    "description": "docFreq",
                    "details": []
                 },
                 {
                    "value": 5.0,
                    "description": "docCount",
                    "details": []
                  }
               ]
           },
            {
              "value": 1.2222223,
              "description": "tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:",
              "details": [
                 {
                    "value": 1.0,
                    "description": "termFreq=1.0",
                    "details": []
                 },
                 {
                    "value": 1.2,
                    "description": "parameter k1",
                    "details": []
                 },
                 {
                    "value": 0.75,
                    "description": "parameter b",
                    "details": []
                 },
                 {
                    "value": 5.4,
                    "description": "avgFieldLength",
                    "details": []
                 },
                 {
                    "value": 3.0,
                    "description": "fieldLength",
                    "details": []
                 }
              ]
           }
        ]
     }
  ]
}
}

Mys*_*ion 5

The explanation is very accurate, as always. Let me help you understand the calculation:

This is the initial formula:

log(1 + (5.0 - 1.0 + 0.5) / (1.0 + 0.5))

The next step is:

log(1 + 4.5 / 1.5)

One more:

log(4) = ?

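Here comes the tricky part. You treated log as a base-10 logarithm, which gives log10(4) ≈ 0.602. However, if you look at the code of the Lucene scorer, you will see that it is the natural logarithm (ln), and ln(4) ≈ 1.386294, which matches the value in the explain output.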
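To see the difference between the two bases side by side, here is a minimal sketch that evaluates the idf formula from the explain output with both Math.log10 and Math.log. The class and method names (IdfLogCheck, idfBase10, idfNatural) are made up for illustration and are not part of Lucene or Elasticsearch.

```java
public class IdfLogCheck {
    // idf formula as printed in the explain output, evaluated with the natural logarithm.
    public static double idfNatural(double docFreq, double docCount) {
        return Math.log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5));
    }

    // Same formula, but with a base-10 logarithm (the reading that leads to 0.602).
    public static double idfBase10(double docFreq, double docCount) {
        return Math.log10(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5));
    }

    public static void main(String[] args) {
        double docFreq = 1.0;
        double docCount = 5.0;
        // Base 10 gives roughly 0.602, the value computed in the question.
        System.out.println("log10: " + idfBase10(docFreq, docCount));
        // Natural log gives roughly 1.386, the value shown by the Explain API.
        System.out.println("ln:    " + idfNatural(docFreq, docCount));
    }
}
```

Only the natural-log version lands on the number reported in the explain output.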

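Part of the Lucene code (this snippet appears to be the classic TF-IDF idf rather than the BM25 one from the explain output, but the point is the same: it calls Math.log):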

public float idf(long docFreq, long numDocs) {
    return (float)(Math.log(numDocs/(double)(docFreq+1)) + 1.0);
  }

where Math.log is defined as follows:

public static double log(double a)

Returns the natural logarithm (base e) of a double value.
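With the natural logarithm in place, the other numbers in the explain output can be checked the same way. The sketch below plugs the values from the question (termFreq 1.0, k1 1.2, b 0.75, fieldLength 3.0, avgFieldLength 5.4) into the tfNorm formula quoted in the output and multiplies the result by the idf. The class and method names are hypothetical helpers for this check, not Lucene's actual implementation.

```java
public class Bm25ExplainCheck {
    // idf as described in the explain output, using the natural logarithm.
    public static double idf(double docFreq, double docCount) {
        return Math.log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5));
    }

    // tfNorm as described in the explain output:
    // (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength))
    public static double tfNorm(double freq, double k1, double b,
                                double fieldLength, double avgFieldLength) {
        return (freq * (k1 + 1))
                / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength));
    }

    // The explain output reports the score as the product of idf and tfNorm.
    public static double score(double idf, double tfNorm) {
        return idf * tfNorm;
    }

    public static void main(String[] args) {
        double idf = idf(1.0, 5.0);                        // roughly 1.3863
        double tfNorm = tfNorm(1.0, 1.2, 0.75, 3.0, 5.4);  // roughly 1.2222
        System.out.println("idf:    " + idf);
        System.out.println("tfNorm: " + tfNorm);
        System.out.println("score:  " + score(idf, tfNorm)); // roughly 1.6944
    }
}
```

The results agree with the 1.3862944, 1.2222223 and 1.6943599 in the output up to float rounding, since the sketch works in double while the explain output shows float values.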