Elasticsearch does not return the exact match first

Con*_*scu 6 elasticsearch

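I have an Elasticsearch index with a field for exact matching. Somehow I get a lot of similar results (which I don't mind), but those similar results are sorted before the exact match (which I do mind).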

Can someone explain what is happening and how to fix it?

My mapping looks like this:

"exact":{
  "type":"string",
  "boost":10.0,
  "analyzer":"keyword"
},

My query searching for "AAPL P JAN 2014 885,00" looks like this:

{
  "size" : 21,
  "query" : {
    "field" : {
      "exact" : "AAPL P JAN 2014 885,00"
    }
  },
  "explain" : true,
  "sort" : [ {
    "_score" : {
      "order" : "desc"
    }
  } ],
  "facets" : {
    "category" : {
      "terms" : {
        "field" : "category",
        "size" : 10
      }
    }
  }
}

The returned documents end up in this order:

  • {"exact":["APPLE INC","US0378331005","AAPL","73773"],"id-compound":"AAPL"}
  • {"exact":["AAPL","73773","AAPL P JAN 2014 675,00"],"id-compound":"AAPL*PUT*20140118*675"}
  • {"exact":["AAPL","73773","AAPL C JAN 2014 500,00"],"id-compound":"AAPL*CALL*20140118*500"}

and so on, with the exact match a whole bunch of results further down.

Can someone explain to me why the exact match doesn't come first?

In case it helps, here is the full explain output for the search results:

"hits" : [ {
  "_shard" : 0,
  "_node" : "1",
  "_index" : "instruments",
  "_type" : "instrument",
  "_id" : "AAPL",
  "_score" : 1306.8339, "_source" : {"exact":["APPLE INC","US0378331005","AAPL","73773"],"id-compound":"AAPL"},
  "_explanation" : {
    "value" : 1306.8339,
    "description" : "product of:",
    "details" : [ {
      "value" : 6534.169,
      "description" : "sum of:",
      "details" : [ {
        "value" : 6534.169,
        "description" : "weight(exact:AAPL in 9096), product of:",
        "details" : [ {
          "value" : 0.25854474,
          "description" : "queryWeight(exact:AAPL), product of:",
          "details" : [ {
            "value" : 6.1701355,
            "description" : "idf(docFreq=211, maxDocs=37299)"
          }, {
            "value" : 0.0419026,
            "description" : "queryNorm"
          } ]
        }, {
          "value" : 25272.875,
          "description" : "fieldWeight(exact:AAPL in 9096), product of:",
          "details" : [ {
            "value" : 1.0,
            "description" : "tf(termFreq(exact:AAPL)=1)"
          }, {
            "value" : 6.1701355,
            "description" : "idf(docFreq=211, maxDocs=37299)"
          }, {
            "value" : 4096.0,
            "description" : "fieldNorm(field=exact, doc=9096)"
          } ]
        } ]
      } ]
    }, {
      "value" : 0.2,
      "description" : "coord(1/5)"
    } ]
  }
}, {
  "_shard" : 0,
  "_node" : "1",
  "_index" : "instruments",
  "_type" : "instrument",
  "_id" : "AAPL*PUT*20140118*675",
  "_score" : 163.35423, "_source" : {"exact":["AAPL","73773","AAPL P JAN 2014 675,00"],"id-compound":"AAPL*PUT*20140118*675"},
  "_explanation" : {
    "value" : 163.35423,
    "description" : "product of:",
    "details" : [ {
      "value" : 816.7711,
      "description" : "sum of:",
      "details" : [ {
        "value" : 816.7711,
        "description" : "weight(exact:AAPL in 18), product of:",
        "details" : [ {
          "value" : 0.25854474,
          "description" : "queryWeight(exact:AAPL), product of:",
          "details" : [ {
            "value" : 6.1701355,
            "description" : "idf(docFreq=211, maxDocs=37299)"
          }, {
            "value" : 0.0419026,
            "description" : "queryNorm"
          } ]
        }, {
          "value" : 3159.1094,
          "description" : "fieldWeight(exact:AAPL in 18), product of:",
          "details" : [ {
            "value" : 1.0,
            "description" : "tf(termFreq(exact:AAPL)=1)"
          }, {
            "value" : 6.1701355,
            "description" : "idf(docFreq=211, maxDocs=37299)"
          }, {
            "value" : 512.0,
            "description" : "fieldNorm(field=exact, doc=18)"
          } ]
        } ]
      } ]
    }, {
      "value" : 0.2,
      "description" : "coord(1/5)"
    } ]
  }
}, {
  "_shard" : 0,
  "_node" : "1",
  "_index" : "instruments",
  "_type" : "instrument",
  "_id" : "AAPL*CALL*20140118*500",
  "_score" : 163.35423, "_source" : {"exact":["AAPL","73773","AAPL C JAN 2014 500,00"],"id-compound":"AAPL*CALL*20140118*500"},
  "_explanation" : {
    "value" : 163.35423,
    "description" : "product of:",
    "details" : [ {
      "value" : 816.7711,
      "description" : "sum of:",
      "details" : [ {
        "value" : 816.7711,
        "description" : "weight(exact:AAPL in 383), product of:",
        "details" : [ {
          "value" : 0.25854474,
          "description" : "queryWeight(exact:AAPL), product of:",
          "details" : [ {
            "value" : 6.1701355,
            "description" : "idf(docFreq=211, maxDocs=37299)"
          }, {
            "value" : 0.0419026,
            "description" : "queryNorm"
          } ]
        }, {
          "value" : 3159.1094,
          "description" : "fieldWeight(exact:AAPL in 383), product of:",
          "details" : [ {
            "value" : 1.0,
            "description" : "tf(termFreq(exact:AAPL)=1)"
          }, {
            "value" : 6.1701355,
            "description" : "idf(docFreq=211, maxDocs=37299)"
          }, {
            "value" : 512.0,
            "description" : "fieldNorm(field=exact, doc=383)"
          } ]
        } ]
      } ]
    }, {
      "value" : 0.2,
      "description" : "coord(1/5)"
    } ]
  }
}, {
  "_id" : "AAPL*PUT*20140118*940",
  "_score" : 163.35423, "_source" : {"exact":["AAPL","73773","AAPL P JAN 2014 940,00"],"id-compound":"AAPL*PUT*20140118*940"},
  "_explanation" : {
    "value" : 163.35423,
    "description" : "product of:",
    "details" : [ {
      "value" : 816.7711,
      "description" : "sum of:",
      "details" : [ {
        "value" : 816.7711,
        "description" : "weight(exact:AAPL in 794), product of:",
        "details" : [ {
          "value" : 0.25854474,
          "description" : "queryWeight(exact:AAPL), product of:",
          "details" : [ {
            "value" : 6.1701355,
            "description" : "idf(docFreq=211, maxDocs=37299)"
          }, {
            "value" : 0.0419026,
            "description" : "queryNorm"
          } ]
        }, {
          "value" : 3159.1094,
          "description" : "fieldWeight(exact:AAPL in 794), product of:",
          "details" : [ {
            "value" : 1.0,
            "description" : "tf(termFreq(exact:AAPL)=1)"
          }, {
            "value" : 6.1701355,
            "description" : "idf(docFreq=211, maxDocs=37299)"
          }, {
            "value" : 512.0,
            "description" : "fieldNorm(field=exact, doc=794)"
          } ]
        } ]
      } ]
    }, {
      "value" : 0.2,
      "description" : "coord(1/5)"
    } ]
  }
}

This is what happens if I analyze the data I'm trying to store:

curl -XGET 'localhost:9200/instruments/_analyze?field=exact&pretty=true' -d 'ING  P JUN 2013 6.00'
{
  "tokens" : [ {
    "token" : "ING  P JUN 2013 6.00",
    "start_offset" : 0,
    "end_offset" : 20,
    "type" : "word",
    "position" : 1
  } ]
}

jav*_*nna 0

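As you can see from the explain output, all of these documents match only on the term "AAPL". The term occurs exactly once in each document (tf=1) and appears in 211 of the 37299 documents (idf=6.1701355). The option documents all get exactly the same score (163.35423); the first document scores higher (1306.8339) only because its fieldNorm is 4096 instead of 512. Since you are using an index-time boost (the boost of 10 in your mapping), the field norm is much higher than it would otherwise be. That isn't a big deal here, because the matches are always on the same field. If you also had matches on other fields, though, matches on this field would almost always win, which may or may not make sense for you.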
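
To make the numbers concrete, here is a small Python sketch that recomputes the scores from the values in the explain output, assuming the classic Lucene TF-IDF breakdown shown there (queryWeight × fieldWeight × coord). The function name `classic_term_score` is hypothetical; the constants are copied from the explain output. It shows that the only difference between the first hit and the option documents is the fieldNorm (4096 vs 512), which makes the first hit score exactly 8 times higher.

```python
def classic_term_score(idf, query_norm, tf, field_norm, coord):
    """Recompute a single-term score the way the explain output breaks it down.

    queryWeight = idf * queryNorm
    fieldWeight = tf * idf * fieldNorm
    score       = queryWeight * fieldWeight * coord
    """
    query_weight = idf * query_norm
    field_weight = tf * idf * field_norm
    return query_weight * field_weight * coord


IDF = 6.1701355         # idf(docFreq=211, maxDocs=37299)
QUERY_NORM = 0.0419026  # queryNorm
COORD = 1 / 5           # coord(1/5): one of five query terms matched

# Same term, same tf, same idf; only the fieldNorm differs.
company_doc = classic_term_score(IDF, QUERY_NORM, 1.0, 4096.0, COORD)
option_doc = classic_term_score(IDF, QUERY_NORM, 1.0, 512.0, COORD)

print(round(company_doc, 2), round(option_doc, 2))
print(company_doc / option_doc)
```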

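The problem, though, is that looking at your documents, none of them is an exact match for AAPL P JAN 2014 885,00. What I see is that only one of the 5 terms in your query matches, which is also confirmed by the coord in your explain output: coord(1/5).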
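
A minimal sketch of where coord(1/5) comes from, assuming (as the 1/5 indicates) that the query text ends up as five separate terms, and that the option document's `exact` field holds the three keyword tokens listed in its `_source`. The helper `coord` is hypothetical and only models the ratio of matched query terms to total query terms.

```python
from fractions import Fraction


def coord(query_terms, doc_terms):
    """Fraction of query terms that occur in the document (the coord factor)."""
    matched = sum(1 for term in query_terms if term in doc_terms)
    return Fraction(matched, len(query_terms))


# Assumption: the query text is split into 5 terms, as coord(1/5) suggests.
query_terms = "AAPL P JAN 2014 885,00".split()

# Indexed terms of AAPL*PUT*20140118*675: each array element is one keyword token.
doc_terms = {"AAPL", "73773", "AAPL P JAN 2014 675,00"}

print(query_terms)
print(coord(query_terms, doc_terms))  # 1/5
```

Only "AAPL" appears both among the query terms and among the document's tokens, so four of the five query terms contribute nothing.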

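The keyword analyzer does seem to be applied, but as you can see from the returned documents, you are not sending the content of the exact field as a single value; you are sending an array of values. None of its items gets tokenized, since you are using the keyword analyzer, but you still end up with multiple tokens. I guess you need to check how you index your documents.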
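
The following Python sketch models what the keyword analyzer does with a multi-valued field, under the simplifying assumption that each array element becomes exactly one untouched token (as the `_analyze` output for a single string suggests). The helpers `keyword_tokens` and `has_exact_match` are hypothetical illustrations, not Elasticsearch APIs.

```python
def keyword_tokens(value):
    """Simplified model of the keyword analyzer (hypothetical helper).

    A plain string becomes exactly one token, unchanged. A JSON array is
    indexed as several values, so it yields one token per element.
    """
    values = value if isinstance(value, list) else [value]
    return list(values)


def has_exact_match(field_value, search_text):
    """True only if one indexed token equals the full search text."""
    return search_text in keyword_tokens(field_value)


# Single string, as in the _analyze call: one token.
print(keyword_tokens("ING  P JUN 2013 6.00"))

# "exact" value of the first returned document: four separate tokens.
company = ["APPLE INC", "US0378331005", "AAPL", "73773"]
print(keyword_tokens(company))
print(has_exact_match(company, "AAPL"))
print(has_exact_match(company, "AAPL P JAN 2014 885,00"))
```

Under this model, a document can only be an exact match if one of the values sent in `exact` is literally the full search string, and none of the returned documents contains `AAPL P JAN 2014 885,00`.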