Elasticsearch - match count per document

swl*_*ton 8 elasticsearch

I'm using this query to search for occurrences of a phrase in a field.

"query": {
  "match_phrase": {
    "content": "my test phrase"
  }
}

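I need to count how many times each phrase matches in each document (if that's possible).

I considered aggregations, but I don't think they fit the requirement, because they give me the number of matches across the whole index rather than per document.

Thanks.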

Pol*_*ton 5

This can be done with script fields and a Painless script.

You can count the occurrences in each field and add them up for the document.

Example:

## Here's my test index with some sample values

# this document has one occurrence
POST t1/doc/1
{
  "content": "my test phrase"
}

# this document has 5 occurrences
POST t1/doc/2
{
  "content": "my test phrase ",
  "content1": "this is my test phrase 1",
  "content2": "this is my test phrase 2",
  "content3": "this is my test phrase 3",
  "content4": "this is my test phrase 4"
}

POST t1/doc/3
{
  "content": "my test new phrase"
}

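Now, using a script, I can count the phrase matches in each field. I count at most one match per field, but you can modify the script to count multiple matches per field.

The obvious downside is that you have to name every field of the document in the script, unless there is a way to iterate over the doc fields that I'm not aware of.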

POST t1/_search
{
  "script_fields": {
    "phrase_Count": {
      "script": {
        "lang": "painless",
        "source": """
          // Add 1 for each field whose keyword value contains the phrase
          // (case-sensitive substring check, at most one match per field)
          int count = 0;

          if (doc['content.keyword'].size() > 0 && doc['content.keyword'].value.indexOf(params.phrase) != -1) count++;
          if (doc['content1.keyword'].size() > 0 && doc['content1.keyword'].value.indexOf(params.phrase) != -1) count++;
          if (doc['content2.keyword'].size() > 0 && doc['content2.keyword'].value.indexOf(params.phrase) != -1) count++;
          if (doc['content3.keyword'].size() > 0 && doc['content3.keyword'].value.indexOf(params.phrase) != -1) count++;
          if (doc['content4.keyword'].size() > 0 && doc['content4.keyword'].value.indexOf(params.phrase) != -1) count++;

          return count;
        """,
        "params": {
          "phrase": "my test phrase"
        }
      }
    }
  }
}

This returns the phrase count for each document as a script field:

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 3,
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "t1",
        "_type" : "doc",
        "_id" : "2",
        "_score" : 1.0,
        "fields" : {
          "phrase_Count" : [
            5                 <--- count of occurrences of the phrase in the document
          ]
        }
      },
      {
        "_index" : "t1",
        "_type" : "doc",
        "_id" : "1",
        "_score" : 1.0,
        "fields" : {
          "phrase_Count" : [
            1
          ]
        }
      },
      {
        "_index" : "t1",
        "_type" : "doc",
        "_id" : "3",
        "_score" : 1.0,
        "fields" : {
          "phrase_Count" : [
            0
          ]
        }
      }
    ]
  }
}
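The Painless script counts at most one match per field and has to name every field explicitly. As a minimal sketch of the modified logic, here is the same idea in Python, run client-side over each hit's `_source`: it counts every non-overlapping occurrence in a field and checks all string fields unless a list of field names is given. The function name `count_phrase` is hypothetical (it is not an Elasticsearch API), and like the script it is a plain case-sensitive substring check, not analyzer-aware matching like `match_phrase`.

```python
from typing import Iterable, Mapping, Optional


def count_phrase(doc: Mapping[str, object], phrase: str,
                 fields: Optional[Iterable[str]] = None) -> int:
    """Count non-overlapping occurrences of `phrase` across a document's fields.

    Hypothetical helper: checks every string field when `fields` is None,
    otherwise only the named fields. Non-string values are skipped.
    """
    names = fields if fields is not None else doc.keys()
    total = 0
    for name in names:
        value = doc.get(name)
        if isinstance(value, str):
            # str.count returns the number of non-overlapping occurrences
            total += value.count(phrase)
    return total


# The same sample documents as in the answer, as plain dicts
docs = {
    "1": {"content": "my test phrase"},
    "2": {
        "content": "my test phrase ",
        "content1": "this is my test phrase 1",
        "content2": "this is my test phrase 2",
        "content3": "this is my test phrase 3",
        "content4": "this is my test phrase 4",
    },
    "3": {"content": "my test new phrase"},
}

for doc_id, source in docs.items():
    print(doc_id, count_phrase(source, "my test phrase"))
# 1 1
# 2 5
# 3 0
```

Passing the field names as a parameter instead of hard-coding them is what removes the need to list each field; the same shape could presumably be ported to Painless by passing a list of field names in `params`, though that variant is untested here.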


小智 -1

You could do this with term vectors. Have a look at the term vectors API.
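With term vectors, each term of a field comes back with its token positions, so phrase occurrences can be counted per document: a match starts at position p when the i-th phrase term appears at position p + i. Below is a minimal Python sketch of that post-processing, assuming the per-field object of a term vectors response has the shape `{"terms": {term: {"tokens": [{"position": ...}]}}}` (positions requested) and that the phrase has already been analyzed like the field (for example, lowercased for the standard analyzer). The function name and the sample dict are hypothetical and hand-written for illustration, not real output.

```python
from typing import Any, Dict, List, Set


def count_phrase_from_term_vectors(field_tv: Dict[str, Any],
                                   phrase_terms: List[str]) -> int:
    """Count phrase occurrences in one field using term-vector positions."""
    terms = field_tv.get("terms", {})
    # For each phrase term, collect the set of positions where it occurs
    positions: List[Set[int]] = []
    for term in phrase_terms:
        info = terms.get(term)
        if info is None:
            return 0  # a phrase term never occurs, so the phrase cannot either
        positions.append({tok["position"] for tok in info.get("tokens", [])})
    if not positions:
        return 0
    # A match starts at p if term i sits at position p + i for every i
    return sum(
        1 for p in positions[0]
        if all(p + i in positions[i] for i in range(1, len(positions)))
    )


# Hand-written example for "my test phrase and my test phrase"
sample = {
    "terms": {
        "and": {"term_freq": 1, "tokens": [{"position": 3}]},
        "my": {"term_freq": 2, "tokens": [{"position": 0}, {"position": 4}]},
        "phrase": {"term_freq": 2, "tokens": [{"position": 2}, {"position": 6}]},
        "test": {"term_freq": 2, "tokens": [{"position": 1}, {"position": 5}]},
    }
}

print(count_phrase_from_term_vectors(sample, ["my", "test", "phrase"]))  # 2
```

Unlike the script-field approach, this respects the field's analysis (tokenization and position order) rather than raw substrings. Term vectors are returned per document, so you would request them for each hit (or in bulk with the multi term vectors API) and sum the counts over the fields you care about.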