Elasticsearch Painless script: how to iterate over an array of nested objects

chr*_*abo 13 arrays elasticsearch

I am trying to create a function_score query that uses a script_score script. I have several documents whose rankings field is type="nested". The mapping for the field is:

"rankings": {
  "type": "nested",
  "properties": {
    "rank1": {
      "type": "long"
    },
    "rank2": {
      "type": "float"
    },
    "subject": {
      "type": "text"
    }
  }
}

A sample document is:

"rankings": [
  {
    "rank1": 1051,
    "rank2": 78.5,
    "subject": "s1"
  },
  {
    "rank1": 45,
    "rank2": 34.7,
    "subject": "s2"
  }
]

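What I want to achieve is to iterate over the nested rankings objects. Specifically, I need to use, e.g., a for loop to find a particular subject and use its rank1 and rank2 to compute something. So far I have used something like the following, but it does not seem to work (it throws a compile error):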

"function_score": {
"script_score": {
    "script": {
        "lang": "painless",
        "inline": 
                 "sum = 0;"
                 "for (item in doc['rankings_cug']) {"
                     "sum = sum + doc['rankings_cug.rank1'].value;"
                 "}"
         }
    }
}

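I also tried the following options:

  1. A for loop using : instead of in: for (item:doc['rankings']), without success.
  2. A for loop using in, but iterating over a specific element of the object, i.e. rank1: for (item in doc['rankings.rank1'].values). This does compile, but it seems to find a zero-length array for rank1.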

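I have read that the _source element can return a JSON-like object, but as far as I can tell it is not supported in search queries.

Could you please advise how to proceed?

Thank you very much.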

Rah*_*hai 14

You can access _source through params._source. This will work:

PUT /rankings/result/1?refresh
{
  "rankings": [
    {
      "rank1": 1051,
      "rank2": 78.5,
      "subject": "s1"
    },
    {
      "rank1": 45,
      "rank2": 34.7,
      "subject": "s2"
    }
  ]
}

POST rankings/_search
{
  "query": {
    "match": {
      "_id": "1"
    }
  },
  "script_fields": {
    "script_score": {
      "script": {
        "lang": "painless",
        "inline": "double sum = 0.0; for (item in params._source.rankings) { sum += item.rank2; } return sum;"
      }
    }
  }
}

DELETE rankings

  • @tricky `params` is used to iterate over the parameters we supply to the script. `ctx` is used to iterate over existing data. You can access it with `for ( item in ctx._source.rankings )` (2 upvotes)
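
As a rough illustration of what the script in this answer computes, the Python sketch below mirrors the Painless loop on plain dictionaries. It also builds a variant of the request body that only adds up entries for one subject, which is closer to what the question asks for. The helper names `sum_rank2` and `build_search_body` and the `subject` script parameter are hypothetical and not part of the original answer. The sketch assumes that user-supplied params are visible to the script alongside `params._source`, and that `"inline"` is the right script key for the Elasticsearch version in use (newer releases name it `"source"`).

```python
import json

# Sample document from the question, as the script would see it in params._source.
SOURCE = {
    "rankings": [
        {"rank1": 1051, "rank2": 78.5, "subject": "s1"},
        {"rank1": 45, "rank2": 34.7, "subject": "s2"},
    ]
}


def sum_rank2(source, subject=None):
    """Mirror of the Painless loop: add up rank2, optionally for one subject only."""
    total = 0.0
    for item in source["rankings"]:
        if subject is None or item["subject"] == subject:
            total += item["rank2"]
    return total


def build_search_body(doc_id, subject):
    """Build a script_fields request body (hypothetical helper, not from the answer)."""
    script = (
        "double sum = 0.0; "
        "for (item in params._source.rankings) { "
        "if (item.subject == params.subject) { sum += item.rank2; } "
        "} "
        "return sum;"
    )
    return {
        "query": {"match": {"_id": doc_id}},
        "script_fields": {
            "script_score": {
                "script": {
                    "lang": "painless",
                    "inline": script,  # assumption: newer versions use "source"
                    "params": {"subject": subject},
                }
            }
        },
    }


print(sum_rank2(SOURCE))        # sum over both entries
print(sum_rank2(SOURCE, "s1"))  # only the entry for subject s1
print(json.dumps(build_search_body("1", "s1"), indent=2))
```

The printed JSON can be sent as the body of `POST rankings/_search`. Whether `params._source` is also reachable from a `script_score` function depends on the Elasticsearch version, so that part would need to be checked against the cluster in use.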

小智 7

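Unfortunately, Elasticsearch scripting in general (Painless included) does not support accessing nested documents in this way. Consider a structure different from your current mapping: if you need to be able to iterate over the rankings like this, store them in multi-valued fields. Ultimately, the nested data will need to be denormalized and placed in the parent document in order to compute a score in the way described here.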
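
One way to read this suggestion is to flatten each nested entry into top-level fields of the parent document at index time, so that a script can read them without touching nested documents. The following is a minimal Python sketch under that assumption. The `<subject>_rank1` / `<subject>_rank2` naming scheme and the `denormalize_rankings` helper are hypothetical, chosen only for illustration.

```python
def denormalize_rankings(doc):
    """Return a copy of doc with each ranking flattened into per-subject fields."""
    # Keep every field except the nested array itself.
    flat = {key: value for key, value in doc.items() if key != "rankings"}
    for entry in doc.get("rankings", []):
        subject = entry["subject"]
        # Hypothetical naming scheme: one pair of fields per subject.
        flat[f"{subject}_rank1"] = entry["rank1"]
        flat[f"{subject}_rank2"] = entry["rank2"]
    return flat


doc = {
    "rankings": [
        {"rank1": 1051, "rank2": 78.5, "subject": "s1"},
        {"rank1": 45, "rank2": 34.7, "subject": "s2"},
    ]
}
print(denormalize_rankings(doc))
```

With a layout like this, a script could read a value with something like `doc['s1_rank1'].value`, at the cost of one mapped field per subject.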


Pri*_*esh 6

For nested objects in an array, I iterated over the items and it worked. Here is my sample data in an Elasticsearch index:

{
  "_index": "activity_index",
  "_type": "log",
  "_id": "AVjx0UTvgHp45Y_tQP6z",
  "_version": 4,
  "found": true,
  "_source": {
    "updated": "2016-12-11T22:56:13.548641",
    "task_log": [
      {
        "week_end_date": "2016-12-11",
        "log_hours": 16,
        "week_start_date": "2016-12-05"
      },
      {
        "week_start_date": "2016-03-21",
        "log_hours": 0,
        "week_end_date": "2016-03-27"
      },
      {
        "week_start_date": "2016-04-24",
        "log_hours": 0,
        "week_end_date": "2016-04-30"
      }
    ],
    "created": "2016-12-11T22:56:13.548635",
    "userid": 895,
    "misc": {

    },
    "current": false,
    "taskid": 1023829
  }
}

Here is the Painless script that iterates over the nested objects:

{
  "script": {
    "lang": "painless",
    "inline": 
        "boolean contains(def x, def y) {
          for (item in x) {
            if (item['week_start_date'] == y) {
              return true;
            }
          }
          return false;
        }
        if (!contains(ctx._source.task_log, params.start_time_param)) {
          ctx._source.task_log.add(params.week_object);
        }",
         "params": {
            "start_time_param": "2016-04-24",
             "week_object": {
               "week_start_date": "2016-04-24",
               "week_end_date": "2016-04-30",
               "log_hours": 0
              }
          }
  }
}

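Use the script above to update: /activity_index/log/AVjx0UTvgHp45Y_tQP6z/_update. The script defines a function named "contains" that takes two arguments, and then calls it. The old Groovy style, ctx._source.task_log.contains(), will not work because ES 5.x stores nested objects in separate documents. Hope this helps!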
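
To make the update logic easier to follow, here is a small Python sketch of what the script does to `ctx._source`: it appends the week object only when no existing entry has the same `week_start_date`. The wrapper name `add_week_if_missing` is hypothetical, and the 2017 week in the usage lines is made-up data for illustration only.

```python
def contains(task_log, week_start_date):
    """Same check as the Painless contains(): is there an entry for this week?"""
    for item in task_log:
        if item["week_start_date"] == week_start_date:
            return True
    return False


def add_week_if_missing(source, start_time_param, week_object):
    """Append week_object to task_log unless that week is already logged."""
    if not contains(source["task_log"], start_time_param):
        source["task_log"].append(week_object)
    return source


source = {
    "task_log": [
        {"week_start_date": "2016-12-05", "week_end_date": "2016-12-11", "log_hours": 16},
        {"week_start_date": "2016-03-21", "week_end_date": "2016-03-27", "log_hours": 0},
        {"week_start_date": "2016-04-24", "week_end_date": "2016-04-30", "log_hours": 0},
    ]
}

# The week from the answer's params already exists, so nothing is added.
existing = {"week_start_date": "2016-04-24", "week_end_date": "2016-04-30", "log_hours": 0}
add_week_if_missing(source, "2016-04-24", existing)
print(len(source["task_log"]))

# A week that is not yet logged (illustrative data) is appended.
new_week = {"week_start_date": "2017-01-02", "week_end_date": "2017-01-08", "log_hours": 0}
add_week_if_missing(source, "2017-01-02", new_week)
print(len(source["task_log"]))
```

Running the same update twice leaves the document unchanged the second time, which is the point of the `contains` check.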