Elasticsearch: filter and count operations based on nested objects

Question

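I am new to Elasticsearch and am trying to use it for analytics. I don't know whether this is possible, but I am trying to find customers with 0 purchases. I store orders as an array of nested objects on each customer. Here is a sample of the mapping properties for the customer index: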

"first_name" => [
 "type" => "text"
],
"last_name" => [
    "type"=> "text"
],
"email" => [
    "type"=> "text"
],
"total_spent" => [
    "type"=> "text"
],
"aov" => [
    "type"=> "float"
],
"orders_count" => [
    "type"=> "integer"
],
"orders" => [
    "type" => "nested",
    "properties" => [
        "order_id" => [
            "type"=>"text"
        ],
        "total_price" => [
            "type"=>"float"
        ]
    ]
]

Sample documents from the customer index:

[
   {
      "_index":"customers_index",
      "_type":"_doc",
      "_id":"1",
      "_score":1,
      "_source":{
         "first_name":"Stephen",
         "last_name":"Long",
         "email":"egnition_sample_91@egnition.com",
         "total_spent":"0.00",
         "aov":0,
         "orders":[]
      }
   },
   {
      "_index":"customers_index",
      "_type":"_doc",
      "_id":"2",
      "_score":1,
      "_source":{
         "first_name":"Reece",
         "last_name":"Dixon",
         "email":"egnition_sample_57@egnition.com",
         "total_spent":"0.10",
         "aov":"0.1",
         "orders":[
            {
               "total_price":"0.10",
               "placed_at":"2020-09-24T20:08:35.000000Z",
               "order_id":2723671867546
            }
         ]
      }
   },
   {
      "_index":"customers_index",
      "_type":"_doc",
      "_id":"3",
      "_score":1,
      "_source":{
         "first_name":"John",
         "last_name":"Marshall",
         "email":"egnition_sample_94@egnition.com",
         "total_spent":"0.10",
         "aov":"0.04",
         "orders":[
            {
               "total_price":"0.10",
               "placed_at":"2020-09-24T20:10:52.000000Z",
               "order_id":2723675930778
            },
            {
               "total_price":"0.30",
               "placed_at":"2020-09-24T20:09:45.000000Z",
               "order_id":2723673899162
            },
            {
               "total_price":"0.10",
               "placed_at":"2020-09-16T09:55:22.000000Z",
               "order_id":2704717414554
            }
         ]
      }
   }
]

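First of all, do you think this mapping suits the way Elasticsearch works? For example, I can group customers by a specific date range and get the sum of total_spent as aggregated data. What I want to know is whether I can find the customers with no orders by filtering the nested orders array on a specific date range. Do you think this kind of query would have performance problems?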

I am not familiar with NoSQL databases; I come from an RDBMS background. So I am trying to understand Elasticsearch as an analytics database.

Thanks for your replies.

Edit:

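I am trying to count the nested objects that fall inside a filter on a specified date range. Is this possible in Elasticsearch, and does it make sense? Simply put, I want to see the customers who placed 1 or more orders within the specified dates.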

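I know how to get the daily customer count, but what if I want to count, in a set of daily reports, the customers who have 1 order within a specified date range?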

A possible response I would expect:

{
...
"aggregations":[
{
"date":"2020-09-01",
"total_customers_zero_purchased":15
}
...
]
}

Answer 1

Joe*_*ook 4

A lot of questions are raised here, so I will focus on the most important parts.

First, it is customary to map certain text fields as keyword so that we can aggregate on them later. That means:

PUT customers_index
{
  "mappings": {
    "properties": {
      "email": {
        "type": "keyword"
      }
    }
  }
}
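If the email field also needs to stay full-text searchable, one alternative is a multi-field: keep it as text and add a keyword sub-field for aggregations. The following is a minimal Python sketch that only builds such a mapping body as a dict. The sub-field name keyword is a common convention rather than a requirement, and mapping orders.placed_at as a date is an assumption, since the question's mapping does not list that field. With this variant, aggregations would target email.keyword instead of email.

```python
import json

# Hypothetical mapping body for customers_index (a sketch, not the asker's actual mapping).
mapping = {
    "mappings": {
        "properties": {
            # text for full-text search, plus a keyword sub-field for aggregations
            "email": {
                "type": "text",
                "fields": {"keyword": {"type": "keyword"}},
            },
            "orders": {
                "type": "nested",
                "properties": {
                    # Assumption: placed_at is mapped as a date so range filters work
                    "placed_at": {"type": "date"},
                    "total_price": {"type": "float"},
                },
            },
        }
    }
}

# Body that could be sent with PUT customers_index
print(json.dumps(mapping, indent=2))
```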

After that we can proceed with the queries, but note that when we iterate over date ranges, we need to specify a date field. Meaning:

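The ranges being iterated are built automatically from the values actually present in the index (we can use a filter to restrict their range).
Consequently, when a document contains no date within a given range, it is understandably skipped.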
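To see why documents without a matching date drop out, here is a small pure-Python sketch (no Elasticsearch involved) that mimics bucketing customers by the day of their orders, using the three sample customers from the question. The helper name customers_per_day is hypothetical. Customer 1 has no orders, so no bucket ever mentions that customer, which is why zero-purchase customers cannot be read directly off date buckets.

```python
from collections import defaultdict

# Order dates per customer id, taken from the sample documents in the question
customers = {
    "1": [],
    "2": ["2020-09-24T20:08:35.000000Z"],
    "3": [
        "2020-09-24T20:10:52.000000Z",
        "2020-09-24T20:09:45.000000Z",
        "2020-09-16T09:55:22.000000Z",
    ],
}


def customers_per_day(docs):
    """Group customer ids by the calendar day (UTC) of each order."""
    buckets = defaultdict(set)
    for customer_id, placed_at_values in docs.items():
        for placed_at in placed_at_values:
            day = placed_at[:10]  # "YYYY-MM-DD" prefix of the ISO timestamp
            buckets[day].add(customer_id)
    return dict(buckets)


buckets = customers_per_day(customers)
for day in sorted(buckets):
    print(day, sorted(buckets[day]))

# Customers that never show up in any bucket
seen = set().union(*buckets.values())
print("never bucketed:", sorted(set(customers) - seen))
```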

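In practice, we cannot get a rolling daily aggregation (because we don't know what we don't know), only metrics for one explicitly specified date window at a time. For example: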

GET customers_index/_search
{
  "size": 0,
  "aggs": {
    "multibucket_simulator": {
      "filters": {
        "filters": {
          "all": {
            "match_all": {}
          }
        }
      },
      "aggs": {
        "all_customers": {
          "cardinality": {
            "field": "email"
          }
        },
        "customers_who_purchased_at_date": {
          "filter": {
            "nested": {
              "path": "orders",
              "query": {
                "range": {
                  "orders.placed_at": {
                    "gte": "2020-09-16T00:00:00.000000Z",
                    "lt": "2020-09-26T00:00:00.000000Z"
                  }
                }
              }
            }
          },
          "aggs": {
            "customer_count": {
              "cardinality": {
                "field": "email"
              }
            }
          }
        },
        "total_customers_zero_purchased": {
          "bucket_script": {
            "buckets_path": {
              "all": "all_customers.value",
              "filtered": "customers_who_purchased_at_date>customer_count.value"
            },
            "script": "params.all - params.filtered"
          }
        }
      }
    }
  }
}

which yields

"aggregations" : {
  "multibucket_simulator" : {
    ...
    "buckets" : {
      "all" : {
        ...
        "customers_who_purchased_at_date" : {
          ...
        },
        "all_customers" : {
          ...
        },
        "total_customers_zero_purchased" : {       <---
          "value" : 1.0
        }
      }
    }
  }
}

and thereby answers the question:

How many customers did not purchase anything between 09/16 and 09/25 (inclusive)?
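As a sanity check on the value 1.0, the same subtraction can be reproduced in plain Python on the three sample customers from the question. This is only a sketch of the logic (distinct emails overall minus distinct emails with at least one order in the window), not how Elasticsearch computes it. In particular, the cardinality aggregation is approximate on large data sets, whereas this count is exact. The function name count_zero_purchase is hypothetical.

```python
from datetime import datetime, timezone

# Emails and order timestamps from the sample documents in the question
customers = [
    {"email": "egnition_sample_91@egnition.com", "orders": []},
    {"email": "egnition_sample_57@egnition.com",
     "orders": ["2020-09-24T20:08:35.000000Z"]},
    {"email": "egnition_sample_94@egnition.com",
     "orders": ["2020-09-24T20:10:52.000000Z",
                "2020-09-24T20:09:45.000000Z",
                "2020-09-16T09:55:22.000000Z"]},
]


def parse(ts):
    """Parse timestamps like 2020-09-24T20:08:35.000000Z as UTC datetimes."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S.%fZ").replace(tzinfo=timezone.utc)


def count_zero_purchase(docs, gte, lt):
    """Distinct emails overall minus distinct emails with an order in [gte, lt)."""
    start, end = parse(gte), parse(lt)
    all_emails = {doc["email"] for doc in docs}
    purchased = {
        doc["email"]
        for doc in docs
        if any(start <= parse(ts) < end for ts in doc["orders"])
    }
    return len(all_emails) - len(purchased)


result = count_zero_purchase(
    customers, "2020-09-16T00:00:00.000000Z", "2020-09-26T00:00:00.000000Z"
)
print(result)  # 1 for the sample data
```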
