Equivalent of a SQL aggregate query in Elasticsearch

Are*_*Tam 1 aggregate-functions elasticsearch elasticsearch-aggregation elasticsearch-sql

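I have looked into Elasticsearch aggregation queries but could not find out whether they support multiple aggregate functions. In other words, I would like to know whether Elasticsearch can produce the equivalent of this SQL aggregate query: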

  SELECT account_no, transaction_type, count(account_no), sum(amount), max(amount) FROM index_name GROUP BY account_no, transaction_type Having count(account_no) > 10

If so, how? Thanks.

Kam*_*mal 5

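There are two possible ways to achieve what you are looking for in ES; both are described below.

I have also added a sample mapping and sample documents for reference.

Mapping: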

PUT index_name
{
  "mappings": {
    "mydocs":{
      "properties":{
        "account_no":{
          "type": "keyword"
        },
        "transaction_type":{
          "type": "keyword"
        },
        "amount":{
          "type":"double"
        }
      }
    }
  }
}

Sample documents:

Note that I am creating only 4 transactions, all for a single customer.

POST index_name/mydocs/1
{
  "account_no": "1011",
  "transaction_type":"credit",
  "amount": 200
}

POST index_name/mydocs/2
{
  "account_no": "1011",
  "transaction_type":"credit",
  "amount": 400
}

POST index_name/mydocs/3
{
  "account_no": "1011",
  "transaction_type":"cheque",
  "amount": 100
}

POST index_name/mydocs/4
{
  "account_no": "1011",
  "transaction_type":"cheque",
  "amount": 100
}
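
As a sanity check on what the SQL query should return for these four documents, the grouping can be emulated in memory. This is only a sketch in Python: the helper name group_by_having is made up for illustration, and the threshold is 2 rather than the 10 from the question because the sample is so small.

```python
from collections import defaultdict

# The four sample documents indexed above.
docs = [
    {"account_no": "1011", "transaction_type": "credit", "amount": 200},
    {"account_no": "1011", "transaction_type": "credit", "amount": 400},
    {"account_no": "1011", "transaction_type": "cheque", "amount": 100},
    {"account_no": "1011", "transaction_type": "cheque", "amount": 100},
]


def group_by_having(docs, min_count):
    """Emulate GROUP BY account_no, transaction_type HAVING count(*) >= min_count.

    Hypothetical helper, not part of any Elasticsearch client.
    """
    groups = defaultdict(list)
    for doc in docs:
        groups[(doc["account_no"], doc["transaction_type"])].append(doc["amount"])

    rows = []
    for (account_no, transaction_type), amounts in sorted(groups.items()):
        if len(amounts) >= min_count:  # the HAVING filter
            rows.append({
                "account_no": account_no,
                "transaction_type": transaction_type,
                "count": len(amounts),
                "sum": sum(amounts),
                "max": max(amounts),
            })
    return rows


expected_rows = group_by_having(docs, min_count=2)
for row in expected_rows:
    print(row)
```

Both groups (cheque and credit) contain two documents, so both pass a threshold of 2; with a threshold of 3 the result would be empty.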

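There are two ways to get what you want:

Solution 1: Using the Elasticsearch Query DSL

Aggregation query:

For the Query DSL approach, I used the aggregations listed below.

Here is a summary of how the query is structured, so that it is clear which aggregations are siblings and which are parents: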

- Terms Aggregation (For Every Account)
  - Terms Aggregation (For Every Transaction_type)
    - Sum Amount 
    - Max Amount

Below is the actual query:

POST index_name/_search
{
  "size": 0, 
  "aggs": {
    "account_no_agg": {
      "terms": {
        "field": "account_no"
      },
      "aggs": {
        "transaction_type_agg": {
          "terms": {
            "field": "transaction_type",
            "min_doc_count": 2
          },
          "aggs": {
            "sum_amount": {
              "sum": {
                "field": "amount"
              }
            },
            "max_amount":{
              "max": {
                "field": "amount"
              }
            }
          }
        }
      }
    }
  }
}

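An important point: min_doc_count plays the role of the HAVING clause. It keeps only buckets that contain at least that many documents, so min_doc_count: 2 in my query corresponds to HAVING count(account_no) >= 2 (that is, > 1). The HAVING count(account_no) > 10 from the question would therefore be min_doc_count: 11.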
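
If the strict comparison from the question is preferred, or the condition involves something other than the document count, one option is a bucket_selector pipeline aggregation in place of min_doc_count. The sketch below builds such a request body in Python. The helper name build_group_by_query and the aggregation name having_count are made up for illustration, and this variant was not part of the tested answer.

```python
import json


def build_group_by_query(count_greater_than):
    """Build a search body for GROUP BY account_no, transaction_type
    HAVING count(*) > count_greater_than.

    Hypothetical helper; field names come from the sample mapping above.
    """
    threshold = int(count_greater_than)
    return {
        "size": 0,  # only aggregations are needed, no hits
        "aggs": {
            "account_no_agg": {
                # Note: terms returns only the top 10 buckets unless "size" is set.
                "terms": {"field": "account_no"},
                "aggs": {
                    "transaction_type_agg": {
                        "terms": {"field": "transaction_type"},
                        "aggs": {
                            "sum_amount": {"sum": {"field": "amount"}},
                            "max_amount": {"max": {"field": "amount"}},
                            # Drops buckets whose document count fails the test.
                            "having_count": {
                                "bucket_selector": {
                                    "buckets_path": {"doc_count": "_count"},
                                    "script": f"params.doc_count > {threshold}",
                                }
                            },
                        },
                    }
                },
            }
        },
    }


body = build_group_by_query(10)
print(json.dumps(body, indent=2))
```

The printed JSON can be pasted as the body of POST index_name/_search. For a plain count threshold, min_doc_count remains the simpler choice.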

Query response:

{
  "took" : 5,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 4,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "account_no_agg" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "1011",                         <----  account_no
          "doc_count" : 4,                        <----  all documents for this account_no
          "transaction_type_agg" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [
              {
                "key" : "cheque",                 <---- transaction_type
                "doc_count" : 2,                  <---- count(account_no) for this group
                "sum_amount" : {                  <----  sum(amount)
                  "value" : 200.0
                },
                "max_amount" : {                  <----  max(amount)
                  "value" : 100.0
                }
              },
              {
                "key" : "credit",                 <---- another transaction_type
                "doc_count" : 2,                  <---- count(account_no) for this group
                "sum_amount" : {                  <---- sum(amount)
                  "value" : 600.0
                },
                "max_amount" : {                  <---- max(amount)
                  "value" : 400.0
                }
              }
            ]
          }
        }
      ]
    }
  }
}

Look carefully at the result above: I have added comments where needed to show which part of the SQL query each value corresponds to.
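
To turn the nested buckets back into flat, SQL-like rows, the two bucket levels can be walked with a pair of loops. This is a minimal sketch in Python, assuming the aggregation names used in the query above; flatten_buckets is a hypothetical helper, and the response dict is a trimmed copy of the one shown.

```python
# Trimmed copy of the response above (only the fields the helper reads).
response = {
    "aggregations": {
        "account_no_agg": {
            "buckets": [
                {
                    "key": "1011",
                    "doc_count": 4,
                    "transaction_type_agg": {
                        "buckets": [
                            {
                                "key": "cheque",
                                "doc_count": 2,
                                "sum_amount": {"value": 200.0},
                                "max_amount": {"value": 100.0},
                            },
                            {
                                "key": "credit",
                                "doc_count": 2,
                                "sum_amount": {"value": 600.0},
                                "max_amount": {"value": 400.0},
                            },
                        ]
                    },
                }
            ]
        }
    }
}


def flatten_buckets(response):
    """Return one row per (account_no, transaction_type) bucket."""
    rows = []
    for account in response["aggregations"]["account_no_agg"]["buckets"]:
        for tx in account["transaction_type_agg"]["buckets"]:
            rows.append({
                "account_no": account["key"],
                "transaction_type": tx["key"],
                "count": tx["doc_count"],  # count per group, not per account
                "sum_amount": tx["sum_amount"]["value"],
                "max_amount": tx["max_amount"]["value"],
            })
    return rows


rows = flatten_buckets(response)
for row in rows:
    print(row)
```

Each row takes its count from the inner bucket's doc_count; the outer doc_count covers every transaction type of the account.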

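Solution 2: Using Elasticsearch SQL (X-Pack)

If you are using the SQL Access feature of X-Pack for Elasticsearch, you can simply run a SELECT query like the one below against the mapping and documents above:

Elasticsearch SQL: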

POST /_xpack/sql?format=txt
{
  "query": "SELECT account_no, transaction_type, sum(amount), max(amount), count(account_no) FROM index_name GROUP BY account_no, transaction_type HAVING count(account_no) > 1"
}

Elasticsearch SQL result:

  account_no   |transaction_type|  SUM(amount)  |  MAX(amount)  |COUNT(account_no)
---------------+----------------+---------------+---------------+-----------------
1011           |cheque          |200.0          |100.0          |2                
1011           |credit          |600.0          |400.0          |2                
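
The same SQL request could also be sent from a script instead of the console. The sketch below only builds the HTTP request using the Python standard library. The base URL http://localhost:9200 and the helper name build_sql_request are assumptions for illustration, and the endpoint path is taken from the console example above, so it may differ on other Elasticsearch versions.

```python
import json
import urllib.request

SQL = (
    "SELECT account_no, transaction_type, sum(amount), max(amount), "
    "count(account_no) FROM index_name "
    "GROUP BY account_no, transaction_type HAVING count(account_no) > 1"
)


def build_sql_request(query, base_url="http://localhost:9200",
                      path="/_xpack/sql", fmt="txt"):
    """Build (but do not send) the POST request for the SQL endpoint.

    Hypothetical helper; base_url is an assumed local cluster address.
    """
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}{path}?format={fmt}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_sql_request(SQL)

# Sending it requires a reachable cluster, so it is left commented out:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```

Keeping the path and format as parameters makes it easy to point the same helper at a different endpoint or to ask for another response format.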

Note that I tested these queries on ES 6.5.4.

Hope this helps!