02.elasticsearch bucket aggregation查询

2024-08-31 15:48

本文主要是介绍02.elasticsearch bucket aggregation查询,希望对大家解决编程问题提供一定的参考价值,需要的开发者们随着小编来一起学习吧!

文章目录

    • 1. bucket aggregation 查询类型概览
    • 2. 数据准备
    • 3. 使用样例
      • 1. Terms Aggregation:
        • 1. 普通的terms agg
        • 2. 嵌套一个metric agg 作为sub agg查询
        • 3. 嵌套一个terms agg作为sub agg查询
      • 2. Range Aggregation:
      • 3. Date Histogram Aggregation:
      • 4. Date Range Aggregation
      • 5. Filter Aggregation
      • 6. Filters Aggregation
      • 7. Histogram Aggregation
      • 8. Missing Aggregation: 统计某个field不存在的doc
      • 9. nested aggs:用于nested的doc的聚合查询,一般是再有一个子查询来统计
      • 10. child agg 查询,针对join类型的数据进行查询
      • 11. parent agg 查询,针对join类型的数据进行查询
      • 12. Composite Aggregation 多个维度的terms进行组合操作,类似多层terms的嵌套,但是结果不是嵌套的,和mysql中按照多个字段进行group by类似
      • 13. Adjacency Matrix Aggregation,邻接矩阵聚合
      • 14. global agg 查询,针对所有数据的查询
      • 15. Significant Terms Aggregation: 自动查找显著性的关键字
      • 16. Significant Text Aggregation: 自动查找显著性的关键字
      • 17. Sampler Aggregation: 抽样数据聚合
      • 18.Reverse nested Aggregation 在nested agg中仍然可以对parent 的数据进行统计

elasticsearch的aggregate查询现在越来越丰富了,目前总共有4类。

  1. metric aggregation: 主要是min,max,avg,sum,percetile 等单个统计指标的查询
  2. bucket aggregation: 主要是类似group by的查询操作
  3. matrix aggregation: 使用多个字段的值进行计算从而产生一个多维矩阵
  4. pipline aggregation: 主要是能够在其他的aggregation进行一些附加的处理来增强数据

本篇就主要学习bucket aggregation,bucket aggregation查询类似group by 查询,而且相对metric aggregation 查询来说,bucket agg可以有sub aggregation, 也就是可以进行嵌套,嵌套的sub agg可以是bucket agg也可以是 metric agg。

1. bucket aggregation 查询类型概览

Terms Aggregation: 典型的grop by 类型,按照某个field将文档进行分桶,如果该field的value是数组的话,则该文档会被统计到多个bucket当中
Range Aggregation: 一般是针对number field,指定多个范围进行bucket划分
Date Histogram Aggregation: 按照时间进行分bucket,自动按照月等进行划分
Date Range Aggregation: 按照时间范围进行bucket,类似range aggregation
Filter Aggregation: 就是一个简单的过滤器,和query中的filter功能类似
Filters Aggregation: 多个filter进行过滤
Histogram Aggregation: 柱状图的聚合

Missing Aggregation: 统计某个field不存在的doc
Adjacency Matrix Aggregation
Auto-interval Date Histogram Aggregation
Children Aggregation
Composite Aggregation
Diversified Sampler Aggregation
Geo Distance Aggregation
GeoHash grid Aggregation
GeoTile Grid Aggregation
Global Aggregation
IP Range Aggregation
Nested Aggregation
Parent Aggregation
Reverse nested Aggregation
Sampler Aggregation
Significant Terms Aggregation
Significant Text Aggregation

2. 数据准备

演唱会的票信息
GET seats1028/_search

{
"play" : "Auntie Jo",   # 演唱会名称
"date" : "2018-11-6",  # 时间
"theatre" : "Skyline",  # 地点
"sold" : false,      # 这个票是否已经卖出
"actors" : [         # 演员"Jo Hangum","Jon Hittle","Rob Kettleman","Laura Conrad","Simon Hower","Nora Blue"],
"datetime" : 1541497200000,
"price" : 8321,    # 票价
"tip" : 17.5,      # 优惠
"time" : "5:40PM"
}

总共有3w+条这样的数据

3. 使用样例

1. Terms Aggregation:

典型的grop by 类型,按照某个field将文档进行分桶,如果该field的value是数组的话,则该文档会被统计到多个bucket当中

1. 普通的terms agg
GET seats1028/_search
{"size": 0,"aggs": {"term_price":{"terms": {"field": "price","min_doc_count": 13,"size": 50}}}
}返回
"aggregations" : {"term_price" : {"doc_count_error_upper_bound" : 0,"sum_other_doc_count" : 35384,"buckets" : [{"key" : 910,"doc_count" : 13},{"key" : 3273,"doc_count" : 13},{"key" : 3648,"doc_count" : 13}]}}
2. 嵌套一个metric agg 作为sub agg查询

按照row进行分组,取doc数量最多的前3个bucket,并计算每个bucket中的price的最大值。


GET seats1028/_search
{"size": 0,"aggs": {"term_price":{"terms": {"field": "row","min_doc_count": 13,"size": 3,"order": {"_count": "desc"}},"aggs": {"max_price": {"max": {"field": "price"}}}}}
}返回"aggregations" : {"term_price" : {"doc_count_error_upper_bound" : 0,"sum_other_doc_count" : 13608,"buckets" : [{"key" : 2,"doc_count" : 5796,"max_price" : {"value" : 9998.0}},{"key" : 3,"doc_count" : 5796,"max_price" : {"value" : 9999.0}},{"key" : 1,"doc_count" : 5791,"max_price" : {"value" : 9999.0}}]}}
3. 嵌套一个terms agg作为sub agg查询

先按照row进行bucket划分,给出doc数量前3的row对应的bucket,然后每个bucket按照number进行再分bucket, 并给出doc数量前三的number值对应的bucket。

GET seats1028/_search
{"size": 0,"aggs": {"term_price":{"terms": {"field": "row","min_doc_count": 13,"size": 3,"order": {"_count": "desc"}},"aggs": {"number_term": {"terms": {"field": "number","size": 3,"order": {"_count": "desc"}}}}}}
}返回
"aggregations" : {"term_price" : {"doc_count_error_upper_bound" : 0,"sum_other_doc_count" : 13608,"buckets" : [{"key" : 2,"doc_count" : 5796,"number_term" : {"doc_count_error_upper_bound" : 0,"sum_other_doc_count" : 4368,"buckets" : [{"key" : 1,"doc_count" : 476},{"key" : 2,"doc_count" : 476},{"key" : 3,"doc_count" : 476}]}},{"key" : 3,"doc_count" : 5796,"number_term" : {"doc_count_error_upper_bound" : 0,"sum_other_doc_count" : 4368,"buckets" : [{"key" : 1,"doc_count" : 476},{"key" : 2,"doc_count" : 476},{"key" : 3,"doc_count" : 476}]}},{"key" : 1,"doc_count" : 5791,"number_term" : {"doc_count_error_upper_bound" : 0,"sum_other_doc_count" : 4363,"buckets" : [{"key" : 5,"doc_count" : 476},{"key" : 6,"doc_count" : 476},{"key" : 7,"doc_count" : 476}]}}]}}

2. Range Aggregation:

一般是针对number field,指定多个范围进行bucket划分,包含from数值,不包含to对应的数值

GET seats1028/_search
{"size": 0,"aggs": {"price_range": {"range": {"field": "price","ranges": [{"from": 5000,"to": 6000}]}}}
}返回
"aggregations" : {"price_range" : {"buckets" : [{"key" : "5000.0-6000.0","from" : 5000.0,"to" : 6000.0,"doc_count" : 3646}]}}

3. Date Histogram Aggregation:

按照时间进行分bucket,自动按照月等进行划分

GET seats1028/_search
{"size": 0,"aggs": {"price_date_histogram": {"date_histogram": {"field": "datetime","calendar_interval": "month"}}}
}返回"aggregations" : {"price_date_histogram" : {"buckets" : [{"key_as_string" : "2018-03-01T00:00:00.000Z","key" : 1519862400000,"doc_count" : 2310},{"key_as_string" : "2018-04-01T00:00:00.000Z","key" : 1522540800000,"doc_count" : 3946},{"key_as_string" : "2018-05-01T00:00:00.000Z","key" : 1525132800000,"doc_count" : 3948},{"key_as_string" : "2018-06-01T00:00:00.000Z","key" : 1527811200000,"doc_count" : 3948},{"key_as_string" : "2018-07-01T00:00:00.000Z","key" : 1530403200000,"doc_count" : 3948}]}}

4. Date Range Aggregation

按照时间范围进行bucket,类似range aggregation

GET seats1028/_search
{"size": 0,"aggs": {"price_date_histogram": {"date_range": {"field": "datetime","ranges": [{"from": "2018-10-01T00:00:00.000Z","to": "2018-11-01T00:00:00.000Z"}]}}}
}返回"aggregations" : {"price_date_histogram" : {"buckets" : [{"key" : "2018-10-01T00:00:00.000Z-2018-11-01T00:00:00.000Z","from" : 1.538352E12,"from_as_string" : "2018-10-01T00:00:00.000Z","to" : 1.5410304E12,"to_as_string" : "2018-11-01T00:00:00.000Z","doc_count" : 3948}]}}

5. Filter Aggregation

就是一个简单的过滤器,和query中的filter功能类似

GET seats1028/_search
{"size": 0,"aggs": {"sold_filter": {"filter": {"range": {"tip": {"gte": 10,"lte": 20}}},"aggs": {"max_price": {"max": {"field": "price"}}}}}
}返回
"aggregations" : {"sold_filter" : {"doc_count" : 6300, # 这个是filter后的doc count"max_price" : {"value" : 9996.0}}}

6. Filters Aggregation

多个filter进行过滤, 对于每个filter过滤的结果再应用子agg查询

GET seats1028/_search
{"size": 0,"aggs": {"sold_filter": {"filters": {"filters": {    # 这个地方的用法还是挺怪异的,最终还是"tip_filter": {"range": {"tip": {"gte": 10,"lte": 20}}},"number_filter": {"range": {"number": {"gte": 5,"lte":10}}}}},"aggs": {"max_price": {"max": {"field": "price"}}}}}
}
返回"aggregations" : {"sold_filter" : {"buckets" : {"number_filter" : {"doc_count" : 16072,"max_price" : {"value" : 9999.0}},"tip_filter" : {  "doc_count" : 6300,"max_price" : {"value" : 9996.0}}}}}

可以看到这里对每一个子的filter都进行了过滤

7. Histogram Aggregation

柱状图的聚合,这里用来聚合的字段一般是数值型,比较方便用来分组

GET seats1028/_search
{"size": 0,"aggs": {"tip_histogram":{"histogram": {"field": "tip","interval": 4}}}
}返回"aggregations" : {"number_histogram" : {"buckets" : [{"key" : 16.0,"doc_count" : 4200},{"key" : 20.0,"doc_count" : 8400},{"key" : 24.0,"doc_count" : 17808},{"key" : 28.0,"doc_count" : 5794}]}}

8. Missing Aggregation: 统计某个field不存在的doc

GET seats1028/_search
{"size":0,"aggs": {"miss_f": {"missing": {"field": "row"}}}
}返回
"aggregations" : {"miss_f" : {"doc_count" : 1}}

9. nested aggs:用于nested的doc的聚合查询,一般是再有一个子查询来统计

数据样例
这个查询用于nested的doc的聚合查询,一般是再有一个子查询来统计
数据样例,班级里面有一个学生列表,学生有age,name属性

GET nest_test/_mapping
返回
{"mappings" : {"properties" : {"c_name" : {"type" : "text"},"class" : {"type" : "nested","properties" : {"students" : {"type" : "nested","properties" : {"age" : {"type" : "integer"},"name" : {"type" : "text"}}}}}}}}对应的文档有两个
"_source" : {"c_name" : "start_class","class" : {"students" : [{"name" : "jack chen","age" : 30},{"name" : "jack man","age" : 20},{"name" : "pony wang","age" : 60},{"name" : "gebi wang","age" : 90}]}}"_source" : {"c_name" : "sun_class","class" : {"students" : [{"name" : "lucy chen","age" : 30},{"name" : "lucy man","age" : 20},{"name" : "dong wang","age" : 60},{"name" : "chess wang","age" : 90}]}}

对应的查询


GET nest_test/_search
{"size": 0,"aggs": {"nested_agg": {"nested": {"path": "class.students"},"aggs": {"min_age": {"min": {"field": "class.students.age"}}}}}
}返回"aggregations" : {"nested_agg" : {"doc_count" : 8,"min_age" : {"value" : 20.0}}}

10. child agg 查询,针对join类型的数据进行查询

数据准备,每个教室(class_room)可以有多个课程(subject),每个学生(student)可以选择一个或者多个class_room,这样class_room和student就构成了parent/child的关系


PUT join_class
{"mappings": {"properties": {"subject":{"type": "keyword"},"class_student":{"type": "join","relations":{"class_room":"student"}}}}
}PUT join_class/_doc/1
{"subject":["english","Chinese","Russia"],"class_student":{"name":"class_room"},"des":"this class room teach english, Chinese, Russia"
}PUT join_class/_doc/2?routing=1
{"class_student":{"name":"student","parent":1},"name":"jack"
}PUT join_class/_doc/3?routing=1
{"class_student":{"name":"student","parent":1},"name":"pony"
}

下面这个查询要查找的是每个subject的对应的有哪些学生


GET join_class/_search
{"size":0,"query": {"match_all": {}},"aggs": {"subject_term": {"terms": {"field": "subject","size": 10},"aggs": {"subject_student": {"children": {"type": "student"},"aggs": {"term_name": {"terms": {"field": "name.keyword","size": 10}}}}}}}
}返回"aggregations" : {"subject_term" : {"doc_count_error_upper_bound" : 0,"sum_other_doc_count" : 0,"buckets" : [{"key" : "Chinese","doc_count" : 1,"subject_student" : {"doc_count" : 2,"term_name" : {"doc_count_error_upper_bound" : 0,"sum_other_doc_count" : 0,"buckets" : [{"key" : "jack","doc_count" : 1},{"key" : "pony","doc_count" : 1}]}}},{"key" : "Russia","doc_count" : 1,"subject_student" : {"doc_count" : 2,"term_name" : {"doc_count_error_upper_bound" : 0,"sum_other_doc_count" : 0,"buckets" : [{"key" : "jack","doc_count" : 1},{"key" : "pony","doc_count" : 1}]}}},{"key" : "english","doc_count" : 1,"subject_student" : {"doc_count" : 2,"term_name" : {"doc_count_error_upper_bound" : 0,"sum_other_doc_count" : 0,"buckets" : [{"key" : "jack","doc_count" : 1},{"key" : "pony","doc_count" : 1}]}}}]}}

11. parent agg 查询,针对join类型的数据进行查询

承接上面的数据样例,下面的请求查找每个学生选的课程


GET join_class/_search
{"size":0,"query": {"match_all": {}},"aggs": {"student_term": {"terms": {"field": "name.keyword","size": 10},"aggs": {"subject_student": {"parent": {"type": "student"},"aggs": {"choose_subject": {"terms": {"field": "subject","size": 10}}}}}}}
}

返回

 "aggregations" : {"student_term" : {"doc_count_error_upper_bound" : 0,"sum_other_doc_count" : 0,"buckets" : [{"key" : "jack","doc_count" : 1,"subject_student" : {"doc_count" : 1,"choose_subject" : {"doc_count_error_upper_bound" : 0,"sum_other_doc_count" : 0,"buckets" : [{"key" : "Chinese","doc_count" : 1},{"key" : "Russia","doc_count" : 1},{"key" : "english","doc_count" : 1}]}}},{"key" : "pony","doc_count" : 1,"subject_student" : {"doc_count" : 1,"choose_subject" : {"doc_count_error_upper_bound" : 0,"sum_other_doc_count" : 0,"buckets" : [{"key" : "Chinese","doc_count" : 1},{"key" : "Russia","doc_count" : 1},{"key" : "english","doc_count" : 1}]}}}]}}

12. Composite Aggregation 多个维度的terms进行组合操作,类似多层terms的嵌套,但是结果不是嵌套的,和mysql中按照多个字段进行group by类似

数据初始化


PUT composite_test
{"mappings": {"properties": {"area": {"type": "keyword"},"userid": {"type": "keyword"},"sendtime": {"type": "date","format": "yyyy-MM-dd HH:mm:ss"}}}
}
POST composite_test/_bulk
{ "index" : {"_type" :"_doc"}}
{"area":"33","userid":"400015","sendtime":"2019-01-17 00:00:00"}
{ "index" : {"_type" : "_doc"}}
{"area":"33","userid":"400015","sendtime":"2019-01-17 00:00:00"}
{ "index" : {"_type" : "_doc"}}
{"area":"35","userid":"400016","sendtime":"2019-01-18 00:00:00"}
{ "index" : { "_type" : "_doc"}}
{"area":"35","userid":"400016","sendtime":"2019-01-18 00:00:00"}
{ "index" : {"_type" : "_doc"}}
{"area":"33","userid":"400017","sendtime":"2019-01-17 00:00:00"}

下面的查询会按照area,userid, sendtime 三个字段进行group by查询


GET composite_test/_search
{"size": 0,"aggs": {"my_buckets": {"composite": {"sources": [{"area": {"terms": {"field": "area"}}},{"userid": {"terms": {"field": "userid"}}},{"sendtime": {"date_histogram": {"field": "sendtime","fixed_interval": "1d","format": "yyyy-MM-dd"}}}]}}}
}

返回

"aggregations" : {"my_buckets" : {"after_key" : {"area" : "35","userid" : "400016","sendtime" : "2019-01-18"},"buckets" : [{"key" : {"area" : "33","userid" : "400015","sendtime" : "2019-01-17"},"doc_count" : 2},{"key" : {"area" : "33","userid" : "400017","sendtime" : "2019-01-17"},"doc_count" : 1},{"key" : {"area" : "35","userid" : "400016","sendtime" : "2019-01-18"},"doc_count" : 2}]}}

13. Adjacency Matrix Aggregation,邻接矩阵聚合

邻接矩阵聚合,上面的composition是多个维度的terms求交,这个更弱一些,只能做指定的field的某些值进行邻接矩阵生成
使用上面的数据样例,下面的查询会返回area=33的doc统计,userid=400015的doc统计,同时还会返回area=33 & userid=400015的doc统计


GET composite_test/_search
{"size": 0,"aggs": {"composite_two": {"adjacency_matrix": {"filters": {"area_filter":{"terms":{"area":["33"]}},"user_id_filter":{"terms":{"userid":["400015"]}}}}}}

返回

"aggregations" : {"composite_two" : {"buckets" : [{"key" : "area_filter","doc_count" : 3},{"key" : "area_filter&user_id_filter","doc_count" : 2},{"key" : "user_id_filter","doc_count" : 2}]}}

14. global agg 查询,针对所有数据的查询

这个就是忽略query的过滤信息,直接针对index中的所有数据进行子聚合

GET seats1028/_search
{"size": 0, "query": {"term": {"row": {"value": 5}}},"aggs": {"global_row": {"global": {},"aggs": {"avg_row": {"avg": {"field": "row"}}}},"avg_row02":{"avg": {"field": "row"}}}
}

返回

"aggregations" : {"global_row" : {"doc_count" : 30992,"avg_row" : {"value" : 4.333871123874673   # 这个值是从所有的doc中算出来的}},"avg_row02" : {"value" : 5.0  # 这个是query过滤后的doc中计算出来的}}

15. Significant Terms Aggregation: 自动查找显著性的关键字

这个是在keyword的字段中查找当前的显著性的字段,查找出现频率比较高的字段
还是使用案例来说明更靠谱,这里举例的是网页新闻news,每个新闻news有作者(author) title, topic,等信息
相关数据构造如下

PUT news
{"mappings": {"properties": {"published": {"type": "date","format": "dateOptionalTime"},"author": {"type": "keyword"},"title": {"type": "text"},"topic": {"type": "keyword"},"views": {"type": "integer"}}}
}POST news/_bulk
{"index": {"_index": "news"}
}
{"author": "John Michael","published": "2018-07-08","title": "Tesla is flirting with its lowest close in over 1 1/2 years (TSLA)","topic": "automobile","views": "431"
}
{"index": {"_index": "news"}
}
{"author": "John Michael","published": "2018-07-22","title": "Tesla to end up like Lehman Brothers (TSLA)","topic": "automobile","views": "1921"
}
{"index": {"_index": "news"}
}
{"author": "John Michael","published": "2018-07-29","title": "Tesla (TSLA) official says that they are going to release a new self-driving car model in the coming year","topic": "automobile","views": "1849"
}
{"index": {"_index": "news"}
}
{"author": "John Michael","published": "2018-08-14","title": "Five ways Tesla uses AI and Big Data","topic": "ai","views": "871"
}
{"index": {"_index": "news"}
}
{"author": "John Michael","published": "2018-08-14","title": "Toyota partners with Tesla (TSLA) to improve the security of self-driving cars","topic": "automobile","views": "871"
}
{"index": {"_index": "news"}
}
{"author": "Robert Cann","published": "2018-08-25","title": "Is AI dangerous for humanity","topic": "ai","views": "981"
}
{"index": {"_index": "news"}
}
{"author": "Robert Cann","published": "2018-09-13","title": "Is AI dangerous for humanity","topic": "ai","views": "871"
}
{"index": {"_index": "news"}
}
{"author": "Robert Cann","published": "2018-09-27","title": "Introduction to Generative Adversarial Networks (GANs) in self-driving cars","topic": "automobile","views": "1183"
}
{"index": {"_index": "news"}
}
{"author": "Robert Cann","published": "2018-10-09","title": "Introduction to Natural Language Processing","topic": "ai","views": "786"
}
{"index": {"_index": "news"}
}
{"author": "Robert Cann","published": "2018-10-15","title": "New Distant Objects Found in the Fight for Planet X ","topic": "astronomy","views": "542"
}

查找每个作者关注最多的topic,那么该作者肯定在该topic的发问最多

GET news/_search
{"size": 0,"aggregations": {"authors": {"terms": {"field": "author"},"aggregations": {"significant_topic_types": {"significant_terms": {"field": "topic"}}}}}
}

返回

  "aggregations" : {"authors" : {"doc_count_error_upper_bound" : 0,"sum_other_doc_count" : 0,"buckets" : [{"key" : "John Michael","doc_count" : 5,"significant_topic_types" : {"doc_count" : 5,"bg_count" : 10,"buckets" : [{"key" : "automobile","doc_count" : 4,"score" : 0.4800000000000001,"bg_count" : 5}]}},{"key" : "Robert Cann","doc_count" : 5,"significant_topic_types" : {"doc_count" : 5,  # Robert Cann 总的doc数量为5个"bg_count" : 10,  # index中所有的doc数量为10"buckets" : [{"key" : "ai","doc_count" : 3,  # Robert Cann 的topic为ai的doc总共有3个"score" : 0.2999999999999999,"bg_count" : 4   ## 这里是指索引中topic是ai的文档总共有4个}]}}]}}

上面的统计说明John Michael 这位作者最关注的话题是 automobile(自动驾驶),而Robert Cann 最关注的是ai相关的话题,相关的bg_count的说明查看上面的注释

16. Significant Text Aggregation: 自动查找显著性的关键字

这个和上面的Significant terms Aggregation类似,就是针对的是text字段,而且会进行分词处理
使用上面的数据进行下面的查询


GET news/_search
{"query": {"match": {"title": " AI "}},"size": 0,"aggs": {"significant_title": {"significant_text": {"field": "title"}}}
}

返回

"aggregations" : {"significant_title" : {"doc_count" : 3,"bg_count" : 10,"buckets" : [{"key" : "ai","doc_count" : 3,"score" : 2.3333333333333335,"bg_count" : 3}]}}

17. Sampler Aggregation: 抽样数据聚合

这个一般是在significant_terms 查询的时候,有时候索引中的数据可能非常大,导致耗时也比较严重,可以用这个来做抽样聚合,抽取更相关的样本数据来进行聚合

POST /stackoverflow/_search?size=0
{"query": {"query_string": {"query": "tags:kibana OR tags:javascript"}},"aggs": {"sample": {"sampler": {"shard_size": 200},"aggs": {"keywords": {"significant_terms": {"field": "tags","exclude": ["kibana", "javascript"]}}}}}
}

shard_size 参数指的是每个分片抽取的样本数量,默认为 100
返回

{..."aggregations": {"sample": {"doc_count": 200,"keywords": {"doc_count": 200,"bg_count": 650,"buckets": [{"key": "elasticsearch","doc_count": 150,"score": 1.078125,"bg_count": 200},{"key": "logstash","doc_count": 50,"score": 0.5625,"bg_count": 50}]}}}
}

18.Reverse nested Aggregation 在nested agg中仍然可以对parent 的数据进行统计

Reverse nested Aggregation 的作用主要是能够让聚合在作为 Nested Aggregation 子聚合的情况下,跳出嵌套类型,对根文档的数据作聚合计算。
有例子:

PUT /issues
{"mappings": {"properties" : {"tags" : { "type" : "keyword" },"comments" : { "type" : "nested","properties" : {"username" : { "type" : "keyword" },"comment" : { "type" : "text" }}}}}
}PUT issues/_doc/1
{"tags": ["bug","improve"],"comments": [{"username": "jack","comment": " this is a bug"},{"username": "pony","comment": " this is a improve"}]
}PUT issues/_doc/2
{"tags": ["advice","improve"],"comments": [{"username": "jack","comment": " this is a good job "},{"username": "nacy","comment": " this is a improvement"}]
}

查询

GET /issues/_search
{"size": 0,"query": {"match_all": {}},"aggs": {"comments": {"nested": {"path": "comments"},"aggs": {"top_usernames": {"terms": {"field": "comments.username"},"aggs": {"comment_to_issue": {"reverse_nested": {},"aggs": {"top_tags_per_comment": {"terms": {"field": "tags"}}}}}}}}}
}

返回

"aggregations" : {"comments" : {"doc_count" : 4,"top_usernames" : {"doc_count_error_upper_bound" : 0,"sum_other_doc_count" : 0,"buckets" : [{"key" : "jack","doc_count" : 2,"comment_to_issue" : {"doc_count" : 2,"top_tags_per_comment" : {"doc_count_error_upper_bound" : 0,"sum_other_doc_count" : 0,"buckets" : [{"key" : "improve","doc_count" : 2},{"key" : "advice","doc_count" : 1},{"key" : "bug","doc_count" : 1}]}}},{"key" : "nacy","doc_count" : 1,"comment_to_issue" : {"doc_count" : 1,"top_tags_per_comment" : {"doc_count_error_upper_bound" : 0,"sum_other_doc_count" : 0,"buckets" : [{"key" : "advice","doc_count" : 1},{"key" : "improve","doc_count" : 1}]}}},{"key" : "pony","doc_count" : 1,"comment_to_issue" : {"doc_count" : 1,"top_tags_per_comment" : {"doc_count_error_upper_bound" : 0,"sum_other_doc_count" : 0,"buckets" : [{"key" : "bug","doc_count" : 1},{"key" : "improve","doc_count" : 1}]}}}]}}}

在 Nested Aggregation 聚合下,Reverse nested Aggregation 的子聚合计算聚合的数据集是该嵌套文档的根文档。
根据 Reverse nested Aggregation 的作用,可以清楚这是一个专门作为 Nested Aggregation 子聚合的聚合计算,所以作为顶层聚合或者是作为非 Nested Aggregation 的子聚合是没意义的。
在默认情况下, Reverse nested Aggregation 将找到根文档,当然如果有多层嵌套,也可以通过 path 参数指定文档的路径。

这篇关于02.elasticsearch bucket aggregation查询的文章就介绍到这儿,希望我们推荐的文章对编程师们有所帮助!



http://www.chinasem.cn/article/1124328

相关文章

Elasticsearch 在 Java 中的使用教程

《Elasticsearch在Java中的使用教程》Elasticsearch是一个分布式搜索和分析引擎,基于ApacheLucene构建,能够实现实时数据的存储、搜索、和分析,它广泛应用于全文... 目录1. Elasticsearch 简介2. 环境准备2.1 安装 Elasticsearch2.2 J

浅谈mysql的sql_mode可能会限制你的查询

《浅谈mysql的sql_mode可能会限制你的查询》本文主要介绍了浅谈mysql的sql_mode可能会限制你的查询,这个问题主要说明的是,我们写的sql查询语句违背了聚合函数groupby的规则... 目录场景:问题描述原因分析:解决方案:第一种:修改后,只有当前生效,若是mysql服务重启,就会失效;

MySQL多列IN查询的实现

《MySQL多列IN查询的实现》多列IN查询是一种强大的筛选工具,它允许通过多字段组合快速过滤数据,本文主要介绍了MySQL多列IN查询的实现,具有一定的参考价值,感兴趣的可以了解一下... 目录一、基础语法:多列 IN 的两种写法1. 直接值列表2. 子查询二、对比传统 OR 的写法三、性能分析与优化1.

mybatis-plus 实现查询表名动态修改的示例代码

《mybatis-plus实现查询表名动态修改的示例代码》通过MyBatis-Plus实现表名的动态替换,根据配置或入参选择不同的表,本文主要介绍了mybatis-plus实现查询表名动态修改的示... 目录实现数据库初始化依赖包配置读取类设置 myBATis-plus 插件测试通过 mybatis-plu

MySQL中实现多表查询的操作方法(配sql+实操图+案例巩固 通俗易懂版)

《MySQL中实现多表查询的操作方法(配sql+实操图+案例巩固通俗易懂版)》本文主要讲解了MySQL中的多表查询,包括子查询、笛卡尔积、自连接、多表查询的实现方法以及多列子查询等,通过实际例子和操... 目录复合查询1. 回顾查询基本操作group by 分组having1. 显示部门号为10的部门名,员

mysql关联查询速度慢的问题及解决

《mysql关联查询速度慢的问题及解决》:本文主要介绍mysql关联查询速度慢的问题及解决方案,具有很好的参考价值,希望对大家有所帮助,如有错误或未考虑完全的地方,望不吝赐教... 目录mysql关联查询速度慢1. 记录原因1.1 在一次线上的服务中1.2 最终发现2. 解决方案3. 具体操作总结mysql

mysql线上查询之前要性能调优的技巧及示例

《mysql线上查询之前要性能调优的技巧及示例》文章介绍了查询优化的几种方法,包括使用索引、避免不必要的列和行、有效的JOIN策略、子查询和派生表的优化、查询提示和优化器提示等,这些方法可以帮助提高数... 目录避免不必要的列和行使用有效的JOIN策略使用子查询和派生表时要小心使用查询提示和优化器提示其他常

ElasticSearch+Kibana通过Docker部署到Linux服务器中操作方法

《ElasticSearch+Kibana通过Docker部署到Linux服务器中操作方法》本文介绍了Elasticsearch的基本概念,包括文档和字段、索引和映射,还详细描述了如何通过Docker... 目录1、ElasticSearch概念2、ElasticSearch、Kibana和IK分词器部署

SQL 中多表查询的常见连接方式详解

《SQL中多表查询的常见连接方式详解》本文介绍SQL中多表查询的常见连接方式,包括内连接(INNERJOIN)、左连接(LEFTJOIN)、右连接(RIGHTJOIN)、全外连接(FULLOUTER... 目录一、连接类型图表(ASCII 形式)二、前置代码(创建示例表)三、连接方式代码示例1. 内连接(I

轻松上手MYSQL之JSON函数实现高效数据查询与操作

《轻松上手MYSQL之JSON函数实现高效数据查询与操作》:本文主要介绍轻松上手MYSQL之JSON函数实现高效数据查询与操作的相关资料,MySQL提供了多个JSON函数,用于处理和查询JSON数... 目录一、jsON_EXTRACT 提取指定数据二、JSON_UNQUOTE 取消双引号三、JSON_KE