02. Elasticsearch bucket aggregation queries

2024-08-31 15:48


Contents

    • 1. Overview of bucket aggregation types
    • 2. Data preparation
    • 3. Usage examples
      • 1. Terms Aggregation:
        • 1. A plain terms agg
        • 2. Nesting a metric agg as a sub agg
        • 3. Nesting a terms agg as a sub agg
      • 2. Range Aggregation:
      • 3. Date Histogram Aggregation:
      • 4. Date Range Aggregation
      • 5. Filter Aggregation
      • 6. Filters Aggregation
      • 7. Histogram Aggregation
      • 8. Missing Aggregation: count the docs in which a field is missing
      • 9. Nested aggs: aggregating over nested docs, usually with a sub aggregation that computes the statistic
      • 10. Children agg: querying join-type data
      • 11. Parent agg: querying join-type data
      • 12. Composite Aggregation: combines terms over several dimensions, like multi-level nested terms but with a flat result, similar to GROUP BY on several columns in MySQL
      • 13. Adjacency Matrix Aggregation
      • 14. Global agg: aggregating over all data
      • 15. Significant Terms Aggregation: automatically finding significant keywords
      • 16. Significant Text Aggregation: automatically finding significant keywords
      • 17. Sampler Aggregation: aggregating over sampled data
      • 18. Reverse nested Aggregation: aggregating over parent data from inside a nested agg

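Elasticsearch aggregations keep getting richer. There are currently four families:

  1. metric aggregation: single statistics such as min, max, avg, sum, and percentiles
  2. bucket aggregation: operations similar to GROUP BY
  3. matrix aggregation: computes over the values of several fields to produce a matrix
  4. pipeline aggregation: post-processes the output of other aggregations to enrich the result

This article covers bucket aggregations. A bucket aggregation works like a GROUP BY query and, unlike a metric aggregation, it can contain sub aggregations, so it can be nested. A nested sub agg can be either a bucket agg or a metric agg.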

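1. Overview of bucket aggregation types

Terms Aggregation: the classic GROUP BY. Documents are bucketed by the value of a field; if the field holds an array of values, the document is counted in several buckets
Range Aggregation: usually applied to a numeric field; documents are bucketed by the ranges you specify
Date Histogram Aggregation: buckets documents by time, at a calendar interval such as month
Date Range Aggregation: buckets documents by explicit time ranges, similar to the range aggregation
Filter Aggregation: a single filter, similar to a filter in a query
Filters Aggregation: several filters, one bucket per filter
Histogram Aggregation: fixed-width numeric buckets, as in a bar chart

Missing Aggregation: counts the docs in which a field is missing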
Adjacency Matrix Aggregation
Auto-interval Date Histogram Aggregation
Children Aggregation
Composite Aggregation
Diversified Sampler Aggregation
Geo Distance Aggregation
GeoHash grid Aggregation
GeoTile Grid Aggregation
Global Aggregation
IP Range Aggregation
Nested Aggregation
Parent Aggregation
Reverse nested Aggregation
Sampler Aggregation
Significant Terms Aggregation
Significant Text Aggregation

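2. Data preparation

Ticket information for theatre shows
GET seats1028/_search

{
  "play" : "Auntie Jo",      # name of the show
  "date" : "2018-11-6",      # date
  "theatre" : "Skyline",     # venue
  "sold" : false,            # whether this ticket has been sold
  "actors" : [               # cast
    "Jo Hangum", "Jon Hittle", "Rob Kettleman", "Laura Conrad", "Simon Hower", "Nora Blue"
  ],
  "datetime" : 1541497200000,
  "price" : 8321,            # ticket price
  "tip" : 17.5,              # discount
  "time" : "5:40PM"
}

There are a little over 30,000 documents like this. The queries below also reference the fields row and number, which this sample does not show.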

3. Usage examples

1. Terms Aggregation:

The classic GROUP BY: documents are bucketed by the value of a field. If the field holds an array of values, the document is counted in several buckets.
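
The multi-bucket behaviour for array fields can be sketched in plain Python. This is only an illustration of the counting rule, not how Elasticsearch implements it; the function terms_agg and the second and third sample documents are made up for the example.

```python
from collections import Counter

def terms_agg(docs, field, size=10, min_doc_count=1):
    """Count documents per distinct value of `field`, terms-aggregation style."""
    counts = Counter()
    for doc in docs:
        value = doc.get(field)
        if value is None:
            continue  # a document without the field lands in no bucket
        values = value if isinstance(value, list) else [value]
        # An array-valued field puts the document into one bucket per distinct value.
        for v in set(values):
            counts[v] += 1
    buckets = [
        {"key": k, "doc_count": c}
        for k, c in counts.items()
        if c >= min_doc_count
    ]
    # Assumed ordering: doc_count descending, ties broken by key ascending.
    buckets.sort(key=lambda b: (-b["doc_count"], b["key"]))
    return buckets[:size]

# Hypothetical documents modelled on the seats sample.
docs = [
    {"play": "Auntie Jo", "actors": ["Jo Hangum", "Jon Hittle"]},
    {"play": "Test Run", "actors": ["Jo Hangum"]},
    {"play": "No Cast"},
]
buckets = terms_agg(docs, "actors")
print(buckets)
```

The first document contributes to two buckets, so the bucket counts can add up to more than the number of documents.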

1. A plain terms agg
GET seats1028/_search
{
  "size": 0,
  "aggs": {
    "term_price": {
      "terms": { "field": "price", "min_doc_count": 13, "size": 50 }
    }
  }
}
Returns:
"aggregations" : {"term_price" : {"doc_count_error_upper_bound" : 0,"sum_other_doc_count" : 35384,"buckets" : [{"key" : 910,"doc_count" : 13},{"key" : 3273,"doc_count" : 13},{"key" : 3648,"doc_count" : 13}]}}
2. Nesting a metric agg as a sub agg

Group by row, keep the 3 buckets with the most docs, and compute the maximum price within each bucket.

GET seats1028/_search
{
  "size": 0,
  "aggs": {
    "term_price": {
      "terms": { "field": "row", "min_doc_count": 13, "size": 3, "order": { "_count": "desc" } },
      "aggs": {
        "max_price": { "max": { "field": "price" } }
      }
    }
  }
}
Returns:
"aggregations" : {"term_price" : {"doc_count_error_upper_bound" : 0,"sum_other_doc_count" : 13608,"buckets" : [{"key" : 2,"doc_count" : 5796,"max_price" : {"value" : 9998.0}},{"key" : 3,"doc_count" : 5796,"max_price" : {"value" : 9999.0}},{"key" : 1,"doc_count" : 5791,"max_price" : {"value" : 9999.0}}]}}
3. Nesting a terms agg as a sub agg

First bucket by row and return the 3 rows with the most docs. Then split each of those buckets by number and return the 3 number values with the most docs.

GET seats1028/_search
{
  "size": 0,
  "aggs": {
    "term_price": {
      "terms": { "field": "row", "min_doc_count": 13, "size": 3, "order": { "_count": "desc" } },
      "aggs": {
        "number_term": {
          "terms": { "field": "number", "size": 3, "order": { "_count": "desc" } }
        }
      }
    }
  }
}
Returns:
"aggregations" : {"term_price" : {"doc_count_error_upper_bound" : 0,"sum_other_doc_count" : 13608,"buckets" : [{"key" : 2,"doc_count" : 5796,"number_term" : {"doc_count_error_upper_bound" : 0,"sum_other_doc_count" : 4368,"buckets" : [{"key" : 1,"doc_count" : 476},{"key" : 2,"doc_count" : 476},{"key" : 3,"doc_count" : 476}]}},{"key" : 3,"doc_count" : 5796,"number_term" : {"doc_count_error_upper_bound" : 0,"sum_other_doc_count" : 4368,"buckets" : [{"key" : 1,"doc_count" : 476},{"key" : 2,"doc_count" : 476},{"key" : 3,"doc_count" : 476}]}},{"key" : 1,"doc_count" : 5791,"number_term" : {"doc_count_error_upper_bound" : 0,"sum_other_doc_count" : 4363,"buckets" : [{"key" : 5,"doc_count" : 476},{"key" : 6,"doc_count" : 476},{"key" : 7,"doc_count" : 476}]}}]}}

2. Range Aggregation:

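Usually applied to a numeric field: documents are split into buckets by the ranges you specify. Each range includes its from value and excludes its to value.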
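
The half-open boundary rule (from inclusive, to exclusive) is easy to get wrong, so here is a small Python sketch of it. The function range_agg and the four sample prices are made up for illustration.

```python
def range_agg(values, ranges):
    """Count values per range, treating each range as [from, to)."""
    buckets = []
    for r in ranges:
        lo = r.get("from", float("-inf"))  # a missing bound means open-ended
        hi = r.get("to", float("inf"))
        count = sum(1 for v in values if lo <= v < hi)
        buckets.append({"from": r.get("from"), "to": r.get("to"), "doc_count": count})
    return buckets

# Hypothetical prices sitting right on the boundaries.
prices = [4999, 5000, 5999, 6000]
buckets = range_agg(prices, [{"from": 5000, "to": 6000}])
print(buckets)
```

Only 5000 and 5999 are counted: 5000 equals from and is included, while 6000 equals to and is excluded.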

GET seats1028/_search
{
  "size": 0,
  "aggs": {
    "price_range": {
      "range": {
        "field": "price",
        "ranges": [ { "from": 5000, "to": 6000 } ]
      }
    }
  }
}
Returns:
"aggregations" : {"price_range" : {"buckets" : [{"key" : "5000.0-6000.0","from" : 5000.0,"to" : 6000.0,"doc_count" : 3646}]}}

3. Date Histogram Aggregation:

Buckets documents by time, at a calendar interval such as month.

GET seats1028/_search
{
  "size": 0,
  "aggs": {
    "price_date_histogram": {
      "date_histogram": { "field": "datetime", "calendar_interval": "month" }
    }
  }
}
Returns:
"aggregations" : {"price_date_histogram" : {"buckets" : [{"key_as_string" : "2018-03-01T00:00:00.000Z","key" : 1519862400000,"doc_count" : 2310},{"key_as_string" : "2018-04-01T00:00:00.000Z","key" : 1522540800000,"doc_count" : 3946},{"key_as_string" : "2018-05-01T00:00:00.000Z","key" : 1525132800000,"doc_count" : 3948},{"key_as_string" : "2018-06-01T00:00:00.000Z","key" : 1527811200000,"doc_count" : 3948},{"key_as_string" : "2018-07-01T00:00:00.000Z","key" : 1530403200000,"doc_count" : 3948}]}}
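
To see where a bucket key comes from, the following Python sketch rounds a timestamp down to the start of its calendar month. It assumes UTC, which matches the key_as_string values in the response; month_bucket_key is a made-up helper name, and the input is the datetime value from the sample document.

```python
from datetime import datetime, timezone

def month_bucket_key(epoch_millis):
    """Return the epoch-millis key of the UTC calendar month containing the timestamp."""
    dt = datetime.fromtimestamp(epoch_millis / 1000, tz=timezone.utc)
    start = datetime(dt.year, dt.month, 1, tzinfo=timezone.utc)
    return int(start.timestamp() * 1000)

# "datetime" value of the sample ticket shown in the data preparation section.
key = month_bucket_key(1541497200000)
key_as_string = datetime.fromtimestamp(key / 1000, tz=timezone.utc).strftime(
    "%Y-%m-%dT%H:%M:%S.000Z"
)
print(key, key_as_string)
```

Every document whose timestamp falls in the same month gets the same key, which is why each bucket is identified by the first instant of its month.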

4. Date Range Aggregation

Buckets documents by explicit time ranges, similar to the range aggregation.

GET seats1028/_search
{
  "size": 0,
  "aggs": {
    "price_date_histogram": {
      "date_range": {
        "field": "datetime",
        "ranges": [ { "from": "2018-10-01T00:00:00.000Z", "to": "2018-11-01T00:00:00.000Z" } ]
      }
    }
  }
}
Returns:
"aggregations" : {"price_date_histogram" : {"buckets" : [{"key" : "2018-10-01T00:00:00.000Z-2018-11-01T00:00:00.000Z","from" : 1.538352E12,"from_as_string" : "2018-10-01T00:00:00.000Z","to" : 1.5410304E12,"to_as_string" : "2018-11-01T00:00:00.000Z","doc_count" : 3948}]}}

5. Filter Aggregation

A single filter, similar to a filter in a query.

GET seats1028/_search
{
  "size": 0,
  "aggs": {
    "sold_filter": {
      "filter": { "range": { "tip": { "gte": 10, "lte": 20 } } },
      "aggs": {
        "max_price": { "max": { "field": "price" } }
      }
    }
  }
}
Returns:
"aggregations" : {
  "sold_filter" : {
    "doc_count" : 6300,    # doc count after the filter is applied
    "max_price" : { "value" : 9996.0 }
  }
}

6. Filters Aggregation

Several filters; the sub agg is then applied to the result of each filter.

GET seats1028/_search
{
  "size": 0,
  "aggs": {
    "sold_filter": {
      "filters": {
        "filters": {    # the nested "filters" key looks rather odd here
          "tip_filter": { "range": { "tip": { "gte": 10, "lte": 20 } } },
          "number_filter": { "range": { "number": { "gte": 5, "lte": 10 } } }
        }
      },
      "aggs": {
        "max_price": { "max": { "field": "price" } }
      }
    }
  }
}
Returns:
"aggregations" : {"sold_filter" : {"buckets" : {"number_filter" : {"doc_count" : 16072,"max_price" : {"value" : 9999.0}},"tip_filter" : {  "doc_count" : 6300,"max_price" : {"value" : 9996.0}}}}}

As the result shows, each named filter gets its own bucket, and the sub agg is computed inside each one.

7. Histogram Aggregation

A bar-chart style aggregation. The field is usually numeric, which makes it easy to group into fixed-width buckets.

GET seats1028/_search
{
  "size": 0,
  "aggs": {
    "tip_histogram": {
      "histogram": { "field": "tip", "interval": 4 }
    }
  }
}
Returns:
"aggregations" : {"tip_histogram" : {"buckets" : [{"key" : 16.0,"doc_count" : 4200},{"key" : 20.0,"doc_count" : 8400},{"key" : 24.0,"doc_count" : 17808},{"key" : 28.0,"doc_count" : 5794}]}}
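
Each value is assigned to a bucket by rounding it down to a multiple of the interval. The Python sketch below applies that rule to the tip value from the sample document; histogram_key is a made-up helper name, and offset is included only to show where the optional offset parameter would enter.

```python
import math

def histogram_key(value, interval, offset=0.0):
    """Round value down to the start of its fixed-width bucket."""
    return math.floor((value - offset) / interval) * interval + offset

# tip of the sample ticket, with the interval used in the query.
key = histogram_key(17.5, 4)
print(key)
```

With interval 4, a tip of 17.5 lands in the bucket keyed 16.0, which covers values from 16 up to but not including 20.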

8. Missing Aggregation: count the docs in which a field is missing

GET seats1028/_search
{
  "size": 0,
  "aggs": {
    "miss_f": { "missing": { "field": "row" } }
  }
}
Returns:
"aggregations" : {"miss_f" : {"doc_count" : 1}}

9. Nested aggs: aggregating over nested docs, usually with a sub aggregation that computes the statistic

Sample data: a class holds a list of students, and each student has the properties age and name.

GET nest_test/_mapping
Returns:
{
  "mappings" : {
    "properties" : {
      "c_name" : { "type" : "text" },
      "class" : {
        "type" : "nested",
        "properties" : {
          "students" : {
            "type" : "nested",
            "properties" : {
              "age" : { "type" : "integer" },
              "name" : { "type" : "text" }
            }
          }
        }
      }
    }
  }
}
There are two matching documents:
"_source" : {"c_name" : "start_class","class" : {"students" : [{"name" : "jack chen","age" : 30},{"name" : "jack man","age" : 20},{"name" : "pony wang","age" : 60},{"name" : "gebi wang","age" : 90}]}}
"_source" : {"c_name" : "sun_class","class" : {"students" : [{"name" : "lucy chen","age" : 30},{"name" : "lucy man","age" : 20},{"name" : "dong wang","age" : 60},{"name" : "chess wang","age" : 90}]}}

The query:

GET nest_test/_search
{
  "size": 0,
  "aggs": {
    "nested_agg": {
      "nested": { "path": "class.students" },
      "aggs": {
        "min_age": { "min": { "field": "class.students.age" } }
      }
    }
  }
}
Returns:
"aggregations" : {"nested_agg" : {"doc_count" : 8,"min_age" : {"value" : 20.0}}}
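
The doc_count of 8 is the number of nested student objects (4 in each of the 2 documents), not the number of top-level documents. A rough Python sketch of that counting, using the two documents above; nested_min_age is a made-up name and the code only mirrors the shape of the response.

```python
classes = [
    {"c_name": "start_class", "class": {"students": [
        {"name": "jack chen", "age": 30}, {"name": "jack man", "age": 20},
        {"name": "pony wang", "age": 60}, {"name": "gebi wang", "age": 90},
    ]}},
    {"c_name": "sun_class", "class": {"students": [
        {"name": "lucy chen", "age": 30}, {"name": "lucy man", "age": 20},
        {"name": "dong wang", "age": 60}, {"name": "chess wang", "age": 90},
    ]}},
]

def nested_min_age(docs):
    """Flatten the nested students, then compute the metric over them."""
    students = [s for d in docs for s in d["class"]["students"]]
    return {
        "doc_count": len(students),  # nested objects, not root documents
        "min_age": {"value": float(min(s["age"] for s in students))},
    }

result = nested_min_age(classes)
print(result)
```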

10. Children agg: querying join-type data

Data preparation: each class room (class_room) can teach several subjects (subject) and can have several students (student). A student document is attached to exactly one class_room through the join field, so class_room and student form a parent/child relation.

PUT join_class
{
  "mappings": {
    "properties": {
      "subject": { "type": "keyword" },
      "class_student": {
        "type": "join",
        "relations": { "class_room": "student" }
      }
    }
  }
}
PUT join_class/_doc/1
{
  "subject": ["english", "Chinese", "Russia"],
  "class_student": { "name": "class_room" },
  "des": "this class room teach english, Chinese, Russia"
}
PUT join_class/_doc/2?routing=1
{
  "class_student": { "name": "student", "parent": 1 },
  "name": "jack"
}
PUT join_class/_doc/3?routing=1
{
  "class_student": { "name": "student", "parent": 1 },
  "name": "pony"
}

The query below finds which students belong to each subject.

GET join_class/_search
{
  "size": 0,
  "query": { "match_all": {} },
  "aggs": {
    "subject_term": {
      "terms": { "field": "subject", "size": 10 },
      "aggs": {
        "subject_student": {
          "children": { "type": "student" },
          "aggs": {
            "term_name": { "terms": { "field": "name.keyword", "size": 10 } }
          }
        }
      }
    }
  }
}
Returns:
"aggregations" : {"subject_term" : {"doc_count_error_upper_bound" : 0,"sum_other_doc_count" : 0,"buckets" : [{"key" : "Chinese","doc_count" : 1,"subject_student" : {"doc_count" : 2,"term_name" : {"doc_count_error_upper_bound" : 0,"sum_other_doc_count" : 0,"buckets" : [{"key" : "jack","doc_count" : 1},{"key" : "pony","doc_count" : 1}]}}},{"key" : "Russia","doc_count" : 1,"subject_student" : {"doc_count" : 2,"term_name" : {"doc_count_error_upper_bound" : 0,"sum_other_doc_count" : 0,"buckets" : [{"key" : "jack","doc_count" : 1},{"key" : "pony","doc_count" : 1}]}}},{"key" : "english","doc_count" : 1,"subject_student" : {"doc_count" : 2,"term_name" : {"doc_count_error_upper_bound" : 0,"sum_other_doc_count" : 0,"buckets" : [{"key" : "jack","doc_count" : 1},{"key" : "pony","doc_count" : 1}]}}}]}}

11. Parent agg: querying join-type data

Continuing with the sample data above, the request below finds the subjects each student has chosen.

GET join_class/_search
{
  "size": 0,
  "query": { "match_all": {} },
  "aggs": {
    "student_term": {
      "terms": { "field": "name.keyword", "size": 10 },
      "aggs": {
        "subject_student": {
          "parent": { "type": "student" },
          "aggs": {
            "choose_subject": { "terms": { "field": "subject", "size": 10 } }
          }
        }
      }
    }
  }
}

Returns:

 "aggregations" : {"student_term" : {"doc_count_error_upper_bound" : 0,"sum_other_doc_count" : 0,"buckets" : [{"key" : "jack","doc_count" : 1,"subject_student" : {"doc_count" : 1,"choose_subject" : {"doc_count_error_upper_bound" : 0,"sum_other_doc_count" : 0,"buckets" : [{"key" : "Chinese","doc_count" : 1},{"key" : "Russia","doc_count" : 1},{"key" : "english","doc_count" : 1}]}}},{"key" : "pony","doc_count" : 1,"subject_student" : {"doc_count" : 1,"choose_subject" : {"doc_count_error_upper_bound" : 0,"sum_other_doc_count" : 0,"buckets" : [{"key" : "Chinese","doc_count" : 1},{"key" : "Russia","doc_count" : 1},{"key" : "english","doc_count" : 1}]}}}]}}

12. Composite Aggregation: combines terms over several dimensions, like multi-level nested terms but with a flat result, similar to GROUP BY on several columns in MySQL

Data initialization

PUT composite_test
{
  "mappings": {
    "properties": {
      "area": { "type": "keyword" },
      "userid": { "type": "keyword" },
      "sendtime": { "type": "date", "format": "yyyy-MM-dd HH:mm:ss" }
    }
  }
}
POST composite_test/_bulk
{ "index" : {} }
{"area":"33","userid":"400015","sendtime":"2019-01-17 00:00:00"}
{ "index" : {} }
{"area":"33","userid":"400015","sendtime":"2019-01-17 00:00:00"}
{ "index" : {} }
{"area":"35","userid":"400016","sendtime":"2019-01-18 00:00:00"}
{ "index" : {} }
{"area":"35","userid":"400016","sendtime":"2019-01-18 00:00:00"}
{ "index" : {} }
{"area":"33","userid":"400017","sendtime":"2019-01-17 00:00:00"}

The query below groups by the three fields area, userid, and sendtime.
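
Before looking at the request, it may help to see the flat grouping spelled out in Python over the five documents indexed above. This is only a sketch of the GROUP BY semantics; composite_agg is a made-up name, and the date is truncated to the day to mimic the 1d interval with the yyyy-MM-dd format.

```python
from collections import Counter

docs = [
    {"area": "33", "userid": "400015", "sendtime": "2019-01-17 00:00:00"},
    {"area": "33", "userid": "400015", "sendtime": "2019-01-17 00:00:00"},
    {"area": "35", "userid": "400016", "sendtime": "2019-01-18 00:00:00"},
    {"area": "35", "userid": "400016", "sendtime": "2019-01-18 00:00:00"},
    {"area": "33", "userid": "400017", "sendtime": "2019-01-17 00:00:00"},
]

def composite_agg(docs):
    """Group by (area, userid, day) and return one flat bucket per combination."""
    counts = Counter(
        (d["area"], d["userid"], d["sendtime"][:10])  # keep only the day part
        for d in docs
    )
    # Buckets come back sorted by their composite key.
    return [
        {"key": {"area": a, "userid": u, "sendtime": s}, "doc_count": c}
        for (a, u, s), c in sorted(counts.items())
    ]

buckets = composite_agg(docs)
for b in buckets:
    print(b)
```

Each distinct combination of the three values becomes one bucket, so the result is a flat list rather than three levels of nested buckets.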


GET composite_test/_search
{
  "size": 0,
  "aggs": {
    "my_buckets": {
      "composite": {
        "sources": [
          { "area": { "terms": { "field": "area" } } },
          { "userid": { "terms": { "field": "userid" } } },
          { "sendtime": { "date_histogram": { "field": "sendtime", "fixed_interval": "1d", "format": "yyyy-MM-dd" } } }
        ]
      }
    }
  }
}

Returns:

"aggregations" : {"my_buckets" : {"after_key" : {"area" : "35","userid" : "400016","sendtime" : "2019-01-18"},"buckets" : [{"key" : {"area" : "33","userid" : "400015","sendtime" : "2019-01-17"},"doc_count" : 2},{"key" : {"area" : "33","userid" : "400017","sendtime" : "2019-01-17"},"doc_count" : 1},{"key" : {"area" : "35","userid" : "400016","sendtime" : "2019-01-18"},"doc_count" : 2}]}}

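13. Adjacency Matrix Aggregation

The composite aggregation above combines terms across several dimensions. The adjacency matrix aggregation is more limited: you name a set of filters on specific field values, and it returns a bucket for each filter and for each pair of filters that intersect.
Using the sample data above, the query below returns the doc count for area=33, the doc count for userid=400015, and the doc count for area=33 & userid=400015.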
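
The counting behind those three buckets can be sketched in Python over the same five documents. The function adjacency_matrix and the lambda filters are made up for illustration; the "&" separator follows the bucket keys shown in the response.

```python
from itertools import combinations

docs = [
    {"area": "33", "userid": "400015"},
    {"area": "33", "userid": "400015"},
    {"area": "35", "userid": "400016"},
    {"area": "35", "userid": "400016"},
    {"area": "33", "userid": "400017"},
]

filters = {
    "area_filter": lambda d: d["area"] == "33",
    "user_id_filter": lambda d: d["userid"] == "400015",
}

def adjacency_matrix(docs, filters, separator="&"):
    """Count docs per filter and per pair of filters, dropping empty buckets."""
    names = sorted(filters)
    buckets = {}
    for name in names:
        buckets[name] = sum(1 for d in docs if filters[name](d))
    for a, b in combinations(names, 2):
        buckets[a + separator + b] = sum(
            1 for d in docs if filters[a](d) and filters[b](d)
        )
    return {k: v for k, v in buckets.items() if v > 0}

matrix = adjacency_matrix(docs, filters)
print(matrix)
```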


GET composite_test/_search
{
  "size": 0,
  "aggs": {
    "composite_two": {
      "adjacency_matrix": {
        "filters": {
          "area_filter": { "terms": { "area": ["33"] } },
          "user_id_filter": { "terms": { "userid": ["400015"] } }
        }
      }
    }
  }
}

Returns:

"aggregations" : {"composite_two" : {"buckets" : [{"key" : "area_filter","doc_count" : 3},{"key" : "area_filter&user_id_filter","doc_count" : 2},{"key" : "user_id_filter","doc_count" : 2}]}}

14. Global agg: aggregating over all data

This ignores the filtering done by the query and runs its sub aggregations over all the data in the index.

GET seats1028/_search
{
  "size": 0,
  "query": { "term": { "row": { "value": 5 } } },
  "aggs": {
    "global_row": {
      "global": {},
      "aggs": { "avg_row": { "avg": { "field": "row" } } }
    },
    "avg_row02": { "avg": { "field": "row" } }
  }
}

Returns:

"aggregations" : {
  "global_row" : {
    "doc_count" : 30992,
    "avg_row" : { "value" : 4.333871123874673 }    # computed from all docs
  },
  "avg_row02" : { "value" : 5.0 }    # computed from the docs that match the query
}

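15. Significant Terms Aggregation: automatically finding significant keywords

This aggregation looks in a keyword field for terms that are significant for the current result set, that is, terms that occur noticeably more often there than in the index as a whole.
An example makes this clearer. The example uses web news articles (news); each article has an author, a title, a topic, and so on.
The data is set up as follows: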

PUT news
{
  "mappings": {
    "properties": {
      "published": { "type": "date", "format": "date_optional_time" },
      "author": { "type": "keyword" },
      "title": { "type": "text" },
      "topic": { "type": "keyword" },
      "views": { "type": "integer" }
    }
  }
}
POST news/_bulk
{"index": {"_index": "news"}}
{"author": "John Michael","published": "2018-07-08","title": "Tesla is flirting with its lowest close in over 1 1/2 years (TSLA)","topic": "automobile","views": "431"}
{"index": {"_index": "news"}}
{"author": "John Michael","published": "2018-07-22","title": "Tesla to end up like Lehman Brothers (TSLA)","topic": "automobile","views": "1921"}
{"index": {"_index": "news"}}
{"author": "John Michael","published": "2018-07-29","title": "Tesla (TSLA) official says that they are going to release a new self-driving car model in the coming year","topic": "automobile","views": "1849"}
{"index": {"_index": "news"}}
{"author": "John Michael","published": "2018-08-14","title": "Five ways Tesla uses AI and Big Data","topic": "ai","views": "871"}
{"index": {"_index": "news"}}
{"author": "John Michael","published": "2018-08-14","title": "Toyota partners with Tesla (TSLA) to improve the security of self-driving cars","topic": "automobile","views": "871"}
{"index": {"_index": "news"}}
{"author": "Robert Cann","published": "2018-08-25","title": "Is AI dangerous for humanity","topic": "ai","views": "981"}
{"index": {"_index": "news"}}
{"author": "Robert Cann","published": "2018-09-13","title": "Is AI dangerous for humanity","topic": "ai","views": "871"}
{"index": {"_index": "news"}}
{"author": "Robert Cann","published": "2018-09-27","title": "Introduction to Generative Adversarial Networks (GANs) in self-driving cars","topic": "automobile","views": "1183"}
{"index": {"_index": "news"}}
{"author": "Robert Cann","published": "2018-10-09","title": "Introduction to Natural Language Processing","topic": "ai","views": "786"}
{"index": {"_index": "news"}}
{"author": "Robert Cann","published": "2018-10-15","title": "New Distant Objects Found in the Fight for Planet X ","topic": "astronomy","views": "542"}

Find the topic each author focuses on most, that is, the topic in which that author publishes disproportionately often.

GET news/_search
{
  "size": 0,
  "aggregations": {
    "authors": {
      "terms": { "field": "author" },
      "aggregations": {
        "significant_topic_types": { "significant_terms": { "field": "topic" } }
      }
    }
  }
}

Returns:

"aggregations" : {
  "authors" : {
    "doc_count_error_upper_bound" : 0,
    "sum_other_doc_count" : 0,
    "buckets" : [
      {
        "key" : "John Michael",
        "doc_count" : 5,
        "significant_topic_types" : {
          "doc_count" : 5,
          "bg_count" : 10,
          "buckets" : [
            { "key" : "automobile", "doc_count" : 4, "score" : 0.4800000000000001, "bg_count" : 5 }
          ]
        }
      },
      {
        "key" : "Robert Cann",
        "doc_count" : 5,
        "significant_topic_types" : {
          "doc_count" : 5,     # Robert Cann has 5 docs in total
          "bg_count" : 10,     # the index holds 10 docs in total
          "buckets" : [
            {
              "key" : "ai",
              "doc_count" : 3,     # 3 of Robert Cann's docs have the topic ai
              "score" : 0.2999999999999999,
              "bg_count" : 4       # 4 docs in the whole index have the topic ai
            }
          ]
        }
      }
    ]
  }
}

These results show that John Michael focuses most on the topic automobile, while Robert Cann focuses most on ai. For the meaning of bg_count, see the comments in the response above.
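
The score values in the response can be reproduced by hand. The sketch below assumes the JLH heuristic, which multiplies the absolute and the relative change between the foreground and background percentages; the function name jlh_score is made up, and the inputs are the counts from the response.

```python
def jlh_score(fg_count, fg_total, bg_count, bg_total):
    """(fg% - bg%) * (fg% / bg%), or 0 when the term is not over-represented."""
    fg_pct = fg_count / fg_total
    bg_pct = bg_count / bg_total
    if fg_pct <= bg_pct:
        return 0.0
    return (fg_pct - bg_pct) * (fg_pct / bg_pct)

# John Michael: 4 of his 5 docs are automobile; 5 of all 10 docs are automobile.
automobile = jlh_score(4, 5, 5, 10)
# Robert Cann: 3 of his 5 docs are ai; 4 of all 10 docs are ai.
ai = jlh_score(3, 5, 4, 10)
print(automobile, ai)
```

Up to floating-point rounding these come out at 0.48 and 0.3, in line with the scores above.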

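16. Significant Text Aggregation: automatically finding significant keywords

This is similar to the Significant Terms Aggregation above, except that it works on text fields and analyzes (tokenizes) the text.
Run the following query against the data above:

GET news/_search
{
  "query": { "match": { "title": " AI " } },
  "size": 0,
  "aggs": {
    "significant_title": { "significant_text": { "field": "title" } }
  }
}

Returns: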

"aggregations" : {"significant_title" : {"doc_count" : 3,"bg_count" : 10,"buckets" : [{"key" : "ai","doc_count" : 3,"score" : 2.3333333333333335,"bg_count" : 3}]}}

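17. Sampler Aggregation: aggregating over sampled data

This is mostly used together with significant_terms queries. When the index is very large the aggregation can become slow, so the sampler restricts the sub aggregations to a sample of the most relevant documents.

POST /stackoverflow/_search?size=0
{
  "query": {
    "query_string": { "query": "tags:kibana OR tags:javascript" }
  },
  "aggs": {
    "sample": {
      "sampler": { "shard_size": 200 },
      "aggs": {
        "keywords": {
          "significant_terms": { "field": "tags", "exclude": ["kibana", "javascript"] }
        }
      }
    }
  }
}

The shard_size parameter is the number of documents sampled on each shard; the default is 100.
Returns: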

{..."aggregations": {"sample": {"doc_count": 200,"keywords": {"doc_count": 200,"bg_count": 650,"buckets": [{"key": "elasticsearch","doc_count": 150,"score": 1.078125,"bg_count": 200},{"key": "logstash","doc_count": 50,"score": 0.5625,"bg_count": 50}]}}}
}

18. Reverse nested Aggregation: aggregating over parent data from inside a nested agg

The Reverse nested Aggregation lets an aggregation that sits under a Nested Aggregation step back out of the nested type and aggregate over data in the root document.
An example:

PUT /issues
{
  "mappings": {
    "properties": {
      "tags": { "type": "keyword" },
      "comments": {
        "type": "nested",
        "properties": {
          "username": { "type": "keyword" },
          "comment": { "type": "text" }
        }
      }
    }
  }
}
PUT issues/_doc/1
{
  "tags": ["bug", "improve"],
  "comments": [
    { "username": "jack", "comment": " this is a bug" },
    { "username": "pony", "comment": " this is a improve" }
  ]
}
PUT issues/_doc/2
{
  "tags": ["advice", "improve"],
  "comments": [
    { "username": "jack", "comment": " this is a good job " },
    { "username": "nacy", "comment": " this is a improvement" }
  ]
}

Query

GET /issues/_search
{
  "size": 0,
  "query": { "match_all": {} },
  "aggs": {
    "comments": {
      "nested": { "path": "comments" },
      "aggs": {
        "top_usernames": {
          "terms": { "field": "comments.username" },
          "aggs": {
            "comment_to_issue": {
              "reverse_nested": {},
              "aggs": {
                "top_tags_per_comment": { "terms": { "field": "tags" } }
              }
            }
          }
        }
      }
    }
  }
}

Returns:

"aggregations" : {"comments" : {"doc_count" : 4,"top_usernames" : {"doc_count_error_upper_bound" : 0,"sum_other_doc_count" : 0,"buckets" : [{"key" : "jack","doc_count" : 2,"comment_to_issue" : {"doc_count" : 2,"top_tags_per_comment" : {"doc_count_error_upper_bound" : 0,"sum_other_doc_count" : 0,"buckets" : [{"key" : "improve","doc_count" : 2},{"key" : "advice","doc_count" : 1},{"key" : "bug","doc_count" : 1}]}}},{"key" : "nacy","doc_count" : 1,"comment_to_issue" : {"doc_count" : 1,"top_tags_per_comment" : {"doc_count_error_upper_bound" : 0,"sum_other_doc_count" : 0,"buckets" : [{"key" : "advice","doc_count" : 1},{"key" : "improve","doc_count" : 1}]}}},{"key" : "pony","doc_count" : 1,"comment_to_issue" : {"doc_count" : 1,"top_tags_per_comment" : {"doc_count_error_upper_bound" : 0,"sum_other_doc_count" : 0,"buckets" : [{"key" : "bug","doc_count" : 1},{"key" : "improve","doc_count" : 1}]}}}]}}}

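Under a Nested Aggregation, the sub aggregations of a Reverse nested Aggregation operate on the root documents that the nested docs belong to.
Given that purpose, the Reverse nested Aggregation is meant specifically as a sub aggregation of a Nested Aggregation; using it at the top level, or under an aggregation that is not nested, makes no sense.
By default the Reverse nested Aggregation goes back to the root document. With several levels of nesting, the path parameter can name the level to go back to instead.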
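
The step from nested comments back to their root issues can be imitated in Python with the two issue documents from the example. This is only a sketch of the semantics; tags_per_commenter is a made-up name and the comment text is left out because the aggregation does not use it.

```python
from collections import Counter, defaultdict

issues = [
    {"tags": ["bug", "improve"],
     "comments": [{"username": "jack"}, {"username": "pony"}]},
    {"tags": ["advice", "improve"],
     "comments": [{"username": "jack"}, {"username": "nacy"}]},
]

def tags_per_commenter(issues):
    """For each commenter, count the tags of the root issues they commented on."""
    # nested + terms: bucket the nested comments by username, remembering
    # which root document each comment belongs to.
    roots_by_user = defaultdict(set)
    for root_id, issue in enumerate(issues):
        for comment in issue["comments"]:
            roots_by_user[comment["username"]].add(root_id)
    # reverse_nested + terms: go back to the root documents and count their tags.
    result = {}
    for user, root_ids in roots_by_user.items():
        tag_counts = Counter()
        for root_id in root_ids:
            tag_counts.update(set(issues[root_id]["tags"]))
        result[user] = dict(tag_counts)
    return result

result = tags_per_commenter(issues)
print(result)
```

jack commented on both issues, so improve is counted twice for him, which matches the bucket counts in the response above.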




http://www.chinasem.cn/article/1124328
