02. Elasticsearch bucket aggregation queries

2024-08-31 15:48


Contents

    • 1. Overview of bucket aggregation types
    • 2. Data preparation
    • 3. Usage examples
      • 1. Terms Aggregation
        • 1. A plain terms agg
        • 2. Nesting a metric agg as a sub agg
        • 3. Nesting a terms agg as a sub agg
      • 2. Range Aggregation
      • 3. Date Histogram Aggregation
      • 4. Date Range Aggregation
      • 5. Filter Aggregation
      • 6. Filters Aggregation
      • 7. Histogram Aggregation
      • 8. Missing Aggregation: counts docs in which a field is missing
      • 9. Nested aggs: aggregating over nested docs, usually with a sub agg that computes the statistic
      • 10. Children agg: querying join-type data
      • 11. Parent agg: querying join-type data
      • 12. Composite Aggregation: combines terms over several dimensions, like multi-level nested terms aggs but with a flat result, much like GROUP BY over several columns in MySQL
      • 13. Adjacency Matrix Aggregation
      • 14. Global agg: querying across all data
      • 15. Significant Terms Aggregation: automatically finds significant terms
      • 16. Significant Text Aggregation: automatically finds significant terms in text
      • 17. Sampler Aggregation: aggregating over a sample
      • 18. Reverse nested Aggregation: aggregating on parent data from inside a nested agg

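Elasticsearch's aggregation queries keep getting richer. There are currently four families:

  1. metric aggregation: single-value statistics such as min, max, avg, sum, and percentile
  2. bucket aggregation: queries similar to GROUP BY
  3. matrix aggregation: computes over the values of several fields to produce a multi-dimensional matrix
  4. pipeline aggregation: post-processes the output of other aggregations to enrich the results

This article covers bucket aggregations. A bucket aggregation works like a GROUP BY query and, unlike a metric aggregation, it can contain sub-aggregations, that is, it can be nested. A nested sub agg can be either a bucket agg or a metric agg.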

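1. Overview of bucket aggregation types

Terms Aggregation: the classic GROUP BY. Documents are bucketed by the value of a field; if the field holds an array, the document is counted in several buckets
Range Aggregation: usually applied to a numeric field; you specify several ranges and each becomes a bucket
Date Histogram Aggregation: buckets by time at a calendar interval such as month
Date Range Aggregation: buckets by explicit time ranges, analogous to the range aggregation
Filter Aggregation: a simple filter, similar to a filter in a query
Filters Aggregation: several filters, one bucket per filter
Histogram Aggregation: histogram-style bucketing of numeric values

Missing Aggregation: counts docs in which a field is missing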
Adjacency Matrix Aggregation
Auto-interval Date Histogram Aggregation
Children Aggregation
Composite Aggregation
Diversified Sampler Aggregation
Geo Distance Aggregation
GeoHash grid Aggregation
GeoTile Grid Aggregation
Global Aggregation
IP Range Aggregation
Nested Aggregation
Parent Aggregation
Reverse nested Aggregation
Sampler Aggregation
Significant Terms Aggregation
Significant Text Aggregation

2. Data preparation

Ticket data for a set of shows

GET seats1028/_search

{
"play" : "Auntie Jo",   # name of the show
"date" : "2018-11-6",   # date
"theatre" : "Skyline",  # venue
"sold" : false,         # whether this ticket has been sold
"actors" : ["Jo Hangum","Jon Hittle","Rob Kettleman","Laura Conrad","Simon Hower","Nora Blue"],   # performers
"datetime" : 1541497200000,
"price" : 8321,    # ticket price
"tip" : 17.5,      # discount
"time" : "5:40PM"
}

The index holds a little over 30,000 documents like this one.

3. Usage examples

1. Terms Aggregation:

The classic GROUP BY. Documents are bucketed by the value of a field; if the field holds an array, the document is counted in several buckets.
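The multi-bucket behaviour for array fields can be illustrated without a cluster. The following is a minimal in-memory sketch, not how Elasticsearch implements it: `terms_agg` is a hypothetical helper, and the second document ("Other Show") is made-up sample data.

```python
from collections import Counter

def terms_agg(docs, field, size=10, min_doc_count=1):
    """Simplified terms aggregation: one bucket per distinct field value."""
    counts = Counter()
    for doc in docs:
        value = doc.get(field)
        if value is None:
            continue  # docs without the field land in no bucket
        values = value if isinstance(value, list) else [value]
        for v in set(values):  # a doc is counted at most once per bucket
            counts[v] += 1
    buckets = [{"key": k, "doc_count": c}
               for k, c in counts.items() if c >= min_doc_count]
    # Sort by doc_count descending; ties are broken by key for determinism.
    buckets.sort(key=lambda b: (-b["doc_count"], b["key"]))
    return buckets[:size]

docs = [
    {"play": "Auntie Jo", "actors": ["Jo Hangum", "Jon Hittle"]},
    {"play": "Other Show", "actors": ["Jo Hangum"]},  # hypothetical doc
]
print(terms_agg(docs, "actors"))
```

The first document contributes to both the "Jo Hangum" and the "Jon Hittle" bucket, so the bucket counts can add up to more than the number of documents.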

1. A plain terms agg
GET seats1028/_search
{"size": 0,"aggs": {"term_price":{"terms": {"field": "price","min_doc_count": 13,"size": 50}}}
}

Response:

"aggregations" : {"term_price" : {"doc_count_error_upper_bound" : 0,"sum_other_doc_count" : 35384,"buckets" : [{"key" : 910,"doc_count" : 13},{"key" : 3273,"doc_count" : 13},{"key" : 3648,"doc_count" : 13}]}}

2. Nesting a metric agg as a sub agg

Group by row, keep the 3 buckets with the most docs, and compute the maximum price in each bucket.


GET seats1028/_search
{"size": 0,"aggs": {"term_price":{"terms": {"field": "row","min_doc_count": 13,"size": 3,"order": {"_count": "desc"}},"aggs": {"max_price": {"max": {"field": "price"}}}}}
}

Response:

"aggregations" : {"term_price" : {"doc_count_error_upper_bound" : 0,"sum_other_doc_count" : 13608,"buckets" : [{"key" : 2,"doc_count" : 5796,"max_price" : {"value" : 9998.0}},{"key" : 3,"doc_count" : 5796,"max_price" : {"value" : 9999.0}},{"key" : 1,"doc_count" : 5791,"max_price" : {"value" : 9999.0}}]}}

3. Nesting a terms agg as a sub agg

First bucket by row and keep the 3 rows with the most docs; then split each of those buckets again by number and keep the 3 number values with the most docs.

GET seats1028/_search
{"size": 0,"aggs": {"term_price":{"terms": {"field": "row","min_doc_count": 13,"size": 3,"order": {"_count": "desc"}},"aggs": {"number_term": {"terms": {"field": "number","size": 3,"order": {"_count": "desc"}}}}}}
}

Response:

"aggregations" : {"term_price" : {"doc_count_error_upper_bound" : 0,"sum_other_doc_count" : 13608,"buckets" : [{"key" : 2,"doc_count" : 5796,"number_term" : {"doc_count_error_upper_bound" : 0,"sum_other_doc_count" : 4368,"buckets" : [{"key" : 1,"doc_count" : 476},{"key" : 2,"doc_count" : 476},{"key" : 3,"doc_count" : 476}]}},{"key" : 3,"doc_count" : 5796,"number_term" : {"doc_count_error_upper_bound" : 0,"sum_other_doc_count" : 4368,"buckets" : [{"key" : 1,"doc_count" : 476},{"key" : 2,"doc_count" : 476},{"key" : 3,"doc_count" : 476}]}},{"key" : 1,"doc_count" : 5791,"number_term" : {"doc_count_error_upper_bound" : 0,"sum_other_doc_count" : 4363,"buckets" : [{"key" : 5,"doc_count" : 476},{"key" : 6,"doc_count" : 476},{"key" : 7,"doc_count" : 476}]}}]}}

2. Range Aggregation:

Usually applied to a numeric field: you specify several ranges and each becomes a bucket. A range includes its from value and excludes its to value.

GET seats1028/_search
{"size": 0,"aggs": {"price_range": {"range": {"field": "price","ranges": [{"from": 5000,"to": 6000}]}}}
}

Response:

"aggregations" : {"price_range" : {"buckets" : [{"key" : "5000.0-6000.0","from" : 5000.0,"to" : 6000.0,"doc_count" : 3646}]}}
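The from-inclusive, to-exclusive rule is easy to get wrong at the boundaries. Here is a minimal in-memory sketch of that rule; `range_agg` and the sample prices are illustrative assumptions, not part of the article's data set.

```python
def range_agg(values, ranges):
    """Simplified range aggregation: 'from' is inclusive, 'to' is exclusive."""
    buckets = []
    for r in ranges:
        lo, hi = r.get("from"), r.get("to")  # either bound may be omitted
        count = sum(
            1 for v in values
            if (lo is None or v >= lo) and (hi is None or v < hi)
        )
        buckets.append({"from": lo, "to": hi, "doc_count": count})
    return buckets

prices = [4999, 5000, 5999, 6000]  # hypothetical ticket prices
print(range_agg(prices, [{"from": 5000, "to": 6000}]))
```

Of the four prices, only 5000 and 5999 fall into the bucket: 5000 is included because it equals from, and 6000 is excluded because it equals to.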

3. Date Histogram Aggregation:

Buckets documents by time at a calendar interval such as month.

GET seats1028/_search
{"size": 0,"aggs": {"price_date_histogram": {"date_histogram": {"field": "datetime","calendar_interval": "month"}}}
}

Response:

"aggregations" : {"price_date_histogram" : {"buckets" : [{"key_as_string" : "2018-03-01T00:00:00.000Z","key" : 1519862400000,"doc_count" : 2310},{"key_as_string" : "2018-04-01T00:00:00.000Z","key" : 1522540800000,"doc_count" : 3946},{"key_as_string" : "2018-05-01T00:00:00.000Z","key" : 1525132800000,"doc_count" : 3948},{"key_as_string" : "2018-06-01T00:00:00.000Z","key" : 1527811200000,"doc_count" : 3948},{"key_as_string" : "2018-07-01T00:00:00.000Z","key" : 1530403200000,"doc_count" : 3948}]}}

4. Date Range Aggregation

Buckets by explicit time ranges, analogous to the range aggregation.

GET seats1028/_search
{"size": 0,"aggs": {"price_date_histogram": {"date_range": {"field": "datetime","ranges": [{"from": "2018-10-01T00:00:00.000Z","to": "2018-11-01T00:00:00.000Z"}]}}}
}

Response:

"aggregations" : {"price_date_histogram" : {"buckets" : [{"key" : "2018-10-01T00:00:00.000Z-2018-11-01T00:00:00.000Z","from" : 1.538352E12,"from_as_string" : "2018-10-01T00:00:00.000Z","to" : 1.5410304E12,"to_as_string" : "2018-11-01T00:00:00.000Z","doc_count" : 3948}]}}

5. Filter Aggregation

A simple filter, similar to a filter in a query.

GET seats1028/_search
{"size": 0,"aggs": {"sold_filter": {"filter": {"range": {"tip": {"gte": 10,"lte": 20}}},"aggs": {"max_price": {"max": {"field": "price"}}}}}
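}

Response:

"aggregations" : {"sold_filter" : {
  "doc_count" : 6300,   # doc count after the filter is applied
  "max_price" : {"value" : 9996.0}
}}

6. Filters Aggregation

Applies several filters; each filter's result gets its own bucket, and the sub agg is then computed on each bucket.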

GET seats1028/_search
{"size": 0,"aggs": {"sold_filter": {"filters": {"filters": {"tip_filter": {"range": {"tip": {"gte": 10,"lte": 20}}},"number_filter": {"range": {"number": {"gte": 5,"lte":10}}}}},"aggs": {"max_price": {"max": {"field": "price"}}}}}
}

Note the doubled "filters" key: the named filters sit under an inner "filters" object, which looks a little odd at first.

Response:

"aggregations" : {"sold_filter" : {"buckets" : {"number_filter" : {"doc_count" : 16072,"max_price" : {"value" : 9999.0}},"tip_filter" : {  "doc_count" : 6300,"max_price" : {"value" : 9996.0}}}}}

Each named filter yields its own bucket, with the sub agg computed per bucket.

7. Histogram Aggregation

Histogram-style aggregation. The field is usually numeric, which makes it easy to group into fixed-width intervals.

GET seats1028/_search
{"size": 0,"aggs": {"tip_histogram":{"histogram": {"field": "tip","interval": 4}}}
}

Response:

"aggregations" : {"tip_histogram" : {"buckets" : [{"key" : 16.0,"doc_count" : 4200},{"key" : 20.0,"doc_count" : 8400},{"key" : 24.0,"doc_count" : 17808},{"key" : 28.0,"doc_count" : 5794}]}}
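The bucket keys in a histogram come from rounding each value down to a multiple of the interval: key = floor((value - offset) / interval) * interval + offset. A minimal sketch of that rule follows; `histogram_agg` and the sample tips are illustrative assumptions, and the sketch does not fill in empty buckets between keys.

```python
import math
from collections import Counter

def histogram_agg(values, interval, offset=0.0):
    """Simplified histogram aggregation: round each value down to its bucket key."""
    counts = Counter(
        math.floor((v - offset) / interval) * interval + offset
        for v in values
    )
    return [{"key": float(k), "doc_count": c} for k, c in sorted(counts.items())]

tips = [17.5, 19.9, 20.0, 23.0, 24.0]  # hypothetical tip values
print(histogram_agg(tips, 4))
```

With an interval of 4, the value 17.5 from the sample document lands in the bucket keyed 16.0, which covers 16 up to but not including 20.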

8. Missing Aggregation: counts docs in which a field is missing

GET seats1028/_search
{"size":0,"aggs": {"miss_f": {"missing": {"field": "row"}}}
}

Response:

"aggregations" : {"miss_f" : {"doc_count" : 1}}

9. Nested aggs: aggregating over nested docs, usually with a sub agg that computes the statistic

Sample data: a class holds a list of students, and each student has age and name properties.

GET nest_test/_mapping

Response:

{"mappings" : {"properties" : {"c_name" : {"type" : "text"},"class" : {"type" : "nested","properties" : {"students" : {"type" : "nested","properties" : {"age" : {"type" : "integer"},"name" : {"type" : "text"}}}}}}}}

There are two matching documents:

"_source" : {"c_name" : "start_class","class" : {"students" : [{"name" : "jack chen","age" : 30},{"name" : "jack man","age" : 20},{"name" : "pony wang","age" : 60},{"name" : "gebi wang","age" : 90}]}}
"_source" : {"c_name" : "sun_class","class" : {"students" : [{"name" : "lucy chen","age" : 30},{"name" : "lucy man","age" : 20},{"name" : "dong wang","age" : 60},{"name" : "chess wang","age" : 90}]}}

The query:


GET nest_test/_search
{"size": 0,"aggs": {"nested_agg": {"nested": {"path": "class.students"},"aggs": {"min_age": {"min": {"field": "class.students.age"}}}}}
}

Response:

"aggregations" : {"nested_agg" : {"doc_count" : 8,"min_age" : {"value" : 20.0}}}
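The doc_count of 8 counts nested student objects, not the two class documents. The following in-memory sketch reproduces that counting on the two sample documents; `nested_objects` and `nested_min_agg` are hypothetical helpers, and the field name is given relative to the nested object rather than as the full path Elasticsearch expects.

```python
def nested_objects(doc, path):
    """Follow a dotted path such as 'class.students' to the nested list."""
    node = doc
    for part in path.split("."):
        node = node[part]
    return node

def nested_min_agg(docs, path, field):
    """Simplified nested agg with a min sub agg over the nested objects."""
    children = [child for doc in docs for child in nested_objects(doc, path)]
    return {
        "doc_count": len(children),  # number of nested objects, not root docs
        "min": float(min(child[field] for child in children)),
    }

classes = [
    {"c_name": "start_class", "class": {"students": [
        {"name": "jack chen", "age": 30}, {"name": "jack man", "age": 20},
        {"name": "pony wang", "age": 60}, {"name": "gebi wang", "age": 90}]}},
    {"c_name": "sun_class", "class": {"students": [
        {"name": "lucy chen", "age": 30}, {"name": "lucy man", "age": 20},
        {"name": "dong wang", "age": 60}, {"name": "chess wang", "age": 90}]}},
]
print(nested_min_agg(classes, "class.students", "age"))
```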

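10. Children agg: querying join-type data

Data preparation: each classroom (class_room) can offer several subjects (subject), and each student (student) can choose one or more class_rooms, so class_room and student form a parent/child relationship. With a join field, each child document points to a single parent, so a student in several class_rooms needs one child document per class_room.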


PUT join_class
{"mappings": {"properties": {"subject":{"type": "keyword"},"class_student":{"type": "join","relations":{"class_room":"student"}}}}
}

PUT join_class/_doc/1
{"subject":["english","Chinese","Russia"],"class_student":{"name":"class_room"},"des":"this class room teach english, Chinese, Russia"
}

PUT join_class/_doc/2?routing=1
{"class_student":{"name":"student","parent":1},"name":"jack"
}

PUT join_class/_doc/3?routing=1
{"class_student":{"name":"student","parent":1},"name":"pony"
}

The following query finds which students belong to each subject.


GET join_class/_search
{"size":0,"query": {"match_all": {}},"aggs": {"subject_term": {"terms": {"field": "subject","size": 10},"aggs": {"subject_student": {"children": {"type": "student"},"aggs": {"term_name": {"terms": {"field": "name.keyword","size": 10}}}}}}}
}

Response:

"aggregations" : {"subject_term" : {"doc_count_error_upper_bound" : 0,"sum_other_doc_count" : 0,"buckets" : [{"key" : "Chinese","doc_count" : 1,"subject_student" : {"doc_count" : 2,"term_name" : {"doc_count_error_upper_bound" : 0,"sum_other_doc_count" : 0,"buckets" : [{"key" : "jack","doc_count" : 1},{"key" : "pony","doc_count" : 1}]}}},{"key" : "Russia","doc_count" : 1,"subject_student" : {"doc_count" : 2,"term_name" : {"doc_count_error_upper_bound" : 0,"sum_other_doc_count" : 0,"buckets" : [{"key" : "jack","doc_count" : 1},{"key" : "pony","doc_count" : 1}]}}},{"key" : "english","doc_count" : 1,"subject_student" : {"doc_count" : 2,"term_name" : {"doc_count_error_upper_bound" : 0,"sum_other_doc_count" : 0,"buckets" : [{"key" : "jack","doc_count" : 1},{"key" : "pony","doc_count" : 1}]}}}]}}

11. Parent agg: querying join-type data

Continuing with the sample data above, the following request finds the subjects each student has chosen.


GET join_class/_search
{"size":0,"query": {"match_all": {}},"aggs": {"student_term": {"terms": {"field": "name.keyword","size": 10},"aggs": {"subject_student": {"parent": {"type": "student"},"aggs": {"choose_subject": {"terms": {"field": "subject","size": 10}}}}}}}
}

Response:

 "aggregations" : {"student_term" : {"doc_count_error_upper_bound" : 0,"sum_other_doc_count" : 0,"buckets" : [{"key" : "jack","doc_count" : 1,"subject_student" : {"doc_count" : 1,"choose_subject" : {"doc_count_error_upper_bound" : 0,"sum_other_doc_count" : 0,"buckets" : [{"key" : "Chinese","doc_count" : 1},{"key" : "Russia","doc_count" : 1},{"key" : "english","doc_count" : 1}]}}},{"key" : "pony","doc_count" : 1,"subject_student" : {"doc_count" : 1,"choose_subject" : {"doc_count_error_upper_bound" : 0,"sum_other_doc_count" : 0,"buckets" : [{"key" : "Chinese","doc_count" : 1},{"key" : "Russia","doc_count" : 1},{"key" : "english","doc_count" : 1}]}}}]}}

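12. Composite Aggregation: combines terms over several dimensions. It resembles multi-level nested terms aggs, but the result is flat rather than nested, much like GROUP BY over several columns in MySQL

Initialize the data: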


PUT composite_test
{"mappings": {"properties": {"area": {"type": "keyword"},"userid": {"type": "keyword"},"sendtime": {"type": "date","format": "yyyy-MM-dd HH:mm:ss"}}}
}
POST composite_test/_bulk
{ "index" : {"_type" :"_doc"}}
{"area":"33","userid":"400015","sendtime":"2019-01-17 00:00:00"}
{ "index" : {"_type" : "_doc"}}
{"area":"33","userid":"400015","sendtime":"2019-01-17 00:00:00"}
{ "index" : {"_type" : "_doc"}}
{"area":"35","userid":"400016","sendtime":"2019-01-18 00:00:00"}
{ "index" : { "_type" : "_doc"}}
{"area":"35","userid":"400016","sendtime":"2019-01-18 00:00:00"}
{ "index" : {"_type" : "_doc"}}
{"area":"33","userid":"400017","sendtime":"2019-01-17 00:00:00"}

The following query groups by the three fields area, userid, and sendtime.


GET composite_test/_search
{"size": 0,"aggs": {"my_buckets": {"composite": {"sources": [{"area": {"terms": {"field": "area"}}},{"userid": {"terms": {"field": "userid"}}},{"sendtime": {"date_histogram": {"field": "sendtime","fixed_interval": "1d","format": "yyyy-MM-dd"}}}]}}}
}

Response:

"aggregations" : {"my_buckets" : {"after_key" : {"area" : "35","userid" : "400016","sendtime" : "2019-01-18"},"buckets" : [{"key" : {"area" : "33","userid" : "400015","sendtime" : "2019-01-17"},"doc_count" : 2},{"key" : {"area" : "33","userid" : "400017","sendtime" : "2019-01-17"},"doc_count" : 1},{"key" : {"area" : "35","userid" : "400016","sendtime" : "2019-01-18"},"doc_count" : 2}]}}
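The flat, GROUP BY-like result and the role of after_key for paging can be sketched in memory. This is an illustration under stated assumptions, not the Elasticsearch implementation: `composite_agg` is a hypothetical helper, sendtime is already truncated to the day, and the after key is a plain tuple rather than the object Elasticsearch returns.

```python
from collections import Counter

def composite_agg(docs, sources, size=10, after=None):
    """Simplified composite agg: flat buckets keyed by a tuple of source values."""
    counts = Counter(tuple(doc[s] for s in sources) for doc in docs)
    # Buckets are returned in key order; 'after' resumes past the last key seen.
    keys = sorted(k for k in counts if after is None or k > after)
    page = keys[:size]
    buckets = [{"key": dict(zip(sources, k)), "doc_count": counts[k]} for k in page]
    after_key = page[-1] if page else None
    return buckets, after_key

rows = [
    {"area": "33", "userid": "400015", "sendtime": "2019-01-17"},
    {"area": "33", "userid": "400015", "sendtime": "2019-01-17"},
    {"area": "35", "userid": "400016", "sendtime": "2019-01-18"},
    {"area": "35", "userid": "400016", "sendtime": "2019-01-18"},
    {"area": "33", "userid": "400017", "sendtime": "2019-01-17"},
]
sources = ["area", "userid", "sendtime"]
first_page, after_key = composite_agg(rows, sources, size=2)
second_page, _ = composite_agg(rows, sources, size=2, after=after_key)
print(first_page)
print(second_page)
```

Passing the previous page's after key back in yields the next page, which is how a composite aggregation lets a client walk through all buckets without nesting.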

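13. Adjacency Matrix Aggregation

The composite aggregation above intersects terms across several dimensions. The adjacency matrix is more limited: it builds the matrix only from the specific field values you name as filters.
Using the sample data above, the following query returns the doc count for area=33, the doc count for userid=400015, and the doc count for area=33 & userid=400015.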


GET composite_test/_search
{"size": 0,"aggs": {"composite_two": {"adjacency_matrix": {"filters": {"area_filter":{"terms":{"area":["33"]}},"user_id_filter":{"terms":{"userid":["400015"]}}}}}}
}

Response:

"aggregations" : {"composite_two" : {"buckets" : [{"key" : "area_filter","doc_count" : 3},{"key" : "area_filter&user_id_filter","doc_count" : 2},{"key" : "user_id_filter","doc_count" : 2}]}}
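The three buckets in the response are each single filter plus each pairwise intersection. A minimal in-memory sketch of that idea, run on the same five sample documents; `adjacency_matrix_agg` is a hypothetical helper that models filters as Python predicates and, as an assumption of this sketch, drops empty buckets.

```python
from itertools import combinations

def adjacency_matrix_agg(docs, filters):
    """Simplified adjacency matrix: counts per filter and per pair of filters."""
    names = sorted(filters)
    buckets = []
    for name in names:
        n = sum(1 for d in docs if filters[name](d))
        if n:
            buckets.append({"key": name, "doc_count": n})
    for a, b in combinations(names, 2):
        n = sum(1 for d in docs if filters[a](d) and filters[b](d))
        if n:  # this sketch omits pairs that match no docs
            buckets.append({"key": f"{a}&{b}", "doc_count": n})
    return sorted(buckets, key=lambda bucket: bucket["key"])

sample = [
    {"area": "33", "userid": "400015"},
    {"area": "33", "userid": "400015"},
    {"area": "35", "userid": "400016"},
    {"area": "35", "userid": "400016"},
    {"area": "33", "userid": "400017"},
]
named_filters = {
    "area_filter": lambda d: d["area"] == "33",
    "user_id_filter": lambda d: d["userid"] == "400015",
}
print(adjacency_matrix_agg(sample, named_filters))
```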

14. Global agg: querying across all data

It ignores the filtering done by query and runs its sub-aggregations over every document in the index.

GET seats1028/_search
{"size": 0, "query": {"term": {"row": {"value": 5}}},"aggs": {"global_row": {"global": {},"aggs": {"avg_row": {"avg": {"field": "row"}}}},"avg_row02":{"avg": {"field": "row"}}}
}

Response:

"aggregations" : {
  "global_row" : {
    "doc_count" : 30992,
    "avg_row" : {"value" : 4.333871123874673}   # computed over all docs
  },
  "avg_row02" : {"value" : 5.0}   # computed over the docs that match the query
}

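15. Significant Terms Aggregation: automatically finds significant terms

It looks in a keyword field for terms that are significant in the current result set, that is, terms that appear noticeably more often in the matched docs than in the index as a whole.
An example explains it best. Here we use web news articles (news); each article has an author, title, topic, and so on.
The data is set up as follows: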

PUT news
{"mappings": {"properties": {"published": {"type": "date","format": "dateOptionalTime"},"author": {"type": "keyword"},"title": {"type": "text"},"topic": {"type": "keyword"},"views": {"type": "integer"}}}
}

POST news/_bulk
{"index": {"_index": "news"}}
{"author": "John Michael","published": "2018-07-08","title": "Tesla is flirting with its lowest close in over 1 1/2 years (TSLA)","topic": "automobile","views": "431"}
{"index": {"_index": "news"}}
{"author": "John Michael","published": "2018-07-22","title": "Tesla to end up like Lehman Brothers (TSLA)","topic": "automobile","views": "1921"}
{"index": {"_index": "news"}}
{"author": "John Michael","published": "2018-07-29","title": "Tesla (TSLA) official says that they are going to release a new self-driving car model in the coming year","topic": "automobile","views": "1849"}
{"index": {"_index": "news"}}
{"author": "John Michael","published": "2018-08-14","title": "Five ways Tesla uses AI and Big Data","topic": "ai","views": "871"}
{"index": {"_index": "news"}}
{"author": "John Michael","published": "2018-08-14","title": "Toyota partners with Tesla (TSLA) to improve the security of self-driving cars","topic": "automobile","views": "871"}
{"index": {"_index": "news"}}
{"author": "Robert Cann","published": "2018-08-25","title": "Is AI dangerous for humanity","topic": "ai","views": "981"}
{"index": {"_index": "news"}}
{"author": "Robert Cann","published": "2018-09-13","title": "Is AI dangerous for humanity","topic": "ai","views": "871"}
{"index": {"_index": "news"}}
{"author": "Robert Cann","published": "2018-09-27","title": "Introduction to Generative Adversarial Networks (GANs) in self-driving cars","topic": "automobile","views": "1183"}
{"index": {"_index": "news"}}
{"author": "Robert Cann","published": "2018-10-09","title": "Introduction to Natural Language Processing","topic": "ai","views": "786"}
{"index": {"_index": "news"}}
{"author": "Robert Cann","published": "2018-10-15","title": "New Distant Objects Found in the Fight for Planet X ","topic": "astronomy","views": "542"}

Find the topic each author focuses on most, that is, the topic in which that author has published disproportionately often.

GET news/_search
{"size": 0,"aggregations": {"authors": {"terms": {"field": "author"},"aggregations": {"significant_topic_types": {"significant_terms": {"field": "topic"}}}}}
}

Response:

"aggregations" : {"authors" : {"doc_count_error_upper_bound" : 0,"sum_other_doc_count" : 0,"buckets" : [
  {"key" : "John Michael","doc_count" : 5,"significant_topic_types" : {"doc_count" : 5,"bg_count" : 10,"buckets" : [
    {"key" : "automobile","doc_count" : 4,"score" : 0.4800000000000001,"bg_count" : 5}]}},
  {"key" : "Robert Cann","doc_count" : 5,"significant_topic_types" : {
    "doc_count" : 5,   # Robert Cann has 5 docs in total
    "bg_count" : 10,   # the index holds 10 docs in total
    "buckets" : [{
      "key" : "ai",
      "doc_count" : 3,   # 3 of Robert Cann's docs have topic ai
      "score" : 0.2999999999999999,
      "bg_count" : 4     # 4 docs in the whole index have topic ai
    }]}}]}}

The statistics show that John Michael focuses most on automobile, while Robert Cann focuses most on ai. See the comments above for what bg_count means.
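The score values in the response can be reproduced by hand. The sketch below assumes the default JLH heuristic of significant_terms, which multiplies the absolute and the relative change between the foreground and background frequency; `jlh_score` is a hypothetical helper name.

```python
def jlh_score(fg_count, fg_total, bg_count, bg_total):
    """JLH score: (fg% - bg%) * (fg% / bg%), where fg/bg are term frequencies.

    fg_count / fg_total: docs with the term in the foreground set (e.g. one author)
    bg_count / bg_total: docs with the term in the background set (the whole index)
    """
    if bg_count == 0 or fg_total == 0:
        return 0.0
    fg = fg_count / fg_total
    bg = bg_count / bg_total
    if fg <= bg:
        return 0.0  # the term is not over-represented in the foreground
    return (fg - bg) * (fg / bg)

# John Michael / automobile: 4 of his 5 docs, 5 of the 10 docs in the index
print(jlh_score(4, 5, 5, 10))
# Robert Cann / ai: 3 of his 5 docs, 4 of the 10 docs in the index
print(jlh_score(3, 5, 4, 10))
```

For automobile the foreground frequency is 0.8 against a background of 0.5, giving (0.8 - 0.5) * (0.8 / 0.5) = 0.48; for ai it is 0.6 against 0.4, giving 0.2 * 1.5 = 0.3, up to floating-point rounding.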

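16. Significant Text Aggregation: automatically finds significant terms in text

Similar to the Significant Terms Aggregation above, but it works on text fields and analyzes (tokenizes) the text.
Run the following query against the data above: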


GET news/_search
{"query": {"match": {"title": " AI "}},"size": 0,"aggs": {"significant_title": {"significant_text": {"field": "title"}}}
}

Response:

"aggregations" : {"significant_title" : {"doc_count" : 3,"bg_count" : 10,"buckets" : [{"key" : "ai","doc_count" : 3,"score" : 2.3333333333333335,"bg_count" : 3}]}}

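17. Sampler Aggregation: aggregating over a sample

Typically used together with significant_terms: when the index is very large and the query becomes slow, the sampler restricts the aggregation to a sample of the most relevant docs.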

POST /stackoverflow/_search?size=0
{"query": {"query_string": {"query": "tags:kibana OR tags:javascript"}},"aggs": {"sample": {"sampler": {"shard_size": 200},"aggs": {"keywords": {"significant_terms": {"field": "tags","exclude": ["kibana", "javascript"]}}}}}
}

The shard_size parameter is the number of sample docs taken from each shard; the default is 100.
Response:

{..."aggregations": {"sample": {"doc_count": 200,"keywords": {"doc_count": 200,"bg_count": 650,"buckets": [{"key": "elasticsearch","doc_count": 150,"score": 1.078125,"bg_count": 200},{"key": "logstash","doc_count": 50,"score": 0.5625,"bg_count": 50}]}}}
}

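18. Reverse nested Aggregation: aggregating on parent data from inside a nested agg

When used as a sub-aggregation of a Nested Aggregation, the Reverse nested Aggregation lets you step back out of the nested type and aggregate on fields of the root document.
An example: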

PUT /issues
{"mappings": {"properties" : {"tags" : { "type" : "keyword" },"comments" : { "type" : "nested","properties" : {"username" : { "type" : "keyword" },"comment" : { "type" : "text" }}}}}
}

PUT issues/_doc/1
{"tags": ["bug","improve"],"comments": [{"username": "jack","comment": " this is a bug"},{"username": "pony","comment": " this is a improve"}]
}

PUT issues/_doc/2
{"tags": ["advice","improve"],"comments": [{"username": "jack","comment": " this is a good job "},{"username": "nacy","comment": " this is a improvement"}]
}

Query:

GET /issues/_search
{"size": 0,"query": {"match_all": {}},"aggs": {"comments": {"nested": {"path": "comments"},"aggs": {"top_usernames": {"terms": {"field": "comments.username"},"aggs": {"comment_to_issue": {"reverse_nested": {},"aggs": {"top_tags_per_comment": {"terms": {"field": "tags"}}}}}}}}}
}

Response:

"aggregations" : {"comments" : {"doc_count" : 4,"top_usernames" : {"doc_count_error_upper_bound" : 0,"sum_other_doc_count" : 0,"buckets" : [{"key" : "jack","doc_count" : 2,"comment_to_issue" : {"doc_count" : 2,"top_tags_per_comment" : {"doc_count_error_upper_bound" : 0,"sum_other_doc_count" : 0,"buckets" : [{"key" : "improve","doc_count" : 2},{"key" : "advice","doc_count" : 1},{"key" : "bug","doc_count" : 1}]}}},{"key" : "nacy","doc_count" : 1,"comment_to_issue" : {"doc_count" : 1,"top_tags_per_comment" : {"doc_count_error_upper_bound" : 0,"sum_other_doc_count" : 0,"buckets" : [{"key" : "advice","doc_count" : 1},{"key" : "improve","doc_count" : 1}]}}},{"key" : "pony","doc_count" : 1,"comment_to_issue" : {"doc_count" : 1,"top_tags_per_comment" : {"doc_count_error_upper_bound" : 0,"sum_other_doc_count" : 0,"buckets" : [{"key" : "bug","doc_count" : 1},{"key" : "improve","doc_count" : 1}]}}}]}}}

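Under a Nested Aggregation, the sub-aggregations of a Reverse nested Aggregation operate on the root documents of the matching nested documents.
It follows that this aggregation exists specifically to be used inside a Nested Aggregation; as a top-level aggregation, or as a sub-aggregation of anything other than a Nested Aggregation, it is meaningless.
By default the Reverse nested Aggregation joins back to the root document; with multiple levels of nesting, the path parameter can select an intermediate nested level instead.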
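The step from nested comments back to root issues can be traced in memory on the two sample issues. This is a rough sketch of the nested, terms, reverse_nested, terms chain, not the Elasticsearch implementation; `tags_per_commenter` is a hypothetical helper.

```python
from collections import Counter, defaultdict

def tags_per_commenter(issues):
    """For each comment author, count the tags of the root issues they commented on."""
    # Nested step: visit every comment and bucket it by username,
    # remembering which root issue each comment belongs to.
    issues_by_user = defaultdict(set)
    for idx, issue in enumerate(issues):
        for comment in issue["comments"]:
            issues_by_user[comment["username"]].add(idx)
    # Reverse nested step: jump back to the root issues and aggregate their tags.
    result = {}
    for user, idxs in issues_by_user.items():
        tags = Counter(tag for i in idxs for tag in issues[i]["tags"])
        result[user] = {"doc_count": len(idxs), "tags": dict(tags)}
    return result

issues = [
    {"tags": ["bug", "improve"], "comments": [
        {"username": "jack", "comment": " this is a bug"},
        {"username": "pony", "comment": " this is a improve"}]},
    {"tags": ["advice", "improve"], "comments": [
        {"username": "jack", "comment": " this is a good job "},
        {"username": "nacy", "comment": " this is a improvement"}]},
]
for user, stats in sorted(tags_per_commenter(issues).items()):
    print(user, stats)
```

jack commented on both issues, so the improve tag, which appears on both, is counted twice for him, matching the response above.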




http://www.chinasem.cn/article/1124328
