Elasticsearch Metrics Aggregations

2023-11-02 12:10


Preface

This article is based on Elasticsearch 7.3.0.

Basic structure of an aggregation

"aggregations" : {
    "<aggregation_name>" : {
        "<aggregation_type>" : {
            <aggregation_body>
        }
        [,"meta" : {  [<meta_data_body>] } ]?
        [,"aggregations" : { [<sub_aggregation>]+ } ]?
    }
    [,"<aggregation_name_2>" : { ... } ]*
}
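The structure above can be assembled programmatically. The following is a minimal Python sketch, not part of any Elasticsearch client: the helper name build_agg and the meta content are made up for illustration, and the result is only a plain dict that could be sent as a request body.

```python
import json


def build_agg(agg_type, body, meta=None, sub_aggs=None):
    """Assemble one aggregation entry: type and body, optional meta, optional sub-aggregations."""
    entry = {agg_type: body}
    if meta is not None:
        entry["meta"] = meta
    if sub_aggs:
        # "aggs" is the short form of "aggregations"
        entry["aggs"] = sub_aggs
    return entry


# Hypothetical example: a terms aggregation with one avg sub-aggregation.
request = {
    "size": 0,
    "aggs": {
        "tag_terms": build_agg(
            "terms",
            {"field": "tag"},
            meta={"note": "illustrative meta data"},
            sub_aggs={"price_avg": build_agg("avg", {"field": "price"})},
        )
    },
}

print(json.dumps(request, indent=2))
```

Each named entry holds exactly one aggregation type; meta and sub-aggregations sit next to the type key, not inside it.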

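Preparing test data

PUT my_index
{
  "mappings": {
    "properties": {
      "tag": {
        "type": "keyword"
      },
      "price": {
        "type": "scaled_float",
        "scaling_factor": 100
      }
    }
  }
}

PUT my_index/_doc/1
{
  "tag": "没有价格的水果"
}

PUT my_index/_doc/2
{
  "tag": "橘子",
  "price": "1.00"
}

PUT my_index/_doc/3
{
  "tag": "苹果",
  "price": "9.00"
}

The tag of document 1 means "fruit without a price", and that document has no price field. Document 2 is an orange (橘子) priced at 1.00, and document 3 is an apple (苹果) priced at 9.00.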

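avg, max, min, sum, value_count, stats, extended_stats

These aggregations share nearly the same syntax, so they are covered together.

  • avg: average
  • max: maximum
  • min: minimum
  • sum: sum
  • value_count: number of values that were aggregated
  • stats: returns count, min, max, avg, and sum in a single request
  • extended_stats: an extension of stats that also returns sum_of_squares, variance, std_deviation, and std_deviation_bounds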

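Average fruit price

GET my_index/_search
{
  "size": 0,
  "aggs": {
    "price_avg": {
      "avg": {
        "field": "price",
        // value to use for documents that have no price
        "missing": 1
      }
    }
  }
}

Response

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "price_avg" : {
      "value" : 3.6666666666666665
    }
  }
}

Because missing is set to 1, document 1 is counted with a price of 1, so the average is (1 + 1 + 9) / 3.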
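To see how the missing parameter changes the result, the arithmetic can be reproduced locally. This is a small Python sketch of the calculation only; the avg function below is hypothetical and does not call Elasticsearch.

```python
# Prices of the three test documents; None marks the document without a price.
prices = [None, 1.0, 9.0]


def avg(values, missing=None):
    """Average the values, substituting `missing` for absent ones or skipping them if it is None."""
    if missing is not None:
        values = [missing if v is None else v for v in values]
    present = [v for v in values if v is not None]
    return sum(present) / len(present) if present else None


avg_with_missing = avg(prices, missing=1)  # (1 + 1 + 9) / 3
avg_without_missing = avg(prices)          # (1 + 9) / 2

print(avg_with_missing)
print(avg_without_missing)
```

With missing the document without a price still contributes to the count; without it the document is simply ignored.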

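Using a script

GET my_index/_search
{
  "size": 0,
  // only documents that have a price; doc['price'].value fails on documents without one
  "query": {
    "exists": {
      "field": "price"
    }
  },
  "aggs": {
    "price_avg": {
      "avg": {
        "script": {
          "source": "doc['price'].value"
        }
      }
    }
  }
}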

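Using a value script

When both field and script are set, the script runs on each value of the field, which is available as _value. Here every price is multiplied by params.number before it is averaged.

GET my_index/_search
{
  "size": 0,
  "aggs": {
    "price_avg": {
      "avg": {
        "field": "price",
        "script": {
          "lang": "painless",
          "source": "_value * params.number",
          "params": {
            "number": 1.5
          }
        }
      }
    }
  }
}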
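A value script acts as a per-value transformation applied before the metric is computed. The Python sketch below mimics that locally for the two priced test documents; apply_value_script is a made-up name, and the numbers are computed here, not taken from an Elasticsearch response.

```python
def apply_value_script(values, number):
    """Mimic the value script `_value * params.number` on each field value."""
    return [value * number for value in values]


prices = [1.0, 9.0]  # documents 2 and 3; document 1 has no price value to transform
scaled = apply_value_script(prices, 1.5)
scaled_avg = sum(scaled) / len(scaled)

print(scaled)
print(scaled_avg)
```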

stats aggregation

GET my_index/_search
{
  "size": 0,
  "aggs": {
    "price_stats": {
      "stats": {
        "field": "price"
      }
    }
  }
}

Response

{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "price_stats" : {
      "count" : 2,
      "min" : 1.0,
      "max" : 9.0,
      "avg" : 5.0,
      "sum" : 10.0
    }
  }
}

count is 2 because document 1 has no price and no missing value is set.
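The five numbers returned by stats can be cross-checked with a few lines of Python. This is only a local sketch of the arithmetic; the stats function and its behaviour on empty input are assumptions of this example.

```python
def stats(values):
    """Return count, min, max, avg, and sum over the values that are present."""
    present = [v for v in values if v is not None]
    if not present:
        # Empty-input result chosen for this sketch only.
        return {"count": 0, "min": None, "max": None, "avg": None, "sum": 0.0}
    total = sum(present)
    return {
        "count": len(present),
        "min": min(present),
        "max": max(present),
        "avg": total / len(present),
        "sum": total,
    }


# None stands for document 1, which has no price.
price_stats = stats([None, 1.0, 9.0])
print(price_stats)
```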

cardinality

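Counts the number of distinct values. The result is approximate (it is computed with the HyperLogLog++ algorithm), not exact.

The precision_threshold option trades memory for accuracy: it defines a distinct count below which results are expected to be close to exact. Above this value, counts may become less accurate. The maximum supported value is 40000; thresholds above this number have the same effect as a threshold of 40000. The default value is 3000.

# Count distinct tag values
GET my_index/_search
{
  "size": 0,
  "aggs": {
    "tag_cardinality": {
      "cardinality": {
        "field": "tag",
        "precision_threshold": 3000
      }
    }
  }
}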
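The limits on precision_threshold can be made explicit when building the request. The Python sketch below is illustrative: cardinality_agg is a hypothetical helper that caps the threshold at 40000 (the value beyond which a higher setting has no further effect), and the exact distinct count is computed locally with a set for comparison with the approximate result.

```python
MAX_PRECISION_THRESHOLD = 40000      # higher thresholds behave like 40000
DEFAULT_PRECISION_THRESHOLD = 3000   # default stated in the text


def cardinality_agg(field, precision_threshold=DEFAULT_PRECISION_THRESHOLD):
    """Build a cardinality aggregation body, capping the threshold at its effective maximum."""
    threshold = min(precision_threshold, MAX_PRECISION_THRESHOLD)
    return {"cardinality": {"field": field, "precision_threshold": threshold}}


body = {"size": 0, "aggs": {"tag_cardinality": cardinality_agg("tag", 100000)}}

# Exact distinct count of the test tags, for comparison with the approximate result.
tags = ["没有价格的水果", "橘子", "苹果"]
exact_distinct = len(set(tags))

print(body)
print(exact_distinct)
```

On a data set this small the approximate count is expected to match the exact one, since three distinct values is far below the threshold.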

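Using a script

The cardinality metric supports scripting, but at a noticeable performance cost, because hashes have to be computed on the fly.

GET my_index/_search
{
  "size": 0,
  "aggs": {
    "tag_cardinality": {
      "cardinality": {
        "script": {
          "lang": "painless",
          // document 1 has no price, so fall back to the tag alone
          "source": "doc['price'].size() == 0 ? doc['tag'].value : doc['tag'].value + ' ' + doc['price'].value"
        }
      }
    }
  }
}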

percentiles

Percentile aggregation

GET my_index/_search
{
  "size": 0,
  "aggs": {
    "price_percentiles": {
      "percentiles": {
        // field must be a numeric field
        "field": "price"
      }
    }
  }
}

By default, the percentiles metric generates a range of percentiles: [ 1, 5, 25, 50, 75, 95, 99 ].

Response

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "price_percentiles" : {
      "values" : {
        "1.0" : 1.0,
        "5.0" : 1.0,
        "25.0" : 1.0,
        "50.0" : 5.0,
        "75.0" : 9.0,
        "95.0" : 9.0,
        "99.0" : 9.0
      }
    }
  }
}

Use the percents parameter to specify which percentiles to compute

GET my_index/_search
{
  "size": 0,
  "aggs": {
    "price_percentiles": {
      "percentiles": {
        "field": "price",
        // return the values as an array instead of a keyed object
        "keyed": false,
        "percents": [
          95,
          99,
          99.99
        ]
      }
    }
  }
}

Response

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "price_percentiles" : {
      "values" : [
        {
          "key" : 95.0,
          "value" : 9.0
        },
        {
          "key" : 99.0,
          "value" : 9.0
        },
        {
          "key" : 99.99,
          "value" : 9.0
        }
      ]
    }
  }
}
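The keyed and non-keyed responses carry the same information in two shapes: an object keyed by percentile, or an array of key/value pairs. Client code that needs one shape can convert from the other. The Python sketch below shows one way to do that; both function names are hypothetical.

```python
def keyed_to_array(values):
    """Turn {"95.0": 9.0, ...} into [{"key": 95.0, "value": 9.0}, ...]."""
    return [{"key": float(k), "value": v} for k, v in values.items()]


def array_to_keyed(values):
    """Turn [{"key": 95.0, "value": 9.0}, ...] into {"95.0": 9.0, ...}."""
    return {str(float(item["key"])): item["value"] for item in values}


# "values" from the non-keyed response shown above.
array_values = [
    {"key": 95.0, "value": 9.0},
    {"key": 99.0, "value": 9.0},
    {"key": 99.99, "value": 9.0},
]

keyed_values = array_to_keyed(array_values)
print(keyed_values)
print(keyed_to_array(keyed_values))
```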

Using a script

GET my_index/_search
{
  "size": 0,
  "aggs": {
    "price_percentiles": {
      "percentiles": {
        "field": "price",
        "script": {
          "lang": "painless",
          "source": "_value * params.number",
          "params": {
            "number": 10
          }
        }
      }
    }
  }
}

percentile_ranks

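Similar to percentiles, but the other way round: you supply values, and the aggregation returns the percentage of observed values that fall at or below each of them.

GET my_index/_search
{
  "size": 0,
  "aggs": {
    "price_percentile_ranks": {
      "percentile_ranks": {
        // field must be a numeric field
        "field": "price",
        "values": [
          90,
          99
        ],
        "keyed": false,
        "script": {
          "lang": "painless",
          "source": "_value * params.number",
          "params": {
            "number": 10
          }
        }
      }
    }
  }
}

Response

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "price_percentile_ranks" : {
      "values" : [
        {
          "key" : 90.0,
          "value" : 100.0
        },
        {
          "key" : 99.0,
          "value" : 100.0
        }
      ]
    }
  }
}

The script multiplies the prices by 10, giving 10 and 90. Both are at or below 90 and 99, so both ranks are 100%.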
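A percentile rank can be checked by hand on a data set this small. The Python sketch below uses the exact definition (share of values less than or equal to the threshold). Elasticsearch computes percentile ranks approximately, so treat this as a way to reason about the expected result, not as its algorithm. The function name is made up for the example.

```python
def percentile_rank(values, threshold):
    """Exact percentage of values that are less than or equal to the threshold."""
    if not values:
        return None
    at_or_below = sum(1 for v in values if v <= threshold)
    return 100.0 * at_or_below / len(values)


# Prices of documents 2 and 3 after the value script `_value * 10`.
scaled_prices = [p * 10 for p in (1.0, 9.0)]

ranks = [
    {"key": float(t), "value": percentile_rank(scaled_prices, t)}
    for t in (90, 99)
]
print(ranks)
```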

top_hits

This aggregator is intended to be used as a sub-aggregator, so that the top matching documents can be aggregated per bucket.

GET my_index/_search
{
  "size": 0,
  "aggs": {
    "tag_terms": {
      "terms": {
        "field": "tag",
        "size": 10
      },
      "aggs": {
        "tag_top": {
          "top_hits": {
            "from": 0,
            "size": 10,
            "sort": [
              {
                "price": {
                  "order": "desc"
                }
              }
            ]
          }
        }
      }
    }
  }
}

Response

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "tag_terms" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "橘子",
          "doc_count" : 1,
          "tag_top" : {
            "hits" : {
              "total" : {
                "value" : 1,
                "relation" : "eq"
              },
              "max_score" : null,
              "hits" : [
                {
                  "_index" : "my_index",
                  "_type" : "_doc",
                  "_id" : "2",
                  "_score" : null,
                  "_source" : {
                    "tag" : "橘子",
                    "price" : "1.00"
                  },
                  "sort" : [
                    1.0
                  ]
                }
              ]
            }
          }
        },
        {
          "key" : "没有价格的水果",
          "doc_count" : 1,
          "tag_top" : {
            "hits" : {
              "total" : {
                "value" : 1,
                "relation" : "eq"
              },
              "max_score" : null,
              "hits" : [
                {
                  "_index" : "my_index",
                  "_type" : "_doc",
                  "_id" : "1",
                  "_score" : null,
                  "_source" : {
                    "tag" : "没有价格的水果"
                  },
                  "sort" : [
                    "-Infinity"
                  ]
                }
              ]
            }
          }
        },
        {
          "key" : "苹果",
          "doc_count" : 1,
          "tag_top" : {
            "hits" : {
              "total" : {
                "value" : 1,
                "relation" : "eq"
              },
              "max_score" : null,
              "hits" : [
                {
                  "_index" : "my_index",
                  "_type" : "_doc",
                  "_id" : "3",
                  "_score" : null,
                  "_source" : {
                    "tag" : "苹果",
                    "price" : "9.00"
                  },
                  "sort" : [
                    9.0
                  ]
                }
              ]
            }
          }
        }
      ]
    }
  }
}
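Conceptually, the request groups documents by tag and keeps the highest-priced documents in each group. The Python sketch below imitates that locally on the test data. It is an illustration of the idea under simplifying assumptions, not how Elasticsearch executes the aggregation. The function name is hypothetical, and a missing price is sorted as negative infinity, mirroring the "-Infinity" sort value in the response.

```python
from collections import defaultdict

docs = [
    {"_id": "1", "tag": "没有价格的水果"},
    {"_id": "2", "tag": "橘子", "price": 1.0},
    {"_id": "3", "tag": "苹果", "price": 9.0},
]


def top_hits_per_tag(documents, size=10):
    """Group documents by tag and keep the `size` highest-priced documents per group."""
    buckets = defaultdict(list)
    for doc in documents:
        buckets[doc["tag"]].append(doc)
    result = {}
    for tag, members in buckets.items():
        # Documents without a price sort last in descending order.
        ranked = sorted(
            members,
            key=lambda d: d.get("price", float("-inf")),
            reverse=True,
        )
        result[tag] = ranked[:size]
    return result


top = top_hits_per_tag(docs)
for tag, hits in top.items():
    print(tag, [hit["_id"] for hit in hits])
```

Every tag in the test data is unique, so each bucket holds a single document. With several documents per tag, size limits how many are returned per bucket.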




http://www.chinasem.cn/article/330489
