【ElasticSearch】(6) A Brief Look at Scroll

2024-08-26 20:58


【Background】

A typical DSL query for fetching all the data in an index looks like this:

POST /fcar_city/city/_search?scroll=10m
{"query": {"bool": {"must": [{"match_all": { }}]}}
}

My intent was to retrieve every document in the index. The query above returned the following:

{"_shards": {"total": 5,"failed": 0,"successful": 5},"hits": {"hits": [
{"_index": "fcar_city","_type": "city","_source": {"t_b_city|administrative_name": "扬州","t_b_city|create_emp": "1","t_b_city|create_time": "2016-06-28 11:59:58","t_b_city|id": "60","t_b_city|modify_time": "2016-06-28 11:59:58","t_b_city|operate_range": "1","t_b_city|channel_status": "2","t_b_city|is_business": "1","t_b_city|modify_emp": "1","t_b_city|name": "扬州","t_b_city|en_name": "yz"},"_id": "60","_score": 1},
{"_index": "fcar_city","_type": "city","_source": {"t_b_city|administrative_name": "通化","t_b_city|create_emp": "1","t_b_city|create_time": "2016-06-28 11:59:58","t_b_city|id": "44","t_b_city|modify_time": "2016-06-28 11:59:58","t_b_city|operate_range": "1","t_b_city|channel_status": "2","t_b_city|is_business": "1","t_b_city|modify_emp": "1","t_b_city|name": "通化","t_b_city|en_name": "th"},"_id": "44","_score": 1},
{"_index": "fcar_city","_type": "city","_source": {"t_b_city|create_emp": "1","t_b_city|create_time": "2016-06-28 11:59:58","t_b_city|modify_time": "2016-10-09 08:40:00","t_b_city|center_lat": "28.656386","t_b_city|is_business": "1","t_b_city|modify_emp": "253","t_b_city|name": "台州","t_b_city|en_name": "tz","t_b_city|administrative_name": "台州","t_b_city|id": "48","t_b_city|operate_range": "2","t_b_city|channel_status": "2","t_b_city|status": "2","t_b_city|center_lon": "121.420757"},"_id": "48","_score": 1},
{"_index": "fcar_city","_type": "city","_source": {"t_b_city|administrative_name": "咸阳","t_b_city|create_emp": "1","t_b_city|create_time": "2016-06-28 11:59:58","t_b_city|id": "52","t_b_city|modify_time": "2016-06-28 11:59:58","t_b_city|operate_range": "1","t_b_city|channel_status": "2","t_b_city|is_business": "1","t_b_city|modify_emp": "1","t_b_city|name": "咸阳","t_b_city|en_name": "xiy"},"_id": "52","_score": 1},
{"_index": "fcar_city","_type": "city","_source": {"t_b_city|administrative_name": "烟台","t_b_city|create_emp": "1","t_b_city|create_time": "2016-06-28 11:59:58","t_b_city|id": "29","t_b_city|modify_time": "2016-06-28 11:59:58","t_b_city|operate_range": "1","t_b_city|channel_status": "2","t_b_city|is_business": "1","t_b_city|modify_emp": "1","t_b_city|name": "烟台","t_b_city|en_name": "yt"},"_id": "29","_score": 1},
{"_index": "fcar_city","_type": "city","_source": {"t_b_city|administrative_name": "晋城","t_b_city|create_emp": "1","t_b_city|create_time": "2016-06-28 11:59:58","t_b_city|id": "40","t_b_city|modify_time": "2016-06-28 11:59:58","t_b_city|operate_range": "1","t_b_city|channel_status": "2","t_b_city|is_business": "1","t_b_city|modify_emp": "1","t_b_city|name": "晋城","t_b_city|en_name": "jc"},"_id": "40","_score": 1},
{"_index": "fcar_city","_type": "city","_source": {"t_b_city|administrative_name": "聊城","t_b_city|create_emp": "1","t_b_city|create_time": "2016-06-28 11:59:58","t_b_city|id": "41","t_b_city|modify_time": "2016-06-28 11:59:58","t_b_city|operate_range": "1","t_b_city|channel_status": "2","t_b_city|is_business": "1","t_b_city|modify_emp": "1","t_b_city|name": "聊城","t_b_city|en_name": "lc"},"_id": "41","_score": 1},
{"_index": "fcar_city","_type": "city","_source": {"t_b_city|administrative_name": "柳州","t_b_city|create_emp": "1","t_b_city|create_time": "2016-06-28 11:59:58","t_b_city|id": "22","t_b_city|modify_time": "2016-06-28 11:59:58","t_b_city|operate_range": "1","t_b_city|channel_status": "2","t_b_city|is_business": "1","t_b_city|modify_emp": "1","t_b_city|name": "柳州","t_b_city|en_name": "lz"},"_id": "22","_score": 1},
{"_index": "fcar_city","_type": "city","_source": {"t_b_city|administrative_name": "萍乡","t_b_city|create_emp": "1","t_b_city|create_time": "2016-06-28 11:59:58","t_b_city|id": "24","t_b_city|modify_time": "2016-06-28 11:59:58","t_b_city|operate_range": "1","t_b_city|channel_status": "2","t_b_city|is_business": "1","t_b_city|modify_emp": "1","t_b_city|name": "萍乡","t_b_city|en_name": "px"},"_id": "24","_score": 1},
{"_index": "fcar_city","_type": "city","_source": {"t_b_city|administrative_name": "随州","t_b_city|create_emp": "1","t_b_city|create_time": "2016-06-28 11:59:58","t_b_city|id": "25","t_b_city|modify_time": "2016-06-28 11:59:58","t_b_city|operate_range": "1","t_b_city|channel_status": "2","t_b_city|is_business": "1","t_b_city|modify_emp": "1","t_b_city|name": "随州","t_b_city|en_name": "sz"},"_id": "25","_score": 1}
],"total": 152,"max_score": 1},"took": 3,"timed_out": false
}

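It is easy to see that "total" reports 152 documents, yet only 10 were returned, because the default size is 10. This is the problem I ran into a few days ago.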

Following my previous post, I rewrote the DSL to use from and size together:

POST /fcar_city/city/_search
{"query": {"bool": {"must": [{"match_all": { }}]}},"from": 0,"size": 1000
}

   

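This time all 152 records come back. However, as I said in my previous post, querying with from/size can be expensive and can trigger the "Result window is too large" error. After checking the official documentation, scroll caught my attention.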
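To make the cost of from/size a little more concrete, here is a rough back-of-the-envelope sketch in Python. It assumes the usual behavior in which every shard has to collect its own top from + size entries and the coordinating node then merges all of them. The function names are made up for illustration, and 10000 is the default value of the index.max_result_window setting, which a given cluster may have changed.

```python
# Default value of the index.max_result_window setting; a cluster may override it.
DEFAULT_MAX_RESULT_WINDOW = 10_000


def coordinator_entries(from_: int, size: int, shards: int) -> int:
    """Upper bound on the entries the coordinating node merges for one page.

    Each shard can contribute up to from_ + size entries, because it cannot
    know which of its own top entries survive the global sort.
    """
    return shards * (from_ + size)


def exceeds_window(from_: int, size: int,
                   max_window: int = DEFAULT_MAX_RESULT_WINDOW) -> bool:
    """True when from_ + size goes past the result window limit."""
    return from_ + size > max_window


if __name__ == "__main__":
    # The query from this post: 5 shards, from=0, size=1000.
    print(coordinator_entries(0, 1000, 5))
    # A deep page: the work grows with from_, although only 10 hits are returned.
    print(coordinator_entries(10_000, 10, 5), exceeds_window(10_000, 10))
```

For the 152-document index in this post the numbers are harmless; the sketch only shows why the same approach scales badly as from grows.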

 

【Scroll】

The first sentence of the official Elasticsearch introduction to scroll reads:

A scroll query is used to retrieve large numbers of documents from Elasticsearch efficiently, without paying the penalty of deep pagination.

In other words, scroll is meant for retrieving large volumes of data without the problems that come with deep pagination.

The basic form is:

GET /old_index/_search?scroll=1m 
{"query": { "match_all": {}},"sort" : ["_doc"], "size":  1000
}

Two things to note:

(1) scroll=1m keeps the scroll open for 1 minute;

(2) sorting by "_doc" is the most efficient sort order.

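Once "scroll" is added after "_search", I did not hit the "Result window is too large" error even with a very large "size" (verified in my own tests), and the excessive CPU usage did not appear either. The reason lies in how scroll works, and the key is in these two passages: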

Scrolling allows us to do an initial search and to keep pulling batches of results from Elasticsearch until there are no more results left. It’s a bit like a cursor in a traditional database.

A scrolled search takes a snapshot in time. It doesn’t see any changes that are made to the index after the initial search request has been made. It does this by keeping the old data files around, so that it can preserve its “view” on what the index looked like at the time it started.

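So what scroll reads is exactly a "snapshot" taken at one point in time, similar to a view. Scroll is therefore a poor fit for scenarios that need highly up-to-date results; for ordinary list queries, from/size is fine. For fetching all the rows of a "dictionary table", using scroll is well worth it.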

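To scroll through the results, we issue a search request and set the scroll value to the length of time we want the scroll window to stay open. The scroll expiry is refreshed each time a scroll request runs, so it only needs to be long enough to process the current batch of results, not all the documents that match the query. The timeout matters because keeping the scroll window open consumes resources, and we want to free them as soon as they are no longer needed. Setting the timeout lets Elasticsearch release those resources automatically after a period of inactivity.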
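The scroll lifecycle can be sketched in Python roughly as follows. This is a minimal sketch, not the wrapper the author has in mind: it assumes a node at http://localhost:9200, a version of Elasticsearch where follow-up batches are fetched with POST /_search/scroll and a JSON body carrying "scroll" and "scroll_id", and it omits the type from the path. The helper names http_send and scroll_all are hypothetical. The transport is passed in as a parameter so the loop can be exercised without a running cluster.

```python
import json
import urllib.request


def http_send(method, path, body=None, base_url="http://localhost:9200"):
    """Send one JSON request to Elasticsearch and return the decoded reply.

    The base_url default is an assumption; point it at your own node.
    """
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(
        base_url + path,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def scroll_all(index, query, send=http_send, keep_alive="1m", batch_size=1000):
    """Yield every hit matching `query`, one batch at a time."""
    body = {"query": query, "sort": ["_doc"], "size": batch_size}
    page = send("POST", "/%s/_search?scroll=%s" % (index, keep_alive), body)
    scroll_id = page.get("_scroll_id")
    try:
        while page["hits"]["hits"]:
            for hit in page["hits"]["hits"]:
                yield hit
            # Each follow-up request renews the keep-alive, so it only has
            # to cover the time needed to process a single batch.
            page = send("POST", "/_search/scroll",
                        {"scroll": keep_alive, "scroll_id": scroll_id})
            # Always continue with the most recently returned scroll id.
            scroll_id = page.get("_scroll_id", scroll_id)
    finally:
        # Release the scroll explicitly instead of waiting for the timeout.
        if scroll_id:
            send("DELETE", "/_search/scroll", {"scroll_id": scroll_id})
```

Against a real node, iterating over scroll_all("fcar_city", {"match_all": {}}) would walk through all matching documents batch by batch. The explicit DELETE in the finally block follows the point above about freeing resources as soon as they are no longer needed.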

So, that's all. A later post will share a Java wrapper around scroll.

 
