【ElasticSearch】(6) A Brief Look at Scroll

2024-08-26 20:58
Tags: elasticsearch, brief analysis, scroll

【Background】

A typical DSL request for fetching all the data in an index looks like this:

POST /fcar_city/city/_search?scroll=10m
{"query": {"bool": {"must": [{"match_all": { }}]}}
}

My intent was to retrieve all the data in the index. The request above returned the following:

{"_shards": {"total": 5,"failed": 0,"successful": 5},"hits": {"hits": [
{"_index": "fcar_city","_type": "city","_source": {"t_b_city|administrative_name": "扬州","t_b_city|create_emp": "1","t_b_city|create_time": "2016-06-28 11:59:58","t_b_city|id": "60","t_b_city|modify_time": "2016-06-28 11:59:58","t_b_city|operate_range": "1","t_b_city|channel_status": "2","t_b_city|is_business": "1","t_b_city|modify_emp": "1","t_b_city|name": "扬州","t_b_city|en_name": "yz"},"_id": "60","_score": 1},
{"_index": "fcar_city","_type": "city","_source": {"t_b_city|administrative_name": "通化","t_b_city|create_emp": "1","t_b_city|create_time": "2016-06-28 11:59:58","t_b_city|id": "44","t_b_city|modify_time": "2016-06-28 11:59:58","t_b_city|operate_range": "1","t_b_city|channel_status": "2","t_b_city|is_business": "1","t_b_city|modify_emp": "1","t_b_city|name": "通化","t_b_city|en_name": "th"},"_id": "44","_score": 1},
{"_index": "fcar_city","_type": "city","_source": {"t_b_city|create_emp": "1","t_b_city|create_time": "2016-06-28 11:59:58","t_b_city|modify_time": "2016-10-09 08:40:00","t_b_city|center_lat": "28.656386","t_b_city|is_business": "1","t_b_city|modify_emp": "253","t_b_city|name": "台州","t_b_city|en_name": "tz","t_b_city|administrative_name": "台州","t_b_city|id": "48","t_b_city|operate_range": "2","t_b_city|channel_status": "2","t_b_city|status": "2","t_b_city|center_lon": "121.420757"},"_id": "48","_score": 1},
{"_index": "fcar_city","_type": "city","_source": {"t_b_city|administrative_name": "咸阳","t_b_city|create_emp": "1","t_b_city|create_time": "2016-06-28 11:59:58","t_b_city|id": "52","t_b_city|modify_time": "2016-06-28 11:59:58","t_b_city|operate_range": "1","t_b_city|channel_status": "2","t_b_city|is_business": "1","t_b_city|modify_emp": "1","t_b_city|name": "咸阳","t_b_city|en_name": "xiy"},"_id": "52","_score": 1},
{"_index": "fcar_city","_type": "city","_source": {"t_b_city|administrative_name": "烟台","t_b_city|create_emp": "1","t_b_city|create_time": "2016-06-28 11:59:58","t_b_city|id": "29","t_b_city|modify_time": "2016-06-28 11:59:58","t_b_city|operate_range": "1","t_b_city|channel_status": "2","t_b_city|is_business": "1","t_b_city|modify_emp": "1","t_b_city|name": "烟台","t_b_city|en_name": "yt"},"_id": "29","_score": 1},
{"_index": "fcar_city","_type": "city","_source": {"t_b_city|administrative_name": "晋城","t_b_city|create_emp": "1","t_b_city|create_time": "2016-06-28 11:59:58","t_b_city|id": "40","t_b_city|modify_time": "2016-06-28 11:59:58","t_b_city|operate_range": "1","t_b_city|channel_status": "2","t_b_city|is_business": "1","t_b_city|modify_emp": "1","t_b_city|name": "晋城","t_b_city|en_name": "jc"},"_id": "40","_score": 1},
{"_index": "fcar_city","_type": "city","_source": {"t_b_city|administrative_name": "聊城","t_b_city|create_emp": "1","t_b_city|create_time": "2016-06-28 11:59:58","t_b_city|id": "41","t_b_city|modify_time": "2016-06-28 11:59:58","t_b_city|operate_range": "1","t_b_city|channel_status": "2","t_b_city|is_business": "1","t_b_city|modify_emp": "1","t_b_city|name": "聊城","t_b_city|en_name": "lc"},"_id": "41","_score": 1},
{"_index": "fcar_city","_type": "city","_source": {"t_b_city|administrative_name": "柳州","t_b_city|create_emp": "1","t_b_city|create_time": "2016-06-28 11:59:58","t_b_city|id": "22","t_b_city|modify_time": "2016-06-28 11:59:58","t_b_city|operate_range": "1","t_b_city|channel_status": "2","t_b_city|is_business": "1","t_b_city|modify_emp": "1","t_b_city|name": "柳州","t_b_city|en_name": "lz"},"_id": "22","_score": 1},
{"_index": "fcar_city","_type": "city","_source": {"t_b_city|administrative_name": "萍乡","t_b_city|create_emp": "1","t_b_city|create_time": "2016-06-28 11:59:58","t_b_city|id": "24","t_b_city|modify_time": "2016-06-28 11:59:58","t_b_city|operate_range": "1","t_b_city|channel_status": "2","t_b_city|is_business": "1","t_b_city|modify_emp": "1","t_b_city|name": "萍乡","t_b_city|en_name": "px"},"_id": "24","_score": 1},
{"_index": "fcar_city","_type": "city","_source": {"t_b_city|administrative_name": "随州","t_b_city|create_emp": "1","t_b_city|create_time": "2016-06-28 11:59:58","t_b_city|id": "25","t_b_city|modify_time": "2016-06-28 11:59:58","t_b_city|operate_range": "1","t_b_city|channel_status": "2","t_b_city|is_business": "1","t_b_city|modify_emp": "1","t_b_city|name": "随州","t_b_city|en_name": "sz"},"_id": "25","_score": 1}
],"total": 152,"max_score": 1},"took": 3,"timed_out": false
}

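As you can see, "total" reports 152 documents, but only 10 were returned, because "size" defaults to 10. This is the problem I ran into a few days ago.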

Following my previous post, I tried combining from and size and rewrote the DSL as follows:

POST /fcar_city/city/_search
{"query": {"bool": {"must": [{"match_all": { }}]}},"from": 0,"size": 1000
}

   

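This time all 152 documents came back. However, as I said in my previous post, from/size queries can get expensive on deep pages, and once from + size exceeds index.max_result_window (10000 by default) the request fails with "Result window is too large". After checking the official documentation, scroll caught my eye.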
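To make the cost of deep pagination concrete, here is a rough back-of-the-envelope sketch in Python. It assumes the default index.max_result_window of 10000 and the 5 shards reported in the response above. The function name deep_page_cost is made up for illustration, and the numbers are an upper-bound model rather than measurements.

```python
# Rough model of why deep from/size paging is expensive (illustrative only).
DEFAULT_MAX_RESULT_WINDOW = 10000  # default value of index.max_result_window


def deep_page_cost(from_, size, shards, max_window=DEFAULT_MAX_RESULT_WINDOW):
    """Estimate the work behind one from/size request.

    Each shard has to collect up to (from + size) hits of its own, and the
    coordinating node then merges up to shards * (from + size) entries
    just to return `size` documents.
    """
    window = from_ + size
    return {
        "window": window,
        "per_shard": window,
        "merged_on_coordinator": window * shards,
        "rejected": window > max_window,  # "Result window is too large"
    }


if __name__ == "__main__":
    # The request from this post: well inside the window.
    print(deep_page_cost(0, 1000, shards=5))
    # A deep page: rejected under the default setting.
    print(deep_page_cost(10000, 10, shards=5))
```

The point of the model is that the work grows with from + size, not with the number of documents actually returned, which is the penalty scroll is designed to avoid.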

 

【Scroll】

The first sentence of the official Elasticsearch introduction to scroll reads:

A scroll query is used to retrieve large numbers of documents from Elasticsearch efficiently, without paying the penalty of deep pagination.

In other words, scroll is meant for retrieving large numbers of documents without paying the cost of deep pagination.

The basic form is:

GET /old_index/_search?scroll=1m 
{"query": { "match_all": {}},"sort" : ["_doc"], "size":  1000
}

Two things to note:

(1) scroll=1m keeps the scroll context alive for 1 minute;

(2) sorting by "_doc" is the most efficient sort order.

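Once "scroll" is added after "_search", "size" becomes the number of documents per batch. In my own tests I did not hit "Result window is too large" even with a large "size", and I did not see excessive CPU usage either. The reason lies in how scroll works, as these two passages explain: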

Scrolling allows us to do an initial search and to keep pulling batches of results from Elasticsearch until there are no more results left. It’s a bit like a cursor in a traditional database.

A scrolled search takes a snapshot in time. It doesn’t see any changes that are made to the index after the initial search request has been made. It does this by keeping the old data files around, so that it can preserve its “view” on what the index looked like at the time it started.

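So what scroll reads is exactly a "snapshot" taken at one moment, similar to a view. Scroll is therefore not a good fit where results must be strictly real-time; for ordinary list pages, from/size is fine. For reading all the data of a "dictionary table" (a lookup table), scroll is well worth using.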
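The snapshot behaviour can be pictured with a toy model. This is only an in-memory illustration of the semantics described in the quoted passage; the ToyIndex class and its methods are hypothetical and say nothing about how Elasticsearch is actually implemented.

```python
class ToyIndex:
    """Toy stand-in for an index; not how Elasticsearch stores data."""

    def __init__(self, docs):
        self.docs = list(docs)

    def search(self):
        # A normal search always reads the live data.
        return list(self.docs)

    def open_scroll(self, batch_size):
        # Freeze the current contents at the moment the scroll is opened.
        frozen = tuple(self.docs)

        def batches():
            for start in range(0, len(frozen), batch_size):
                yield list(frozen[start:start + batch_size])

        return batches()


def demo():
    index = ToyIndex(["yz", "th", "tz"])
    scroll = index.open_scroll(batch_size=2)
    first = next(scroll)
    index.docs.append("xiy")  # written after the scroll was opened
    rest = next(scroll)
    # The scroll never sees "xiy"; a fresh search does.
    return first + rest, index.search()


if __name__ == "__main__":
    scrolled, live = demo()
    print("scroll view:", scrolled)
    print("live view:  ", live)
```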

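To scroll through the results, we send a search request and set the scroll value to how long the scroll window should stay open. The expiry time is refreshed every time a scroll request runs, so it only needs to be long enough to process the current batch, not all the documents matching the query. The timeout matters because keeping a scroll window open consumes resources, and we want them released as soon as they are no longer needed. Setting a timeout lets Elasticsearch free those resources automatically after a period of inactivity.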
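The flow described above can be sketched in Python using only the standard library. This is a minimal sketch, not the Java wrapper promised for a later post. It assumes a version of Elasticsearch whose scroll endpoints accept JSON bodies (POST /_search/scroll with "scroll" and "scroll_id", and DELETE /_search/scroll with a "scroll_id" list). The helper names http_send and scroll_all are made up for illustration.

```python
import json
import urllib.request


def http_send(base_url):
    """Return a send(method, path, body) function backed by urllib."""
    def send(method, path, body):
        req = urllib.request.Request(
            base_url + path,
            data=json.dumps(body).encode("utf-8"),
            method=method,
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            return json.loads(resp.read().decode("utf-8"))
    return send


def scroll_all(send, index, query, size=1000, keep_alive="1m"):
    """Yield every hit matching `query`, fetching one batch at a time."""
    page = send("POST", "/%s/_search?scroll=%s" % (index, keep_alive),
                {"query": query, "sort": ["_doc"], "size": size})
    scroll_id = page.get("_scroll_id")
    try:
        while page["hits"]["hits"]:
            for hit in page["hits"]["hits"]:
                yield hit
            # Each scroll request renews the keep-alive for one more batch.
            page = send("POST", "/_search/scroll",
                        {"scroll": keep_alive, "scroll_id": scroll_id})
            scroll_id = page.get("_scroll_id", scroll_id)
    finally:
        # Release the scroll as soon as we are done instead of
        # waiting for the timeout to expire.
        if scroll_id:
            send("DELETE", "/_search/scroll", {"scroll_id": [scroll_id]})
```

Against a real cluster this could be used as `for hit in scroll_all(http_send("http://localhost:9200"), "fcar_city", {"match_all": {}}): ...`, where the address is an assumption. Passing the transport in as `send` keeps the loop itself easy to exercise without a running cluster.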

So, that's all. A later post will share a Java wrapper around scroll.

 




http://www.chinasem.cn/article/1109666
