Elasticsearch: Migrating Data with Reindex

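I. Preface

Once an Elasticsearch index has been created, the types of the fields under the mapping's properties cannot be changed; you can only add new fields. To change an existing field you have to create a new index and copy the old data into it.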
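To decide whether a planned mapping change can be applied in place or needs a new index, the two `properties` blocks can be compared. The sketch below is a minimal illustration, not part of Elasticsearch: the helper names `changed_field_types` and `added_fields` are hypothetical, and it only looks at the top-level `type` of each field (nested properties and other mapping parameters are ignored).

```python
def changed_field_types(old_props, new_props):
    """Return {field: (old_type, new_type)} for fields whose type differs."""
    changes = {}
    for name, old_def in old_props.items():
        new_def = new_props.get(name)
        if new_def is not None and new_def.get("type") != old_def.get("type"):
            changes[name] = (old_def.get("type"), new_def.get("type"))
    return changes


def added_fields(old_props, new_props):
    """Return the sorted names of fields that exist only in the new mapping."""
    return sorted(set(new_props) - set(old_props))


# Illustrative data: one field changes type, one field is new.
old = {"user_id": {"type": "long"}, "title": {"type": "keyword"}}
new = {
    "user_id": {"type": "keyword"},
    "title": {"type": "keyword"},
    "tags": {"type": "keyword"},
}

print(changed_field_types(old, new))  # {'user_id': ('long', 'keyword')}
print(added_fields(old, new))         # ['tags']
```

A non-empty result from `changed_field_types` means the change cannot be made on the existing index; fields reported by `added_fields` alone could be added to the existing mapping.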

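II. Reindex

The _reindex API has been available since Elasticsearch 2.3, and reindexing from a remote cluster since 5.0. Reindex rebuilds data directly inside the Elasticsearch cluster by copying documents from one index to another, and it also supports migrating data between clusters.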

III. Hands-on Walkthrough

1. The original index

Suppose I have an index named topic with the following settings and mapping:

{
  "settings": {"number_of_shards": 3, "number_of_replicas": 2},
  "mappings": {
    "properties": {
      "update_time": {"type": "date", "format": "yyyy-MM-dd HH:mm:ss || yyyy-MM-dd'T'HH:mm:ss.SSS || yyyy-MM-dd || epoch_millis"},
      "create_time": {"type": "date", "format": "yyyy-MM-dd HH:mm:ss || yyyy-MM-dd'T'HH:mm:ss.SSS || yyyy-MM-dd || epoch_millis"},
      "user_id": {"type": "long"},
      "is_del": {"type": "boolean"},
      "location": {"type": "geo_point", "ignore_malformed": "true"},
      "id": {"type": "keyword"},
      "title": {"type": "keyword"},
      "content": {"term_vector": "with_positions_offsets", "search_analyzer": "ik_smart", "type": "text", "analyzer": "ik_max_word"},
      "status": {"type": "short"}
    }
  }
}

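The index holds 12 documents. I then noticed that the type of user_id is wrong: it should be a string type rather than long, so I want to change it.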

2. Create the new index

Create a new index named topic-new with the following settings and mapping:

PUT http://172.16.1.236:9201/topic-new
{
  "settings": {"number_of_shards": 3, "number_of_replicas": 0, "refresh_interval": -1},
  "mappings": {
    "properties": {
      "update_time": {"type": "date", "format": "yyyy-MM-dd HH:mm:ss || yyyy-MM-dd'T'HH:mm:ss.SSS || yyyy-MM-dd || epoch_millis"},
      "create_time": {"type": "date", "format": "yyyy-MM-dd HH:mm:ss || yyyy-MM-dd'T'HH:mm:ss.SSS || yyyy-MM-dd || epoch_millis"},
      "user_id": {"type": "keyword"},
      "is_del": {"type": "boolean"},
      "location": {"type": "geo_point", "ignore_malformed": "true"},
      "id": {"type": "keyword"},
      "title": {"type": "keyword"},
      "content": {"term_vector": "with_positions_offsets", "search_analyzer": "ik_smart", "type": "text", "analyzer": "ik_max_word"},
      "status": {"type": "short"}
    }
  }
}

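Here I changed the user_id field to the keyword type, and also changed number_of_replicas and refresh_interval.
Setting number_of_replicas to 0 means documents are not also written to replica shards while they are being migrated, which would hurt performance.
Setting refresh_interval to -1 disables automatic refresh during the migration. The default is 1 second.
Once the migration is finished, set both values back to what you need.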
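The two phases of settings described above can be kept side by side so that the values used during loading and the values restored afterwards do not drift apart. This is a minimal sketch: the function names `bulk_load_settings` and `restore_settings` are hypothetical helpers, and the values are the ones used in this article.

```python
import json


def bulk_load_settings(shards):
    """Index settings for the migration phase: no replicas, no automatic refresh."""
    return {
        "number_of_shards": shards,
        "number_of_replicas": 0,
        "refresh_interval": -1,
    }


def restore_settings(replicas, refresh_interval="1s"):
    """Settings to apply after the migration.

    number_of_shards is deliberately absent: it is fixed when the index is created.
    """
    return {"refresh_interval": refresh_interval, "number_of_replicas": replicas}


# Body fragment for PUT /topic-new (the mappings are added separately).
print(json.dumps({"settings": bulk_load_settings(3)}))
# Body for PUT /topic-new/_settings once the migration is done.
print(json.dumps(restore_settings(1)))
```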

3. Run the migration

With the new index in place, the data can be migrated.

POST http://172.16.1.236:9201/_reindex
{
  "source": {"index": "topic"},
  "dest": {"index": "topic-new"}
}

// Response
{
  "took": 1335,
  "timed_out": false,
  "total": 12,
  "updated": 0,
  "created": 12,
  "deleted": 0,
  "batches": 1,
  "version_conflicts": 0,
  "noops": 0,
  "retries": {"bulk": 0, "search": 0},
  "throttled_millis": 0,
  "requests_per_second": -1.0,
  "throttled_until_millis": 0,
  "failures": []
}
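Before moving on it is worth checking the response rather than only glancing at it. The sketch below is one conservative way to do that: `reindex_looks_clean` is a hypothetical helper, and it assumes a plain copy with no script, so every processed document should be either created or updated.

```python
def reindex_looks_clean(resp):
    """Return True when a _reindex response reports no problems.

    Assumes a plain copy (no script), so created + updated should equal total.
    """
    written = resp["created"] + resp["updated"]
    return (
        not resp["timed_out"]
        and not resp["failures"]
        and resp["version_conflicts"] == 0
        and written == resp["total"]
    )


# The response shown in this article.
sample = {
    "took": 1335, "timed_out": False, "total": 12, "updated": 0, "created": 12,
    "deleted": 0, "batches": 1, "version_conflicts": 0, "noops": 0,
    "retries": {"bulk": 0, "search": 0}, "throttled_millis": 0,
    "requests_per_second": -1.0, "throttled_until_millis": 0, "failures": [],
}

print(reindex_looks_clean(sample))  # True
```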

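If you query the new index at this point you will not see the data yet, because the index still has to be refreshed (refresh_interval is -1).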
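If you want to confirm the document count right away, an explicit refresh followed by a count is one option. The sketch below only builds the two requests (`POST /topic-new/_refresh` and `GET /topic-new/_count`) with the standard library and does not send them; `es_request` is a hypothetical helper and `BASE_URL` is the address used in this article.

```python
import json
import urllib.request

BASE_URL = "http://172.16.1.236:9201"  # cluster address used in this article


def es_request(method, path, body=None):
    """Build (but do not send) an HTTP request for an Elasticsearch path."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if body is not None else {}
    return urllib.request.Request(
        BASE_URL + path, data=data, headers=headers, method=method
    )


refresh_req = es_request("POST", "/topic-new/_refresh")
count_req = es_request("GET", "/topic-new/_count")

# Sending requires a reachable cluster, for example:
#   with urllib.request.urlopen(refresh_req):
#       pass
#   with urllib.request.urlopen(count_req) as resp:
#       print(json.load(resp)["count"])
```

The count returned for topic-new should match the `total` reported by the reindex response.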

Update the settings

PUT http://172.16.1.236:9201/topic-new/_settings
{
  "refresh_interval": "1s",
  "number_of_replicas": 1
}

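With the replica count and refresh interval updated, the data migration is complete. The old index is no longer needed, but the application's requests still point at the old index name, so we add an alias with that name to the new index.

Delete the old index before adding the alias, because an alias cannot have the same name as an existing index: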

DELETE http://172.16.1.236:9201/topic

Add the alias

POST http://172.16.1.236:9201/_aliases
{"actions": [{"add": {"index": "topic-new", "alias": "topic"}}]}
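Deleting the index and then adding the alias leaves a short window in which the name topic resolves to nothing. The `_aliases` API also accepts a `remove_index` action, so both steps can be sent in a single request. The sketch below builds such a body; `swap_index_for_alias` is a hypothetical helper name, and this is an alternative to the two separate requests above, not what the original walkthrough ran.

```python
import json


def swap_index_for_alias(old_index, new_index):
    """Build an _aliases body that drops old_index and reuses its name as an alias."""
    return {
        "actions": [
            {"add": {"index": new_index, "alias": old_index}},
            {"remove_index": {"index": old_index}},
        ]
    }


# Body for POST /_aliases
body = swap_index_for_alias("topic", "topic-new")
print(json.dumps(body, indent=2))
```

Because `remove_index` deletes the old index and its data, it should only be sent once the reindex result has been verified.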

Get the alias

GET http://172.16.1.236:9201/topic/_alias

Remove an alias

POST http://172.16.1.236:9201/_aliases
{"actions": [{"remove": {"index": "indexName", "alias": "indexAliasName"}}]}

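4. Migrating data across clusters

Reindex can also pull data from a remote cluster.

The migration above ran inside a single cluster. What if the source and destination are different clusters?
That works too, but a whitelist has to be configured first. (When migrating from cluster A to cluster B, A's address must be whitelisted in B's elasticsearch.yml.)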

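Configure the whitelist

In the elasticsearch.yml of the destination cluster, whitelist the remote (source) cluster by adding the setting below, then restart the node so that it takes effect:

# reindex.remote.whitelist: <IP of A>:<port>, for example:
reindex.remote.whitelist: 172.16.1.236:9200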

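Reindex

This is almost the same as migrating within one cluster; the only extra step is the whitelist.
Create the destination index with number_of_replicas: 0 and refresh_interval: -1.
In remote, set the address of the remote cluster and, if authentication is configured, the username and password.
You can also add a query so that only matching documents are copied.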

POST http://172.16.1.236:9201/_reindex
{
  "source": {
    "index": "topic",
    "remote": {"host": "http://172.16.1.236:9200", "username": "username", "password": "password"},
    "query": {"match_all": {}}
  },
  "dest": {"index": "topic-new"}
}
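The remote request body has a few optional parts (credentials and a query), which makes it convenient to build programmatically. The sketch below is a minimal illustration: `remote_reindex_body` is a hypothetical helper, and the `term` query on `is_del` is only an example filter based on the mapping in this article.

```python
import json


def remote_reindex_body(source_index, dest_index, host,
                        username=None, password=None, query=None):
    """Build a _reindex body that reads from a remote cluster."""
    remote = {"host": host}
    if username is not None:
        # Credentials are only included when the remote cluster requires them.
        remote["username"] = username
        remote["password"] = password
    source = {"index": source_index, "remote": remote}
    if query is not None:
        # Only documents matching the query are copied.
        source["query"] = query
    return {"source": source, "dest": {"index": dest_index}}


# Copy only documents that are not flagged as deleted (example filter).
body = remote_reindex_body(
    "topic", "topic-new", "http://172.16.1.236:9200",
    username="username", password="password",
    query={"term": {"is_del": False}},
)
print(json.dumps(body, indent=2))
```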

When the migration is finished, remember to set number_of_replicas and refresh_interval back to their normal values.
