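Hot-updating synonyms and a customized IK analyzer in Elasticsearch 7.17.5, used with Spring Data Elasticsearch 4.3.4
- Notes from the pitfalls I hit while deploying synonym hot updates on Elasticsearch
- IK analyzer
- Synonyms
- Spring Data integration
Notes from the pitfalls I hit while deploying synonym hot updates on Elasticsearch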
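Elasticsearch releases move quickly, but the official synonym hot-update plugin only goes up to version 5.1.1. I adapted it into a plugin for 7.16.3 and later, and testing shows it is also compatible with 7.17.5.
The IK analyzer source changes follow this blog post: https://blog.csdn.net/zq199419951001/article/details/89884461
The modified synonym plugin code and IK analyzer source are linked below. You can also download the matching IK version from the official site and apply the changes yourself by following this write-up.
IK analyzer 7.17.5 source download: https://download.csdn.net/download/qq_41927845/87254019
7.17.5 synonym plugin source: https://download.csdn.net/download/qq_41927845/87253959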
IK analyzer
The referenced implementation is fairly complete. The first change is to add a db.properties file to the plugin's config directory:
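The original post showed db.properties only as a screenshot, which is not reproduced here. Judging from the keys that the Dictionary code in this section reads (url, user, password, interval, extWordSql, stopWordSql) and the column it fetches (main_keyword), the file probably looks something like the sample embedded in the sketch below. The host, database name, table names and credentials are placeholders, and the unit of interval is not stated in the post, so treat all the values as assumptions.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

class DbPropertiesSample {
    // Hypothetical content for config/db.properties; every value is a placeholder.
    static final String SAMPLE = String.join("\n",
            "url=jdbc:mysql://localhost:3306/es_dict",
            "user=es_reader",
            "password=change-me",
            "interval=60",
            "extWordSql=SELECT word AS main_keyword FROM ext_word",
            "stopWordSql=SELECT word AS main_keyword FROM stop_word");

    // Parses properties text the same way Properties.load reads the real file.
    static Properties parse(String text) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return p;
    }

    public static void main(String[] args) {
        Properties p = parse(SAMPLE);
        String[] required = {"url", "user", "password", "interval", "extWordSql", "stopWordSql"};
        for (String key : required) {
            if (p.getProperty(key) == null) {
                throw new IllegalStateException("missing key: " + key);
            }
        }
        System.out.println("all keys present, interval = " + Integer.parseInt(p.getProperty("interval")));
    }
}
```

Aliasing the selected column to main_keyword in the SQL keeps the query compatible with the rs.getString("main_keyword") call regardless of how the table itself names the column.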
Then change the Dictionary class as follows:
private final static String FILE_NAME = "IKAnalyzer.cfg.xml";
private final static String EXT_DICT = "ext_dict";
private final static String REMOTE_EXT_DICT = "remote_ext_dict";
private final static String EXT_STOP = "ext_stopwords";
private final static String REMOTE_EXT_STOP = "remote_ext_stopwords";
// Added: name of the database settings file in the plugin's config directory
private final static String DB_PROPERTIES = "db.properties";

private Path conf_dir;
private Properties props;
// Added: settings read from db.properties
private Properties myProperties;

private Dictionary(Configuration cfg) {
    this.configuration = cfg;
    this.props = new Properties();
    this.myProperties = new Properties();
    this.conf_dir = cfg.getEnvironment().configFile().resolve(AnalysisIkPlugin.PLUGIN_NAME);
    Path configFile = conf_dir.resolve(FILE_NAME);

    InputStream input = null;
    try {
        logger.info("try load config from {}", configFile);
        input = new FileInputStream(configFile.toFile());
    } catch (FileNotFoundException e) {
        // Fall back to the config directory inside the plugin folder
        conf_dir = cfg.getConfigInPluginDir();
        configFile = conf_dir.resolve(FILE_NAME);
        try {
            logger.info("try load config from {}", configFile);
            input = new FileInputStream(configFile.toFile());
        } catch (FileNotFoundException ex) {
            // We should report origin exception
            logger.error("ik-analyzer", e);
        }
    }
    if (input != null) {
        try {
            props.loadFromXML(input);
        } catch (IOException e) {
            logger.error("ik-analyzer", e);
        }
    }

    // Added: load db.properties; a missing file is logged instead of causing a NullPointerException
    File dbFile = cfg.getConfigInPluginDir().resolve(DB_PROPERTIES).toFile();
    logger.info("loading database settings from {}", dbFile);
    try (InputStream myInput = new FileInputStream(dbFile)) {
        myProperties.load(myInput);
    } catch (FileNotFoundException e) {
        logger.error("db.properties not found", e);
    } catch (IOException e) {
        logger.error("failed to load db.properties", e);
    }
}
Next, add the code that loads words from the database and reloads the dictionaries:
private String getUrl() {
    return myProperties.getProperty("url");
}

private String getUser() {
    return myProperties.getProperty("user");
}

private String getPassword() {
    return myProperties.getProperty("password");
}

private int getInterval() {
    return Integer.parseInt(myProperties.getProperty("interval"));
}

private String getExtWordSql() {
    return myProperties.getProperty("extWordSql");
}

private String getStopWordSql() {
    return myProperties.getProperty("stopWordSql");
}

// Loads extension words from MySQL into the main dictionary.
private void loadMySQLExtDict() {
    String extWordSql = getExtWordSql();
    // Compare string content, not references (the original used != "")
    if (extWordSql == null || extWordSql.isEmpty()) {
        return;
    }
    logger.info("query ext dict from mysql, " + getUrl());
    try {
        Class.forName("com.mysql.cj.jdbc.Driver");
        // try-with-resources closes the result set, statement and connection in reverse order
        try (Connection conn = DriverManager.getConnection(getUrl(), getUser(), getPassword());
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(extWordSql)) {
            while (rs.next()) {
                String theWord = rs.getString("main_keyword");
                if (theWord == null) {
                    continue;
                }
                logger.info("main_keyword ext word from mysql: " + theWord);
                _MainDict.fillSegment(theWord.trim().toCharArray());
            }
        }
    } catch (Exception e) {
        logger.error("failed to load ext dict from mysql", e);
    }
}

// Loads stop words from MySQL into the stop-word dictionary.
private void loadMySQLStopDict() {
    String stopWordSql = getStopWordSql();
    if (stopWordSql == null || stopWordSql.isEmpty()) {
        return;
    }
    logger.info("query stop dict from mysql, " + getUrl());
    try {
        Class.forName("com.mysql.cj.jdbc.Driver");
        try (Connection conn = DriverManager.getConnection(getUrl(), getUser(), getPassword());
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(stopWordSql)) {
            while (rs.next()) {
                String theWord = rs.getString("main_keyword");
                if (theWord == null) {
                    continue;
                }
                logger.info("main_keyword stop word from mysql: " + theWord);
                _StopWords.fillSegment(theWord.trim().toCharArray());
            }
        }
    } catch (Exception e) {
        logger.error("failed to load stop dict from mysql", e);
    }
}

public void reLoadMySqlDict() {
    logger.info("reloading dictionaries ...");
    // Load into a fresh instance so the dictionary currently in use is not disturbed while loading
    Dictionary tmpDict = new Dictionary(configuration);
    tmpDict.configuration = getSingleton().configuration;
    tmpDict.loadMainDict();
    tmpDict.loadStopWordDict();
    tmpDict.loadMySQLExtDict();
    tmpDict.loadMySQLStopDict();
    _MainDict = tmpDict._MainDict;
    _StopWords = tmpDict._StopWords;
    logger.info("dictionaries reloaded");
}
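The excerpt defines reLoadMySqlDict() and getInterval() but does not show what calls them on a schedule. One plausible wiring, sketched below with only the JDK, is a single daemon thread that rebuilds the word set off to the side and then swaps it in, which is the same "load into a temporary instance, then replace" idea used by reLoadMySqlDict(). The class ReloadingWordSet and its methods are hypothetical and not part of IK; in the plugin the scheduled task would presumably call Dictionary.getSingleton().reLoadMySqlDict(), and treating the interval as seconds is an assumption.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical stand-in for a dictionary that is refreshed periodically.
class ReloadingWordSet {
    // volatile so that readers on other threads see the swapped-in set
    private volatile Set<String> words = Collections.emptySet();
    private final Supplier<Set<String>> loader;

    ReloadingWordSet(Supplier<Set<String>> loader) {
        this.loader = loader;
    }

    // Builds the new set completely, then replaces the old one in a single assignment.
    void reload() {
        try {
            Set<String> fresh = Collections.unmodifiableSet(new HashSet<>(loader.get()));
            words = fresh;
        } catch (RuntimeException e) {
            // A failed load (e.g. database unreachable) keeps the previous words in place.
            System.err.println("reload failed, keeping previous words: " + e);
        }
    }

    boolean contains(String word) {
        return words.contains(word);
    }

    // Starts a daemon thread that reloads immediately and then after every interval.
    ScheduledExecutorService start(long intervalSeconds) {
        ScheduledExecutorService pool = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "dict-reload");
            t.setDaemon(true);
            return t;
        });
        pool.scheduleWithFixedDelay(this::reload, 0, intervalSeconds, TimeUnit.SECONDS);
        return pool;
    }

    public static void main(String[] args) {
        ReloadingWordSet dict = new ReloadingWordSet(
                () -> new HashSet<>(Arrays.asList("高德", "书包")));
        dict.reload();
        System.out.println(dict.contains("高德"));
    }
}
```

scheduleWithFixedDelay is used rather than scheduleAtFixedRate so that a slow database query cannot cause reloads to pile up, and catching the exception inside reload() matters because an uncaught exception would silently cancel all later runs of the scheduled task.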
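If you want to keep the stock IK plugin installed as well, give this modified plugin a different name.
In pom.xml, change the value of elasticsearch.plugin.name, and make sure the Elasticsearch version matches the one running on the server.
The MySQL driver also has to be added to pom.xml as a dependency: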
<dependency>
    <groupId>mysql</groupId>
    <artifactId>mysql-connector-java</artifactId>
    <version>8.0.13</version>
</dependency>
Edit plugin.xml
so that the MySQL driver is included in the packaged plugin:
<include>mysql:mysql-connector-java</include>
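In the AnalysisIkPlugin class, rename the analyzers; PLUGIN_NAME must match the name you set in pom.xml.
That completes the plugin changes. Run mvn clean, then compile and package (for example, mvn clean package).
Copy the zip file from target/releases. In the plugins folder of the Elasticsearch installation, create a folder named ik-custom, extract the zip there, and delete the zip afterwards. Check that the MySQL driver jar is among the extracted files.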
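Synonyms
The latest official release of the synonym plugin is 5.1.1, so I adapted it into a build that works with 7.16.3 and 7.17.5.
After downloading it, open the project in IntelliJ IDEA.
As with IK, start in the source's config folder: configure the database connection, the query that returns the synonyms, and the query that returns the synonym version. Whenever the synonym version in the database is greater than the version the plugin has recorded, the plugin reloads the synonyms. It checks every 60 seconds. See the README.md file for details.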
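The version check described above can be sketched as follows. This is a minimal illustration of the "reload only when the stored version is greater than the recorded one" rule, not the plugin's actual code: the class SynonymVersionGate, its fields, and the two callbacks are hypothetical, and in the real plugin the version would come from the configured SQL query.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

// Hypothetical helper: reloads synonyms only when the stored version has increased.
class SynonymVersionGate {
    private long loadedVersion = -1;           // nothing loaded yet
    private final LongSupplier currentVersion; // e.g. a query against the version table
    private final Runnable reloadSynonyms;     // e.g. re-read the synonym table

    SynonymVersionGate(LongSupplier currentVersion, Runnable reloadSynonyms) {
        this.currentVersion = currentVersion;
        this.reloadSynonyms = reloadSynonyms;
    }

    // Intended to be called on every scan (every 60 seconds in the post).
    // Returns true if a reload happened.
    synchronized boolean poll() {
        long version = currentVersion.getAsLong();
        if (version <= loadedVersion) {
            return false;
        }
        reloadSynonyms.run();
        // Record the version only after the reload succeeded, so a failed
        // reload is retried on the next scan.
        loadedVersion = version;
        return true;
    }

    public static void main(String[] args) {
        AtomicLong dbVersion = new AtomicLong(1);
        SynonymVersionGate gate = new SynonymVersionGate(
                dbVersion::get, () -> System.out.println("reloading synonyms"));
        System.out.println(gate.poll()); // first scan: version 1 is newer than "nothing loaded"
        System.out.println(gate.poll()); // version unchanged
        dbVersion.set(2);                // someone edits the synonyms and bumps the version
        System.out.println(gate.poll());
    }
}
```

The practical consequence is that editing the synonym rows alone changes nothing; the version value has to be increased as well before the next scan picks up the change.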
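Change the Elasticsearch version in pom.xml.
Then run mvn clean, followed by compile and package (for example, mvn clean package).
Install it the same way as the IK plugin above: copy the zip file from target/releases, create a folder for the plugin under the plugins folder of the Elasticsearch installation (as was done with ik-custom), extract the zip there, and delete the zip afterwards. Again, check that the MySQL driver jar is among the extracted files.
Open the Elasticsearch log to see the result of the synonym loading.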
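Spring Data integration
Note that Spring Data Elasticsearch and Elasticsearch have version compatibility constraints.
I am using Spring Boot 2.6.8 with Elasticsearch 7.17.5 and, for now, Spring Data Elasticsearch 4.3.4. I have not run into problems so far, but 4.4.x would be the better choice, since that is the release line that targets Elasticsearch 7.17.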
<dependency>
    <groupId>org.springframework.data</groupId>
    <artifactId>spring-data-elasticsearch</artifactId>
    <version>4.3.4</version>
</dependency>
Configure the Elasticsearch connection (these keys sit under spring: in application.yml):
spring:
  elasticsearch:
    rest:
      uris: http://localhost:9200
      username:
      password:
  jackson:
    date-format: yyyy-MM-dd HH:mm:ss
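Load the synonym analyzer: in the project's resources directory, create a folder named elasticsearch and put an es.json file in it. The analyzer's tokenizer can also be set to the customized one, for example ik_smart_custom.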
{
  "index": {
    "analysis": {
      "analyzer": {
        "synonym": {
          "tokenizer": "ik_smart",
          "filter": ["remote_synonym"]
        }
      },
      "filter": {
        "remote_synonym": {
          "type": "dynamic_synonym",
          "synonyms_path": "fromMySql",
          "interval": 60
        },
        "local_synonym": {
          "type": "dynamic_synonym",
          "synonyms_path": "synonym.txt"
        },
        "synonym_graph": {
          "type": "dynamic_synonym_graph",
          "synonyms_path": "http://host:port/synonym.txt"
        }
      }
    }
  }
}
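Create the entity class and point its settings at the es.json file.
The index is named dynamic1 here. Note that synonym queries only work on fields whose analyzer is set to synonym, the analyzer defined in es.json.
Create a dao package with an interface that extends ElasticsearchRepository.
The second type parameter of ElasticsearchRepository (the ID type) must match the type of the entity's id field.
Then create a controller to test it.
The test results are as follows:
Add a document to Elasticsearch.
Synonym query: I configured one synonym group containing 我爱,高德,书包,方法
Searching for any one of these terms returns documents that contain the others, which confirms that the synonyms are applied.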
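To make the effect of that synonym group concrete, here is a small plain-Java sketch of what a comma-separated group means: every term in the line is treated as equivalent to every other term. This is only an illustration of the rule format, not how the dynamic_synonym filter is implemented (the real filter works on token streams inside Lucene), and the class SynonymGroups is hypothetical. Explicit mappings written with => are not handled.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper that expands a term into its comma-separated synonym group.
class SynonymGroups {
    // Each input line is one group of equivalent terms, e.g. "a,b,c".
    static Map<String, Set<String>> parse(List<String> lines) {
        Map<String, Set<String>> byTerm = new HashMap<>();
        for (String line : lines) {
            Set<String> group = new LinkedHashSet<>();
            for (String term : line.split(",")) {
                String trimmed = term.trim();
                if (!trimmed.isEmpty()) {
                    group.add(trimmed);
                }
            }
            // Every member of the group maps to the whole group.
            for (String term : group) {
                byTerm.computeIfAbsent(term, k -> new LinkedHashSet<>()).addAll(group);
            }
        }
        return byTerm;
    }

    // A term without a group expands only to itself.
    static Set<String> expand(Map<String, Set<String>> byTerm, String term) {
        return byTerm.getOrDefault(term, Collections.singleton(term));
    }

    public static void main(String[] args) {
        Map<String, Set<String>> synonyms = parse(Arrays.asList("我爱,高德,书包,方法"));
        System.out.println(expand(synonyms, "高德")); // prints [我爱, 高德, 书包, 方法]
        System.out.println(expand(synonyms, "地图")); // prints [地图]
    }
}
```

With this group in place, a document indexed with 书包 in a field analyzed by the synonym analyzer can be found by searching for 高德, which is the behavior the test above demonstrates.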