Installing the IK Analyzer for Elasticsearch (my first blog post)
Reposted from: Elasticsearch安装IK分词器 ("Installing the IK Analyzer for Elasticsearch")
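There are roughly two ways to install it:
- compile the plugin yourself;
- use the elasticsearch-rtf distribution and install from files that someone else has already compiled.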
Environment:
The Elasticsearch version I downloaded is elasticsearch-1.7.4.tar.gz. For IK, I used the elasticsearch-analysis-ik-1.2.6.jar file obtained by unzipping elasticsearch-rtf-1.0.0.zip.
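Method 1: compile it yourself
The steps are as follows:
1. Download the elasticsearch-analysis-ik-x.x.x.zip archive from https://github.com/medcl/elasticsearch-analysis-ik;
2. Unzip elasticsearch-analysis-ik-x.x.x.zip and change into the elasticsearch-analysis-ik-x.x.x directory;
3. Build it with Maven to produce elasticsearch-analysis-ik-x.x.x.jar (I don't know how to build with Maven, which is why I didn't use this method);
4. Go to the elasticsearch-1.7.4/plugins directory, create a directory named analysis-ik, and put the elasticsearch-analysis-ik-x.x.x.jar you built into it;
5. Copy the ik directory from the config directory of the unzipped elasticsearch-analysis-ik-x.x.x.zip into elasticsearch-1.7.4/config;
6. Edit elasticsearch-1.7.4/config/elasticsearch.yml and append the following to the end of the file: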
index:
  analysis:
    analyzer:
      ik:
        alias: [ik_analyzer]
        type: org.elasticsearch.index.analysis.IkAnalyzerProvider
      ik_max_word:
        type: ik
        use_smart: false
      ik_smart:
        type: ik
        use_smart: true
Or, as a minimal configuration:
index.analysis.analyzer.ik.type : "ik"
7. Restart Elasticsearch.
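The copy steps of method 1 (steps 4 and 5, after the build in step 3) can be scripted roughly as follows. This is a hedged, untested sketch, in line with the fact that I never tried this method: the function name `install_ik_from_source` is hypothetical, it assumes the jar has already been built inside the source directory (for example with `mvn package`), and because I'm not sure where every IK version puts its build output, it searches `target/` for the jar instead of hard-coding a path. Steps 6 and 7 (editing elasticsearch.yml and restarting) are left to do by hand.

```shell
#!/usr/bin/env bash

# Hypothetical helper for method 1, steps 4-5.
#   $1  unzipped IK source directory (elasticsearch-analysis-ik-x.x.x), already built
#   $2  Elasticsearch home           (e.g. elasticsearch-1.7.4)
install_ik_from_source() {
  local ik_src="$1" es_home="$2" jar

  # Assumption: the build left an elasticsearch-analysis-ik-*.jar somewhere under target/.
  jar=$(find "$ik_src/target" -name 'elasticsearch-analysis-ik-*.jar' 2>/dev/null | head -n 1)
  if [ -z "$jar" ]; then
    echo "no IK jar under $ik_src/target; build it first (e.g. mvn package)" >&2
    return 1
  fi

  # Step 4: create plugins/analysis-ik and put the jar into it.
  mkdir -p "$es_home/plugins/analysis-ik" || return 1
  cp "$jar" "$es_home/plugins/analysis-ik/" || return 1

  # Step 5: copy the source tree's config/ik directory into Elasticsearch's config/.
  cp -r "$ik_src/config/ik" "$es_home/config/" || return 1
}
```

Usage, assuming both directories sit in the current directory: `install_ik_from_source elasticsearch-analysis-ik-x.x.x elasticsearch-1.7.4`.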
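Method 2: use the elasticsearch-rtf distribution and install from files someone else has already compiled
This is the method I used. I have not tested method 1, so for now I can't guarantee that it works; this post concentrates on method 2. The steps are:
1. Download the rtf build of Elasticsearch from https://github.com/medcl/elasticsearch-rtf/releases; I downloaded elasticsearch-rtf-1.0.0.zip;
2. Unzip elasticsearch-rtf-1.0.0.zip;
3. Copy the elasticsearch-rtf-1.0.0/plugins/analysis-ik directory into elasticsearch-1.7.4/plugins, giving elasticsearch-1.7.4/plugins/analysis-ik;
4. Copy elasticsearch-rtf-1.0.0/config/ik into elasticsearch-1.7.4/config/, giving elasticsearch-1.7.4/config/ik;
5. Edit elasticsearch-1.7.4/config/elasticsearch.yml and append the following to the end of the file: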
index:
  analysis:
    analyzer:
      ik:
        alias: [ik_analyzer]
        type: org.elasticsearch.index.analysis.IkAnalyzerProvider
      ik_max_word:
        type: ik
        use_smart: false
      ik_smart:
        type: ik
        use_smart: true
Or, as a minimal configuration:
index.analysis.analyzer.ik.type : "ik"
6. Restart Elasticsearch.
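Steps 3 to 5 of method 2 can likewise be scripted. A minimal sketch, assuming the two directory names used in the steps above; the function name `install_ik_from_rtf` is hypothetical. It appends the full analyzer configuration only if elasticsearch.yml doesn't already mention `IkAnalyzerProvider`, so running it twice doesn't duplicate the block. It also assumes elasticsearch.yml has no other top-level `index:` section; if yours does, merge the settings by hand, since YAML mapping keys must be unique.

```shell
#!/usr/bin/env bash

# Hypothetical helper for method 2, steps 3-5.
#   $1  unzipped elasticsearch-rtf directory (e.g. elasticsearch-rtf-1.0.0)
#   $2  Elasticsearch home                   (e.g. elasticsearch-1.7.4)
install_ik_from_rtf() {
  local rtf="$1" es_home="$2"
  local yml="$es_home/config/elasticsearch.yml"

  # Step 3: <rtf>/plugins/analysis-ik -> <es_home>/plugins/analysis-ik
  mkdir -p "$es_home/plugins" || return 1
  cp -r "$rtf/plugins/analysis-ik" "$es_home/plugins/" || return 1

  # Step 4: <rtf>/config/ik -> <es_home>/config/ik
  cp -r "$rtf/config/ik" "$es_home/config/" || return 1

  # Step 5: append the analyzer settings once. The leading empty line keeps
  # "index:" off the previous line if the file lacks a trailing newline.
  if ! grep -q 'IkAnalyzerProvider' "$yml" 2>/dev/null; then
    cat >> "$yml" <<'EOF'

index:
  analysis:
    analyzer:
      ik:
        alias: [ik_analyzer]
        type: org.elasticsearch.index.analysis.IkAnalyzerProvider
      ik_max_word:
        type: ik
        use_smart: false
      ik_smart:
        type: ik
        use_smart: true
EOF
  fi
}
```

Usage: `install_ik_from_rtf elasticsearch-rtf-1.0.0 elasticsearch-1.7.4`, then restart Elasticsearch (step 6).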
Testing (done with the method 2 installation)
Create an index:
curl -XPUT http://localhost:9200/index
Create a mapping:
curl -XPOST http://localhost:9200/index/fulltext/_mapping -d'
{
    "fulltext": {
        "_all": {
            "analyzer": "ik_max_word",
            "search_analyzer": "ik_max_word",
            "term_vector": "no",
            "store": "false"
        },
        "properties": {
            "content": {
                "type": "string",
                "store": "no",
                "term_vector": "with_positions_offsets",
                "analyzer": "ik_max_word",
                "search_analyzer": "ik_max_word",
                "include_in_all": "true",
                "boost": 8
            }
        }
    }
}'
Add some documents to the index:
curl -XPOST http://localhost:9200/index/fulltext/1 -d'{"content":"美国留给伊拉克的是个烂摊子吗"}'
curl -XPOST http://localhost:9200/index/fulltext/2 -d'{"content":"公安部:各地校车将享最高路权"}'
curl -XPOST http://localhost:9200/index/fulltext/3 -d'{"content":"中韩渔警冲突调查:韩警平均每天扣1艘中国渔船"}'
curl -XPOST http://localhost:9200/index/fulltext/4 -d'{"content":"中国驻洛杉矶领事馆遭亚裔男子枪击 嫌犯已自首"}'
Run a search with highlighting:
curl -XPOST http://localhost:9200/index/fulltext/_search -d'
{
    "query" : { "term" : { "content" : "中国" } },
    "highlight" : {
        "pre_tags" : ["<tag1>", "<tag2>"],
        "post_tags" : ["</tag1>", "</tag2>"],
        "fields" : {
            "content" : {}
        }
    }
}'
Search results:
{
    "took": 31,
    "timed_out": false,
    "_shards": {"total": 5, "successful": 5, "failed": 0},
    "hits": {
        "total": 2,
        "max_score": 0.61370564,
        "hits": [
            {
                "_index": "index",
                "_type": "fulltext",
                "_id": "4",
                "_score": 0.61370564,
                "_source": {"content": "中国驻洛杉矶领事馆遭亚裔男子枪击 嫌犯已自首"},
                "highlight": {"content": ["<tag1>中国</tag1>驻洛杉矶领事馆遭亚裔男子枪击 嫌犯已自首"]}
            },
            {
                "_index": "index",
                "_type": "fulltext",
                "_id": "3",
                "_score": 0.61370564,
                "_source": {"content": "中韩渔警冲突调查:韩警平均每天扣1艘中国渔船"},
                "highlight": {"content": ["中韩渔警冲突调查:韩警平均每天扣1艘<tag1>中国</tag1>渔船"]}
            }
        ]
    }
}
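Alternatively, you can test directly from the browser's address bar: http://localhost:9200/index/_analyze?analyzer=ik&pretty=true&text=%E6%88%91%E6%98%AF%E4%B8%AD%E5%9B%BD%E4%BA%BA (the text parameter is the URL-encoded UTF-8 form of 我是中国人, "I am Chinese").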
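The text parameter in the browser test must be percent-encoded UTF-8. Below is a small sketch for building that URL from the shell so you can try other sentences. The helper names `urlencode_bytes` and `ik_analyze_url` are hypothetical, and the host is assumed to be localhost:9200 as in the rest of this post. It encodes every byte, ASCII included, which is verbose but still valid percent-encoding.

```shell
#!/usr/bin/env bash

# Hypothetical helper: percent-encode every byte of a (UTF-8) string.
# od dumps the bytes as hex, tr strips whitespace and uppercases the digits,
# and sed puts a "%" in front of each two-digit byte.
urlencode_bytes() {
  printf '%s' "$1" | od -A n -t x1 | tr -d ' \t\n' | tr 'a-f' 'A-F' | sed 's/../%&/g'
}

# Hypothetical helper: build the _analyze test URL for an index, analyzer and text.
ik_analyze_url() {
  local index="$1" analyzer="$2" text="$3"
  printf 'http://localhost:9200/%s/_analyze?analyzer=%s&pretty=true&text=%s\n' \
    "$index" "$analyzer" "$(urlencode_bytes "$text")"
}
```

For example, `curl "$(ik_analyze_url index ik '我是中国人')"` requests the same URL as the browser test. curl can also do the encoding itself: `curl -G 'http://localhost:9200/index/_analyze' --data-urlencode 'analyzer=ik' --data-urlencode 'pretty=true' --data-urlencode 'text=我是中国人'`.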
Note: if your Elasticsearch and IK plugin versions don't match, you may get an error like this:
{"error":"IndexCreationException[[index] failed to create index]; nested: ElasticsearchIllegalArgumentException[failed to find analyzer type [ik] or tokenizer for [ik_max_word]]; nested: NoClassSettingsException[Failed to load class setting [type] with value [ik]]; nested: ClassNotFoundException[org.elasticsearch.index.analysis.ik.IkAnalyzerProvider]; ","status":400}
References:
1. http://samchu.logdown.com/posts/277928-elasticsearch-chinese-word-segmentation
2. https://github.com/medcl/elasticsearch-analysis-ik