Still Can't Do CUD in ES?
Recently I had to work with es on the job, but I wasn't familiar with querying documents in it, so I've spent the past two weeks studying es and skimming Elasticsearch: The Definitive Guide. Another day of muddling through.
es is a real-time distributed search and analytics engine built on Lucene. Today we won't talk about its use cases; instead we'll walk through creating, modifying, and deleting es indices and documents.
Environment: CentOS 7, Elasticsearch 6.8.3, JDK 8
(The latest es major version is 7, which requires JDK 11 or later, so es 6.8.3 was installed instead.)
All examples below use a student index.
1. Creating an Index
PUT http://192.168.197.100:9200/student
{"mapping":{"_doc":{ //“_doc”是类型type,es6中一个索引下只有一个type,不能有其它type"properties":{"id": {"type": "keyword"},"name":{"type":"text","index":"analyzed","analyzer":"standard"},"age":{"type":"integer","fields": {"keyword": {"type": "keyword","ignore_above":256}}},"birthday":{"type":"date"},"gender":{"type":"keyword"},"grade":{"type":"text","fields":{"keyword":{"type":"keyword","ignore_above":256}}},"class":{"type":"text","fields":{"keyword":{"type":"keyword","ignore_above":256}}}}}},"settings":{//主分片数量"number_of_shards" : 1, //分片副本数量"number_of_replicas" : 1}
}
The difference between the text and keyword types:
(1) text values are analyzed (tokenized) at index time and are meant for full-text search;
(2) keyword values are indexed as a single unanalyzed token and are meant for exact matching and aggregations (the _analyze sketch right below makes the contrast visible).
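One way to see the contrast without creating an index is the _analyze API with the built-in keyword analyzer, which behaves the way a keyword field does: the whole input comes back as one token, where standard would split and lowercase it. A minimal sketch; the sample text is made up:
POST http://192.168.197.100:9200/_analyze
{
  "text": "7th Grade",
  "analyzer": "keyword"     // returns the single token "7th Grade"; "standard" would return "7th" and "grade"
}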
The index property controls whether and how a string is indexed. In old versions of es (before 5.0) it took one of three string values:
(1) analyzed: the field is analyzed and can be fuzzily matched, similar to like in SQL;
(2) not_analyzed: the field can only be matched exactly, similar to = in SQL;
(3) no: the field is not searchable at all.
From es5 on (including the 6.8.3 used here), index is a plain boolean: true means the field is indexed, false means it is not, and whether the value gets analyzed is decided by the field type instead (text is analyzed, keyword is not). That is why the mapping above says "index": true; the old string values would be rejected. A sketch of the false case follows.
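For illustration (the index name demo and the field remark are made up), a field mapped with "index": false is kept in _source but cannot be searched:
PUT http://192.168.197.100:9200/demo
{
  "mappings": {
    "_doc": {
      "properties": {
        "remark": { "type": "text", "index": false }   // stored in _source and returned with hits, but not queryable
      }
    }
  }
}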
The analyzer property selects the analyzer. For Chinese, that is usually the ik plugin's analyzers, and you can also define a custom analyzer.
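Assuming the ik plugin is installed, a common pattern (a sketch, not from the original setup; the article index and title field are made up) is to index with the fine-grained ik_max_word and query with the coarser ik_smart via the search_analyzer parameter; both analyzers are compared in section 4 below:
PUT http://192.168.197.100:9200/article
{
  "mappings": {
    "_doc": {
      "properties": {
        "title": {
          "type": "text",
          "analyzer": "ik_max_word",       // fine-grained at index time: more tokens, better recall
          "search_analyzer": "ik_smart"    // coarse-grained at query time: fewer, more precise tokens
        }
      }
    }
  }
}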
The number_of_shards property is the number of primary shards; it defaults to 5 in es6 and cannot be changed once the index is created.
The number_of_replicas property is the number of replica copies per primary shard; it defaults to 1 and can be changed at any time.
On success, es returns the following JSON:
{
  "acknowledged": true,
  "shards_acknowledged": true,
  "index": "student"
}
How do you inspect the index after creating it?
GET http://192.168.197.100:9200/student/_mapping
(GET http://192.168.197.100:9200/student returns the settings as well as the mapping.)
In es6, an index can hold only one type, such as the "_doc" above.
Comparing es terms with a relational database:
index ≈ database
type ≈ table
document ≈ row
field ≈ column
2. Modifying an Index
// change the number of replicas to 2
PUT http://192.168.197.100:9200/student/_settings
{
  "number_of_replicas": 2
}
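The same endpoint cannot change number_of_shards; as noted above, the primary shard count is fixed at creation. To confirm the replica change took effect, read the settings back:
GET http://192.168.197.100:9200/student/_settings
// the response should now show "number_of_replicas": "2"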
3. Deleting an Index
// delete a single index
DELETE http://192.168.197.100:9200/student
// delete all indices
DELETE http://192.168.197.100:9200/_all
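Deleting _all is dangerous outside a test cluster. If I recall the setting correctly (treat this as an assumption to verify against the docs), es can be told to require explicit index names for destructive operations, after which _all and wildcard deletes are rejected:
PUT http://192.168.197.100:9200/_cluster/settings
{
  "persistent": {
    "action.destructive_requires_name": true    // assumed setting: reject DELETE on _all or wildcards
  }
}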
4. The Default standard Analyzer vs. the ik Analyzer
The default analyzer in es is standard. It splits English on whitespace and punctuation, but it breaks Chinese into single characters one by one, so it is unsuitable as a Chinese analyzer.
For example, standard on English:
// this API shows how a piece of text gets tokenized
POST http://192.168.197.100:9200/_analyze
{
  "text": "the People's Republic of China",
  "analyzer": "standard"
}
The result:
{"tokens": [{"token": "the","start_offset": 0,"end_offset": 3,"type": "<ALPHANUM>","position": 0},{"token": "people's","start_offset": 4,"end_offset": 12,"type": "<ALPHANUM>","position": 1},{"token": "republic","start_offset": 13,"end_offset": 21,"type": "<ALPHANUM>","position": 2},{"token": "of","start_offset": 22,"end_offset": 24,"type": "<ALPHANUM>","position": 3},{"token": "china","start_offset": 25,"end_offset": 30,"type": "<ALPHANUM>","position": 4}]
}
And on Chinese:
POST http://192.168.197.100:9200/_analyze
{
  "text": "中华人民共和国万岁",
  "analyzer": "standard"
}
The result: every character becomes its own token:
{"tokens": [{"token": "中","start_offset": 0,"end_offset": 1,"type": "<IDEOGRAPHIC>","position": 0},{"token": "华","start_offset": 1,"end_offset": 2,"type": "<IDEOGRAPHIC>","position": 1},{"token": "人","start_offset": 2,"end_offset": 3,"type": "<IDEOGRAPHIC>","position": 2},{"token": "民","start_offset": 3,"end_offset": 4,"type": "<IDEOGRAPHIC>","position": 3},{"token": "共","start_offset": 4,"end_offset": 5,"type": "<IDEOGRAPHIC>","position": 4},{"token": "和","start_offset": 5,"end_offset": 6,"type": "<IDEOGRAPHIC>","position": 5},{"token": "国","start_offset": 6,"end_offset": 7,"type": "<IDEOGRAPHIC>","position": 6},{"token": "万","start_offset": 7,"end_offset": 8,"type": "<IDEOGRAPHIC>","position": 7},{"token": "岁","start_offset": 8,"end_offset": 9,"type": "<IDEOGRAPHIC>","position": 8}]
}
The ik plugin does segment Chinese into real words. It provides two analyzers, ik_smart and ik_max_word.
(1) ik_smart: coarse-grained segmentation; it splits the text into the fewest, longest words.
For example:
POST http://192.168.197.100:9200/_analyze
{
  "text": "中华人民共和国万岁",
  "analyzer": "ik_smart"
}
The result:
{"tokens": [{"token": "中华人民共和国","start_offset": 0,"end_offset": 7,"type": "CN_WORD","position": 0},{"token": "万岁","start_offset": 7,"end_offset": 9,"type": "CN_WORD","position": 1}]
}
(2) ik_max_word: fine-grained segmentation; it extracts as many words from the text as possible.
For example:
POST http://192.168.197.100:9200/_analyze
{
  "text": "中华人民共和国万岁",
  "analyzer": "ik_max_word"
}
The result:
{"tokens": [{"token": "中华人民共和国","start_offset": 0,"end_offset": 7,"type": "CN_WORD","position": 0},{"token": "中华人民","start_offset": 0,"end_offset": 4,"type": "CN_WORD","position": 1},{"token": "中华","start_offset": 0,"end_offset": 2,"type": "CN_WORD","position": 2},{"token": "华人","start_offset": 1,"end_offset": 3,"type": "CN_WORD","position": 3},{"token": "人民共和国","start_offset": 2,"end_offset": 7,"type": "CN_WORD","position": 4},{"token": "人民","start_offset": 2,"end_offset": 4,"type": "CN_WORD","position": 5},{"token": "共和国","start_offset": 4,"end_offset": 7,"type": "CN_WORD","position": 6},{"token": "共和","start_offset": 4,"end_offset": 6,"type": "CN_WORD","position": 7},{"token": "国","start_offset": 6,"end_offset": 7,"type": "CN_CHAR","position": 8},{"token": "万岁","start_offset": 7,"end_offset": 9,"type": "CN_WORD","position": 9},{"token": "万","start_offset": 7,"end_offset": 8,"type": "TYPE_CNUM","position": 10},{"token": "岁","start_offset": 8,"end_offset": 9,"type": "COUNT","position": 11}]
}
ik on English:
POST http://192.168.197.100:9200/_analyze
{
  "text": "the People's Republic of China",
  "analyzer": "ik_smart"
}
The result: ik_smart drops the unimportant words (stop words such as "the" and "of") that standard keeps. (My English has decayed to the point that I can't even say what part of speech a/an/the are; as a Chinese speaker, that's nothing to be proud of.)
{"tokens": [{"token": "people","start_offset": 4,"end_offset": 10,"type": "ENGLISH","position": 0},{"token": "s","start_offset": 11,"end_offset": 12,"type": "ENGLISH","position": 1},{"token": "republic","start_offset": 13,"end_offset": 21,"type": "ENGLISH","position": 2},{"token": "china","start_offset": 25,"end_offset": 30,"type": "ENGLISH","position": 3}]
}
5. Adding a Document
Fields that are not declared in the mapping can be added freely; dynamic mapping picks them up.
// the trailing 1 is the _id value; it must be unique, and es can also generate one
POST http://192.168.197.100:9200/student/_doc/1
{
  "id": 1,
  "name": "tom",
  "age": 20,
  "gender": "male",
  "grade": "7",
  "class": "1"
}
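To let es generate the _id instead, POST without the id segment; the response carries the generated value in _id. A sketch with made-up field values:
POST http://192.168.197.100:9200/student/_doc
{
  "id": 2,
  "name": "lucy",
  "age": 19,
  "gender": "female",
  "grade": "7",
  "class": "2"
}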
6. Updating a Document
POST http://192.168.197.100:9200/student/_doc/1/_update
{
  "doc": {
    "name": "jack"
  }
}
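Besides the partial doc update, the same _update endpoint accepts a script (painless is the default scripting language in es6); a minimal sketch that increments the age field:
POST http://192.168.197.100:9200/student/_doc/1/_update
{
  "script": {
    "source": "ctx._source.age += 1"    // modify the document in place
  }
}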
7. Deleting a Document
// 1 is the _id value
DELETE http://192.168.197.100:9200/student/_doc/1
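To delete by condition rather than by _id, there is also _delete_by_query; a sketch that removes every student whose name matches "jack":
POST http://192.168.197.100:9200/student/_delete_by_query
{
  "query": {
    "match": { "name": "jack" }
  }
}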
The above walks briefly through creating, modifying, and deleting es indices, and adding, updating, and deleting documents. To keep this post from running too long, document queries will be covered in the next one.