This post walks through 05.德国博士练习_06_mapping_analysis, a set of Elasticsearch mapping and text-analysis practice exercises; hopefully it offers some useful reference for developers working through the same problems.
Table of contents
- 1. exercise01: mapping multi-fields
- 2. exercise02: nested and join mapping
- 3. exercise03: custom analyzer
1. exercise01: mapping multi-fields
# ** EXAM OBJECTIVE: MAPPINGS AND TEXT ANALYSIS **
# GOAL: Create a mapping that satisfies a given set of requirements
# REQUIRED SETUP:
# (i) a running Elasticsearch cluster with at least one node
# and a Kibana instance,
# (ii) the cluster has no index with name `hamlet`,
# (iii) the cluster has no template that applies to indices
# starting by `hamlet`

# Create the index `hamlet_1` with one primary shard and no replicas
# Define a mapping for the default type "_doc" of `hamlet_1`, so that
# (i) the type has three fields, named `speaker`, `line_number`, and `text_entry`,
# (ii) `speaker` and `line_number` are unanalysed strings

# Update the mapping of `hamlet_1` by disabling aggregations on
# `line_number`
Note that `doc_values` cannot be toggled on an existing field, so the solution below disables it directly at index creation:
PUT hamlet_1
{
  "settings": { "number_of_shards": 1, "number_of_replicas": 0 },
  "mappings": {
    "properties": {
      "speaker":     { "type": "keyword" },
      "line_number": { "type": "keyword", "doc_values": false },
      "text_entry":  { "type": "text" }
    }
  }
}
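Since doc values are what power aggregations (and sorting) on keyword fields, a quick way to confirm the setting took effect is to try an aggregation and watch it fail. A sketch of such a check, assuming the index above:

# Optional check (my own addition): with doc_values disabled, a terms
# aggregation on `line_number` should be rejected, since neither doc
# values nor fielddata are available for it
GET hamlet_1/_search
{
  "size": 0,
  "aggs": {
    "by_line": { "terms": { "field": "line_number" } }
  }
}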
# Add some documents to `hamlet_1` by running the following _bulk
# command
POST hamlet_1/_bulk
{"index":{"_index":"hamlet_1","_id":0}}
{"line_number":"1.1.1","speaker":"BERNARDO","text_entry":"Whos there?"}
{"index":{"_index":"hamlet_1","_id":1}}
{"line_number":"1.1.2","speaker":"FRANCISCO","text_entry":"Nay, answer me: stand, and unfold yourself."}
{"index":{"_index":"hamlet_1","_id":2}}
{"line_number":"1.1.3","speaker":"BERNARDO","text_entry":"Long live the king!"}
{"index":{"_index":"hamlet_1","_id":3}}
{"line_number":"1.2.1","speaker":"KING CLAUDIUS","text_entry":"Though yet of Hamlet our dear brothers death"}
{"index":{"_index":"hamlet_1","_id":4}}
{"line_number":"1.2.2","speaker":"KING CLAUDIUS","text_entry":"The memory be green, and that it us befitted"}# Create the index `hamlet_2` with one primary shard and no replicas
# Copy the mapping of `hamlet_1` into `hamlet_2`, but also define a
# multi-field for `speaker`. The name of such multi-field is
# `tokens` and its data type is the (default) analysed string
# Reindex `hamlet_1` to `hamlet_2`
# Verify that full-text queries on "speaker.tokens" are enabled on
# `hamlet_2` by running the following command:
GET hamlet_2/_search
{
  "query": { "match": { "speaker.tokens": "hamlet" } }
}
This verification query is a bit off: none of the bulk documents above has "hamlet" as a speaker, so it makes more sense to search for "king" instead.
PUT hamlet_2
{
  "settings": { "number_of_shards": 1, "number_of_replicas": 0 },
  "mappings": {
    "properties": {
      "speaker": {
        "type": "keyword",
        "fields": { "tokens": { "type": "text" } }
      },
      "line_number": { "type": "keyword", "doc_values": false },
      "text_entry":  { "type": "text" }
    }
  }
}

POST _reindex
{
  "source": { "index": "hamlet_1" },
  "dest":   { "index": "hamlet_2" }
}

GET hamlet_2/_search
{
  "query": { "match": { "speaker.tokens": "king" } }
}
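To see the multi-field doing double duty, here is an optional check of my own: full-text matching goes through the analysed `speaker.tokens` sub-field, while the `speaker` keyword parent still supports exact aggregations:

# Optional check: match on the analysed sub-field, terms agg on the
# unanalysed parent field
GET hamlet_2/_search
{
  "query": { "match": { "speaker.tokens": "claudius" } },
  "aggs": {
    "by_speaker": { "terms": { "field": "speaker" } }
  }
}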
2. exercise02: nested and join mapping
# ** EXAM OBJECTIVE: MAPPINGS AND TEXT ANALYSIS **
# GOAL: Model relational data
# REQUIRED SETUP:
# (i) a running Elasticsearch cluster with at least one node
# and a Kibana instance,
# (ii) the cluster has no index with name `hamlet`,
# (iii) the cluster has no template that applies to indices
# starting by `hamlet`

# Create the index `hamlet_1` with one primary shard and no replicas
# Add some documents to `hamlet_1` by running the following command
POST hamlet_1/_bulk
{"index":{"_index":"hamlet_1","_id":"C0"}}
{"name":"HAMLET","relationship":[{"name":"HORATIO","type":"friend"},{"name":"GERTRUDE","type":"mother"}]}
{"index":{"_index":"hamlet_1","_id":"C1"}}
{"name":"KING CLAUDIUS","relationship":[{"name":"HAMLET","type":"nephew"}]}# Verify that the items of the `relationship` array cannot be
# searched independently - e.g., searching for a friend named
# Gertrude will return 1 hitGET hamlet_1/_search
{"query": {"bool": {"must": [{ "match": { "relationship.name": "gertrude" } },{ "match": { "relationship.type": "friend" } }]
}}}
PUT hamlet_1
{
  "settings": { "number_of_shards": 1, "number_of_replicas": 0 }
}

GET hamlet_1/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "relationship.name": "gertrude" } },
        { "match": { "relationship.type": "friend" } }
      ]
    }
  }
}
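The one hit comes from how Elasticsearch stores arrays of objects by default: without a `nested` mapping, the inner fields are flattened into `relationship.name: [HORATIO, GERTRUDE]` and `relationship.type: [friend, mother]`, so the pairing between a name and its type is lost and "gertrude" + "friend" match the same document. An optional check of my own to confirm that `relationship` was dynamically mapped as a plain object (no "type": "nested"):

# Optional check: the dynamic mapping shows `relationship` as a plain
# object - its sub-fields are flattened across all array items
GET hamlet_1/_mapping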
# Create the index `hamlet_2` with one primary shard and no replicas
# Define a mapping for the default type "_doc" of `hamlet_2`, so
# that the inner objects of the `relationship` field
# (i) can be searched independently,
# (ii) have only unanalyzed fields

# Reindex `hamlet_1` to `hamlet_2`
# Verify that the items of the `relationship` array can now be
# searched independently - e.g., searching for a friend named
# Gertrude will return no hits
GET hamlet_2/_search
{
  "query": {
    "nested": {
      "path": "relationship",
      "query": {
        "bool": {
          "must": [
            { "match": { "relationship.name": "gertrude" } },
            { "match": { "relationship.type": "friend" } }
          ]
        }
      }
    }
  }
}
PUT hamlet_2
{
  "settings": { "number_of_shards": 1, "number_of_replicas": 0 },
  "mappings": {
    "properties": {
      "relationship": {
        "type": "nested",
        "properties": {
          "name": { "type": "keyword" },
          "type": { "type": "keyword" }
        }
      }
    }
  }
}

POST _reindex
{
  "source": { "index": "hamlet_1" },
  "dest":   { "index": "hamlet_2" }
}

The query below now returns no hits:
GET hamlet_2/_search
{
  "query": {
    "nested": {
      "path": "relationship",
      "query": {
        "bool": {
          "must": [
            { "match": { "relationship.name": "gertrude" } },
            { "match": { "relationship.type": "friend" } }
          ]
        }
      }
    }
  }
}
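As a positive counterpart (my own addition), a name/type pairing that actually exists in document C0 - GERTRUDE as mother - should still return one hit. Note that the fields are now unanalysed keywords, so the exact upper-case values are required:

# Optional check: a pairing that really exists returns one hit
GET hamlet_2/_search
{
  "query": {
    "nested": {
      "path": "relationship",
      "query": {
        "bool": {
          "must": [
            { "term": { "relationship.name": "GERTRUDE" } },
            { "term": { "relationship.type": "mother" } }
          ]
        }
      }
    }
  }
}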
# Add more documents to `hamlet_2` by running the following command
POST hamlet_2/_bulk
{"index":{"_index":"hamlet_2","_id":"L0"}}
{"line_number":"1.4.1","speaker":"HAMLET","text_entry":"The air bites shrewdly; it is very cold."}
{"index":{"_index":"hamlet_2","_id":"L1"}}
{"line_number":"1.4.2","speaker":"HORATIO","text_entry":"It is a nipping and an eager air."}
{"index":{"_index":"hamlet_2","_id":"L2"}}
{"line_number":"1.4.3","speaker":"HAMLET","text_entry":"What hour now?"}# Create the index `hamlet_3` with only one primary shard and no
# replicas
# Copy the mapping of `hamlet_2` into `hamlet_3`, but also add a
# join field to define a relation between a `character` (the
# parent) and a `line` (the child). The name of such field is
# "character_or_line"
# Reindex `hamlet_2` to `hamlet_3`

# Create a script named `init_lines` and save it into the cluster
# state. The script
# (i) has a parameter named `characterId`,
# (ii) adds the field `character_or_line` to the document,
# (iii) sets the value of `character_or_line.name` to "line" ,
# (iv) sets the value of `character_or_line.parent` to the value
# of the `characterId` parameter

# Update the document with id `C0` (i.e., the character document of
# Hamlet) by adding the field `character_or_line` and setting its
# `character_or_line.name` value to "character"
# Update the documents in `hamlet_3` that have "HAMLET" as a
# `speaker`, by running the `init_lines` script with
# `characterId` set to "C0"

# Verify the success of the previous operation using the query below
GET hamlet_3/_search
{
  "query": {
    "has_parent": {
      "parent_type": "character",
      "query": { "match": { "name": "HAMLET" } }
    }
  }
}
PUT hamlet_3
{
  "settings": { "number_of_shards": 1, "number_of_replicas": 0 },
  "mappings": {
    "properties": {
      "relationship": {
        "type": "nested",
        "properties": {
          "name": { "type": "keyword" },
          "type": { "type": "keyword" }
        }
      },
      "character_or_line": {
        "type": "join",
        "relations": { "character": "line" }
      }
    }
  }
}

POST _reindex
{
  "source": { "index": "hamlet_2" },
  "dest":   { "index": "hamlet_3" }
}
I was stuck on this step for quite a while; the culprit turned out to be routing.
POST hamlet_3/_update/C0
{
  "doc": { "character_or_line": { "name": "character" } }
}

PUT _scripts/init_lines
{
  "script": {
    "lang": "painless",
    "source": """
      ctx._source.character_or_line = new HashMap();
      ctx._source.character_or_line.name = 'line';
      ctx._source.character_or_line.parent = params.characterId;
    """
  }
}

GET hamlet_3/_search
{
  "query": { "match": { "speaker": "HAMLET" } }
}

POST hamlet_3/_update_by_query?routing=C0
{
  "query": { "match": { "speaker": "HAMLET" } },
  "script": { "id": "init_lines", "params": { "characterId": "C0" } }
}
The fix is to set routing when running the reindex above: a join field requires every child document to live on the same shard as its parent, and `_update_by_query` cannot change the routing of already-indexed documents, so the routing has to be set when the documents first land in `hamlet_3`:
POST _reindex
{
  "source": { "index": "hamlet_2" },
  "dest":   { "index": "hamlet_3", "routing": "=C0" }
}
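With parents and children now colocated on one shard, another optional way to verify the join (beyond the `has_parent` query above, and my own addition) is a `parent_id` query that fetches all of Hamlet's lines directly:

# Optional check: fetch all `line` children of the parent document C0
GET hamlet_3/_search
{
  "query": {
    "parent_id": { "type": "line", "id": "C0" }
  }
}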
3. exercise03: custom analyzer
# ** EXAM OBJECTIVE: MAPPINGS AND TEXT ANALYSIS **
# GOAL: Add built-in text analyzers and specify a custom one
# REQUIRED SETUP:
# (i) a running Elasticsearch cluster with at least one node
# and a Kibana instance,
# (ii) the cluster has no index with name `hamlet`,
# (iii) the cluster has no template that applies to indices
# starting by `hamlet`

# Create the index `hamlet_1` with one primary shard and no replicas
# Define a mapping for the default type "_doc" of `hamlet_1`, so that
# (i) the type has three fields, named `speaker`, `line_number`, and `text_entry`,
# (ii) `text_entry` is associated with the language "english" analyzer

# Add some documents to `hamlet_1` by running the following command
POST hamlet_1/_bulk
{"index":{"_index":"hamlet_1","_id":0}}
{"line_number":"1.1.1","speaker":"BERNARDO","text_entry":"Whos there?"}
{"index":{"_index":"hamlet_1","_id":1}}
{"line_number":"1.1.2","speaker":"FRANCISCO","text_entry":"Nay, answer me: stand, and unfold yourself."}
{"index":{"_index":"hamlet_1","_id":2}}
{"line_number":"1.1.3","speaker":"BERNARDO","text_entry":"Long live the king!"}
{"index":{"_index":"hamlet_1","_id":3}}
{"line_number":"1.2.1","speaker":"KING CLAUDIUS","text_entry":"Though yet of Hamlet our dear brothers death"}
PUT hamlet_1
{
  "settings": { "number_of_shards": 1, "number_of_replicas": 0 },
  "mappings": {
    "properties": {
      "speaker":     { "type": "text" },
      "line_number": { "type": "text" },
      "text_entry":  { "type": "text", "analyzer": "english" }
    }
  }
}
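To see what the "english" analyzer actually does to a line (lowercasing, stopword removal, stemming), here is an optional `_analyze` call of my own against the mapped field:

# Optional check: "brothers" should be stemmed to "brother", and
# stopwords such as "of" dropped
GET hamlet_1/_analyze
{
  "field": "text_entry",
  "text": "Though yet of Hamlet our dear brothers death"
}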
# Create the index `hamlet_2` with one primary shard and no replicas
# Add to `hamlet_2` a custom analyzer named `shy_hamlet_analyzer`,
# consisting of
# (i) a char filter to replace the characters "Hamlet" with
# "[CENSORED]",
# (ii) a tokenizer to split tokens on whitespaces and columns,
# (iii) a token filter to ignore any token with less than 5
# characters

# Define a mapping for the default type "_doc" of `hamlet_2`, so that
# (i) the type has one field named `text_entry`,
# (ii) `text_entry` is associated with the `shy_hamlet_analyzer`
# created in the previous step

# Reindex the `text_entry` field of `hamlet_1` into `hamlet_2`
# Verify that documents have been reindexed to `hamlet_2` as
# expected - e.g., by searching for "censored" into the
# `text_entry` field
I read "columns" in the exercise as meaning square brackets (it may also be a typo for "colons"), so the tokenizer below splits on whitespace and brackets, turning "[CENSORED]" into the token "CENSORED"; I am still not entirely sure this is the intended reading.
DELETE hamlet_2

PUT hamlet_2
{
  "settings": {
    "analysis": {
      "analyzer": {
        "shy_hamlet_analyzer": {
          "type": "custom",
          "char_filter": "map_filter",
          "tokenizer": "split_tokenizer",
          "filter": ["filter_short", "lowercase"]
        }
      },
      "char_filter": {
        "map_filter": {
          "type": "mapping",
          "mappings": ["Hamlet => [CENSORED]"]
        }
      },
      "tokenizer": {
        "split_tokenizer": { "type": "pattern", "pattern": "[ \\[\\]]+" }
      },
      "filter": {
        "filter_short": { "type": "length", "min": 5 }
      }
    }
  },
  "mappings": {
    "properties": {
      "text_entry": { "type": "text", "analyzer": "shy_hamlet_analyzer" }
    }
  }
}

POST _reindex
{
  "source": { "index": "hamlet_1", "_source": ["text_entry"] },
  "dest":   { "index": "hamlet_2" }
}

GET hamlet_2/_analyze
{
  "text": ["Though yet of Hamlet"],
  "analyzer": "shy_hamlet_analyzer"
}

GET hamlet_2/_search
{
  "query": { "match": { "text_entry": "censored" } }
}
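When building a custom analyzer like this, it can also help to test the components inline before wiring them into an index. A sketch of my own using the same definitions as above (without the final lowercase step):

# Optional check: test char filter, tokenizer and token filter inline;
# expected tokens: [Though, CENSORED, brothers, death]
GET _analyze
{
  "char_filter": [{ "type": "mapping", "mappings": ["Hamlet => [CENSORED]"] }],
  "tokenizer": { "type": "pattern", "pattern": "[ \\[\\]]+" },
  "filter": [{ "type": "length", "min": 5 }],
  "text": "Though yet of Hamlet our dear brothers death"
}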
That wraps up 05.德国博士练习_06_mapping_analysis - hopefully these walkthroughs are helpful to fellow developers!