05. German PhD Exercises 06: Mapping and Analysis

2024-08-31 15:48


Contents

    • 1. exercise01: mapping multi-fields
    • 2. exercise02: nested and join mapping
    • 3. exercise03: custom analyzer

1. exercise01: mapping multi-fields

# ** EXAM OBJECTIVE: MAPPINGS AND TEXT ANALYSIS **
# GOAL: Create a mapping that satisfies a given set of requirements
# REQUIRED SETUP:
# (i) a running Elasticsearch cluster with at least one node
# and a Kibana instance,
# (ii) the cluster has no index with name `hamlet`,
# (iii) the cluster has no template that applies to indices
# starting with `hamlet`

# Create the index `hamlet_1` with one primary shard and no replicas
# Define a mapping for the default type "_doc" of `hamlet_1`, so that
# (i) the type has three fields, named `speaker`, `line_number`, and `text_entry`,
# (ii) `speaker` and `line_number` are unanalysed strings
# Update the mapping of `hamlet_1` by disabling aggregations on `line_number`

PUT hamlet_1
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0
  },
  "mappings": {
    "properties": {
      "speaker": { "type": "keyword" },
      "line_number": { "type": "keyword", "doc_values": false },
      "text_entry": { "type": "text" }
    }
  }
}
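
Note that `doc_values` cannot be toggled on an existing field, which is why the solution disables it directly at index creation. As a quick self-check (my own addition, not part of the exercise), the mapping can be read back, and once the documents below are indexed, a terms aggregation on `line_number` should be rejected, because keyword aggregations rely on doc values:

GET hamlet_1/_mapping

# Expected to fail: doc_values are disabled on `line_number`,
# and keyword fields have no fielddata fallback
GET hamlet_1/_search
{
  "size": 0,
  "aggs": {
    "by_line": {
      "terms": { "field": "line_number" }
    }
  }
}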

# Add some documents to `hamlet_1` by running the following _bulk
# command
PUT hamlet_1/_doc/_bulk
{"index":{"_index":"hamlet_1","_id":0}}
{"line_number":"1.1.1","speaker":"BERNARDO","text_entry":"Whos there?"}
{"index":{"_index":"hamlet_1","_id":1}}
{"line_number":"1.1.2","speaker":"FRANCISCO","text_entry":"Nay, answer me: stand, and unfold yourself."}
{"index":{"_index":"hamlet_1","_id":2}}
{"line_number":"1.1.3","speaker":"BERNARDO","text_entry":"Long live the king!"}
{"index":{"_index":"hamlet_1","_id":3}}
{"line_number":"1.2.1","speaker":"KING CLAUDIUS","text_entry":"Though yet of Hamlet our dear brothers death"}
{"index":{"_index":"hamlet_1","_id":4}}
{"line_number":"1.2.2","speaker":"KING CLAUDIUS","text_entry":"The memory be green, and that it us befitted"}# Create the index `hamlet_2` with one primary shard and no replicas
# Copy the mapping of `hamlet_1` into `hamlet_2`, but also define a
# multi-field for `speaker`. The name of such multi-field is
# `tokens` and its data type is the (default) analysed string
# Reindex `hamlet_1` to `hamlet_2`
# Verify that full-text queries on "speaker.tokens" are enabled on
# `hamlet_2` by running the following command: 
GET hamlet_2/_search
{"query": {"match": { "speaker.tokens": "hamlet" }
}}这个查询语句有问题,因为上面的bulk当中没有对应的文档,应该修改为king比较合适
PUT hamlet_2
{
  "settings": { "number_of_shards": 1, "number_of_replicas": 0 },
  "mappings": {
    "properties": {
      "speaker": {
        "type": "keyword",
        "fields": {
          "tokens": { "type": "text" }
        }
      },
      "line_number": { "type": "keyword", "doc_values": false },
      "text_entry": { "type": "text" }
    }
  }
}

POST _reindex
{
  "source": { "index": "hamlet_1" },
  "dest": { "index": "hamlet_2" }
}

GET hamlet_2/_search
{
  "query": {
    "match": { "speaker.tokens": "king" }
  }
}
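
A side note from my own testing (not required by the exercise): the `speaker` parent field is still an exact-match keyword, while only the `tokens` sub-field is analysed. The same value behaves differently on the two fields:

# Exact match on the keyword field: returns both KING CLAUDIUS lines
GET hamlet_2/_search
{
  "query": {
    "term": { "speaker": "KING CLAUDIUS" }
  }
}

# The same term query on the analysed sub-field returns nothing,
# because the indexed tokens are lowercased ("king", "claudius")
GET hamlet_2/_search
{
  "query": {
    "term": { "speaker.tokens": "KING CLAUDIUS" }
  }
}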

2. exercise02: nested and join mapping

# ** EXAM OBJECTIVE: MAPPINGS AND TEXT ANALYSIS **
# GOAL: Model relational data
# REQUIRED SETUP:
# (i) a running Elasticsearch cluster with at least one node
# and a Kibana instance,
# (ii) the cluster has no index with name `hamlet`,
# (iii) the cluster has no template that applies to indices
# starting with `hamlet`

# Create the index `hamlet_1` with one primary shard and no replicas
# Add some documents to `hamlet_1` by running the following command
PUT hamlet_1/_doc/_bulk
{"index":{"_index":"hamlet_1","_id":"C0"}}
{"name":"HAMLET","relationship":[{"name":"HORATIO","type":"friend"},{"name":"GERTRUDE","type":"mother"}]}
{"index":{"_index":"hamlet_1","_id":"C1"}}
{"name":"KING CLAUDIUS","relationship":[{"name":"HAMLET","type":"nephew"}]}

# Verify that the items of the `relationship` array cannot be
# searched independently - e.g., searching for a friend named
# Gertrude will return 1 hit
GET hamlet_1/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "relationship.name": "gertrude" } },
        { "match": { "relationship.type": "friend" } }
      ]
    }
  }
}

PUT hamlet_1
{
  "settings": { "number_of_shards": 1, "number_of_replicas": 0 }
}

GET hamlet_1/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "relationship.name": "gertrude" } },
        { "match": { "relationship.type": "friend" } }
      ]
    }
  }
}
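
The single hit is a consequence of how Elasticsearch indexes arrays of objects by default: the `object` type flattens the inner documents into parallel value arrays, so the pairing between `name` and `type` is lost. Hamlet's document is effectively indexed like the sketch below, which is why "gertrude" and "friend" both match even though they belong to different relationship entries:

# Internal (flattened) view of document C0 - illustration only
{
  "name": "HAMLET",
  "relationship.name": [ "HORATIO", "GERTRUDE" ],
  "relationship.type": [ "friend", "mother" ]
}
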
# Create the index `hamlet_2` with one primary shard and no replicas
# Define a mapping for the default type "_doc" of `hamlet_2`, so
# that the inner objects of the `relationship` field
# (i) can be searched independently,
# (ii) have only unanalyzed fields
# Reindex `hamlet_1` to `hamlet_2`
# Verify that the items of the `relationship` array can now be
# searched independently - e.g., searching for a friend named
# Gertrude will return no hits
GET hamlet_2/_search
{
  "query": {
    "nested": {
      "path": "relationship",
      "query": {
        "bool": {
          "must": [
            { "match": { "relationship.name": "gertrude" } },
            { "match": { "relationship.type": "friend" } }
          ]
        }
      }
    }
  }
}

PUT hamlet_2
{
  "settings": { "number_of_shards": 1, "number_of_replicas": 0 },
  "mappings": {
    "properties": {
      "relationship": {
        "type": "nested",
        "properties": {
          "name": { "type": "keyword" },
          "type": { "type": "keyword" }
        }
      }
    }
  }
}

POST _reindex
{
  "source": { "index": "hamlet_1" },
  "dest": { "index": "hamlet_2" }
}

With the nested mapping in place, the query below no longer returns any hits:

GET hamlet_2/_search
{
  "query": {
    "nested": {
      "path": "relationship",
      "query": {
        "bool": {
          "must": [
            { "match": { "relationship.name": "gertrude" } },
            { "match": { "relationship.type": "friend" } }
          ]
        }
      }
    }
  }
}
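
As a counter-check of my own, the nested query does return document C0 when the `name`/`type` pair exists on the same inner object. Note the exact upper-case values: both sub-fields are unanalysed keywords, so term-level queries are the natural fit here:

GET hamlet_2/_search
{
  "query": {
    "nested": {
      "path": "relationship",
      "query": {
        "bool": {
          "must": [
            { "term": { "relationship.name": "HORATIO" } },
            { "term": { "relationship.type": "friend" } }
          ]
        }
      }
    }
  }
}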

# Add more documents to `hamlet_2` by running the following command
PUT hamlet_2/_doc/_bulk
{"index":{"_index":"hamlet_2","_id":"L0"}}
{"line_number":"1.4.1","speaker":"HAMLET","text_entry":"The air bites shrewdly; it is very cold."}
{"index":{"_index":"hamlet_2","_id":"L1"}}
{"line_number":"1.4.2","speaker":"HORATIO","text_entry":"It is a nipping and an eager air."}
{"index":{"_index":"hamlet_2","_id":"L2"}}
{"line_number":"1.4.3","speaker":"HAMLET","text_entry":"What hour now?"}

# Create the index `hamlet_3` with only one primary shard and no
# replicas
# Copy the mapping of `hamlet_2` into `hamlet_3`, but also add a
# join field to define a relation between a `character` (the
# parent) and a `line` (the child). The name of such field is
# "character_or_line"
# Reindex `hamlet_2` to `hamlet_3`

# Create a script named `init_lines` and save it into the cluster
# state. The script
# (i) has a parameter named `characterId`,
# (ii) adds the field `character_or_line` to the document,
# (iii) sets the value of `character_or_line.name` to "line",
# (iv) sets the value of `character_or_line.parent` to the value
# of the `characterId` parameter

# Update the document with id `C0` (i.e., the character document of
# Hamlet) by adding the field `character_or_line` and setting its
# `character_or_line.name` value to "character"
# Update the documents in `hamlet_3` that have "HAMLET" as a
# `speaker`, by running the `init_lines` script with
# `characterId` set to "C0"
# Verify the success of the previous operation using the query below
GET hamlet_3/_search
{
  "query": {
    "has_parent": {
      "parent_type": "character",
      "query": {
        "match": { "name": "HAMLET" }
      }
    }
  }
}

PUT hamlet_3
{
  "settings": { "number_of_shards": 1, "number_of_replicas": 0 },
  "mappings": {
    "properties": {
      "relationship": {
        "type": "nested",
        "properties": {
          "name": { "type": "keyword" },
          "type": { "type": "keyword" }
        }
      },
      "character_or_line": {
        "type": "join",
        "relations": { "character": "line" }
      }
    }
  }
}

POST _reindex
{
  "source": { "index": "hamlet_2" },
  "dest": { "index": "hamlet_3" }
}

I was stuck on this step for quite a while; it turned out to be a routing problem.


POST hamlet_3/_update/C0
{
  "doc": { "character_or_line": { "name": "character" } }
}

PUT _scripts/init_lines
{
  "script": {
    "lang": "painless",
    "source": """
      ctx._source.character_or_line = new HashMap();
      ctx._source.character_or_line.name = 'line';
      ctx._source.character_or_line.parent = params.characterId;
    """
  }
}

GET hamlet_3/_search
{
  "query": { "match": { "speaker": "HAMLET" } }
}

POST hamlet_3/_update_by_query?routing=C0
{
  "query": { "match": { "speaker": "HAMLET" } },
  "script": {
    "id": "init_lines",
    "params": { "characterId": "C0" }
  }
}
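
To confirm that `init_lines` really went into the cluster state, the stored script can be fetched back (a minimal check of my own, not part of the exercise):

GET _scripts/init_lines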

The fix is to set routing on the reindex above:


# "=C0" forces the routing value C0 on every reindexed document,
# so parent and child documents land on the same shard
POST _reindex
{
  "source": { "index": "hamlet_2" },
  "dest": {
    "index": "hamlet_3",
    "routing": "=C0"
  }
}
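
Once the update steps above are re-run against the re-created index, the join can also be checked with a `parent_id` query for Hamlet's lines (my own verification, not part of the exercise):

GET hamlet_3/_search
{
  "query": {
    "parent_id": {
      "type": "line",
      "id": "C0"
    }
  }
}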

3. exercise03: custom analyzer

# ** EXAM OBJECTIVE: MAPPINGS AND TEXT ANALYSIS **
# GOAL: Add built-in text analyzers and specify a custom one
# REQUIRED SETUP:
# (i) a running Elasticsearch cluster with at least one node
# and a Kibana instance,
# (ii) the cluster has no index with name `hamlet`,
# (iii) the cluster has no template that applies to indices
# starting with `hamlet`

# Create the index `hamlet_1` with one primary shard and no replicas
# Define a mapping for the default type "_doc" of `hamlet_1`, so that
# (i) the type has three fields, named `speaker`, `line_number`, and `text_entry`,
# (ii) `text_entry` is associated with the language "english" analyzer

# Add some documents to `hamlet_1` by running the following command
PUT hamlet_1/_doc/_bulk
{"index":{"_index":"hamlet_1","_id":0}}
{"line_number":"1.1.1","speaker":"BERNARDO","text_entry":"Whos there?"}
{"index":{"_index":"hamlet_1","_id":1}}
{"line_number":"1.1.2","speaker":"FRANCISCO","text_entry":"Nay, answer me: stand, and unfold yourself."}
{"index":{"_index":"hamlet_1","_id":2}}
{"line_number":"1.1.3","speaker":"BERNARDO","text_entry":"Long live the king!"}
{"index":{"_index":"hamlet_1","_id":3}}
{"line_number":"1.2.1","speaker":"KING CLAUDIUS","text_entry":"Though yet of Hamlet our dear brothers death"}

PUT hamlet_1
{
  "settings": { "number_of_shards": 1, "number_of_replicas": 0 },
  "mappings": {
    "properties": {
      "speaker": { "type": "text" },
      "line_number": { "type": "text" },
      "text_entry": { "type": "text", "analyzer": "english" }
    }
  }
}
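
To see what the `english` analyzer actually does to a line (my own check, not part of the exercise), `_analyze` can be pointed at the mapped field: stop words such as "of" are dropped and "brothers" is stemmed to "brother":

GET hamlet_1/_analyze
{
  "field": "text_entry",
  "text": "Though yet of Hamlet our dear brothers death"
}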

# Create the index `hamlet_2` with one primary shard and no replicas
# Add to `hamlet_2` a custom analyzer named `shy_hamlet_analyzer`,
# consisting of
# (i) a char filter to replace the characters "Hamlet" with
# "[CENSORED]",
# (ii) a tokenizer to split tokens on whitespaces and columns,
# (iii) a token filter to ignore any token with less than 5
# characters

# Define a mapping for the default type "_doc" of `hamlet_2`, so that
# (i) the type has one field named `text_entry`,
# (ii) `text_entry` is associated with the `shy_hamlet_analyzer`
# created in the previous step

# Reindex the `text_entry` field of `hamlet_1` into `hamlet_2`
# Verify that documents have been reindexed to `hamlet_2` as
# expected - e.g., by searching for "censored" into the
# `text_entry` field

I suspect that "columns" here means square brackets, though I am not entirely sure yet.
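
"columns" could also be read as colons (":"). Either interpretation can be tried out before creating the index, because `_analyze` accepts inline component definitions; the sketch below tests the colon reading (my own experiment, not part of the solution):

POST _analyze
{
  "char_filter": [
    { "type": "mapping", "mappings": [ "Hamlet => [CENSORED]" ] }
  ],
  "tokenizer": { "type": "pattern", "pattern": "[ :]+" },
  "filter": [
    { "type": "length", "min": 5 }
  ],
  "text": "Hamlet: to be, or not to be"
}
# Only "[CENSORED]" survives: every other token is shorter than 5 characters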


DELETE hamlet_2

PUT hamlet_2
{
  "settings": {
    "analysis": {
      "analyzer": {
        "shy_hamlet_analyzer": {
          "type": "custom",
          "char_filter": "map_filter",
          "tokenizer": "split_tokenizer",
          "filter": [ "filter_short", "lowercase" ]
        }
      },
      "char_filter": {
        "map_filter": {
          "type": "mapping",
          "mappings": [ "Hamlet => [CENSORED]" ]
        }
      },
      "tokenizer": {
        "split_tokenizer": {
          "type": "pattern",
          "pattern": "[ \\[\\]]+"
        }
      },
      "filter": {
        "filter_short": {
          "type": "length",
          "min": 5
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "text_entry": {
        "type": "text",
        "analyzer": "shy_hamlet_analyzer"
      }
    }
  }
}

# Copy only the `text_entry` field, as the exercise asks
POST _reindex
{
  "source": {
    "index": "hamlet_1",
    "_source": [ "text_entry" ]
  },
  "dest": { "index": "hamlet_2" }
}

# "Hamlet" becomes "[CENSORED]"; the pattern tokenizer then strips the
# brackets, and the lowercase filter yields the token "censored"
GET hamlet_2/_analyze
{
  "analyzer": "shy_hamlet_analyzer",
  "text": [ "Though yet of Hamlet" ]
}

GET hamlet_2/_search
{
  "query": {
    "match": { "text_entry": "censored" }
  }
}
