04.德国博士练习_04_index_data

2024-08-31 15:48
Tags: Germany, practice, 04, data, index, PhD

This post walks through 04.德国博士练习_04_index_data, a set of Elasticsearch indexing-data exercises, and hopefully serves as a useful reference for developers working through similar problems.

Contents

    • 1. exercise01: update delete by query
    • 2. exercise02: index template
    • 3. exercise03: alias, reindex, pipeline use

1. exercise01: update delete by query

# ** EXAM OBJECTIVE: INDEXING DATA **
# GOAL: Create, update and delete indices while satisfying a given
# set of requirements
# REQUIRED SETUP:
# (i) a running Elasticsearch cluster with at least one node
# and a Kibana instance,
# (ii) the cluster has no index with name `hamlet`,
# (iii) the cluster has no template that applies to indices
# starting by `hamlet`

# Create the index `hamlet-raw` with 1 primary shard and 3 replicas
# Add a document to `hamlet-raw`, so that the document (i) has id
# "1", (ii) has default type, (iii) has one field named `line`
# with value "To be, or not to be: that is the question"
# Update the document with id "1" by adding a field named
# `line_number` with value "3.1.64"
# Add a new document to `hamlet-raw`, so that the document (i) has
# the id automatically assigned by Elasticsearch, (ii) has
# default type, (iii) has a field named `text_entry` with value
# "Whether tis nobler in the mind to suffer", (iv) has a field
# named `line_number` with value "3.1.66"
# Update the last document by setting the value of `line_number` to
# "3.1.65"
# In one request, update all documents in `hamlet-raw` by adding a
# new field named `speaker` with value "Hamlet"
# Update the document with id "1" by renaming the field `line` into
# `text_entry`

Solution


PUT hamlet-raw
{"settings": {"number_of_replicas": 3,"number_of_shards": 1}
}PUT hamlet-raw/_doc/1
{"line":"To be, or not to be: that is the question"
}POST hamlet-raw/_update/1
{"doc" : {"line_number" : "3.1.64"}
}GET hamlet-raw/_doc/1POST hamlet-raw/_doc
{"text_entry": "text_entry","line_number": "3.1.66"
}# 根据返回的id进行操作
POST hamlet-raw/_update/2uDDLHYBznFAtuOD6g0k
{"doc":{"line_number": "3.1.65"}
}POST hamlet-raw/_update_by_query
{"script":{"lang":"painless","source":"ctx._source.speaker='Hamlet'"}
}GET hamlet-raw/_search使用ingest pipeline
PUT _ingest/pipeline/rename_field
{"description": "rename field","processors": [{"rename": {"field": "line","target_field": "text_entry"}}]
}
POST hamlet-raw/_update_by_query?pipeline=rename_field
{"query": {"ids": {"values": ["1"]}}
}
GET hamlet-raw/_search也可以用script来处理POST hamlet-raw/_update/2
{"script":{"lang":"painless","source": "ctx._source.text_entry=ctx._source.remove('line')"}
}
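
As an optional sanity check (not part of the exercise), fetching document 1 again should now return `text_entry` and no `line` field:

GET hamlet-raw/_doc/1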

Part 2

# Create the index `hamlet` and add some documents by running the
# following _bulk command

PUT hamlet/_doc/_bulk
{"index":{"_index":"hamlet","_id":0}}
{"line_number":"1.1.1","speaker":"BERNARDO","text_entry":"Whos there?"}
{"index":{"_index":"hamlet","_id":1}}
{"line_number":"1.1.2","speaker":"FRANCISCO","text_entry":"Nay, answer me: stand, and unfold yourself."}
{"index":{"_index":"hamlet","_id":2}}
{"line_number":"1.1.3","speaker":"BERNARDO","text_entry":"Long live the king!"}
{"index":{"_index":"hamlet","_id":3}}
{"line_number":"1.2.1","speaker":"KING CLAUDIUS","text_entry":"Though yet of Hamlet our dear brothers death"}
{"index":{"_index":"hamlet","_id":4}}
{"line_number":"1.2.2","speaker":"KING CLAUDIUS","text_entry":"The memory be green, and that it us befitted"}
{"index":{"_index":"hamlet","_id":5}}
{"line_number":"1.3.1","speaker":"LAERTES","text_entry":"My necessaries are embarkd: farewell:"}
{"index":{"_index":"hamlet","_id":6}}
{"line_number":"1.3.4","speaker":"LAERTES","text_entry":"But let me hear from you."}
{"index":{"_index":"hamlet","_id":7}}
{"line_number":"1.3.5","speaker":"OPHELIA","text_entry":"Do you doubt that?"}
{"index":{"_index":"hamlet","_id":8}}
{"line_number":"1.4.1","speaker":"HAMLET","text_entry":"The air bites shrewdly; it is very cold."}
{"index":{"_index":"hamlet","_id":9}}
{"line_number":"1.4.2","speaker":"HORATIO","text_entry":"It is a nipping and an eager air."}
{"index":{"_index":"hamlet","_id":10}}
{"line_number":"1.4.3","speaker":"HAMLET","text_entry":"What hour now?"}
{"index":{"_index":"hamlet","_id":11}}
{"line_number":"1.5.2","speaker":"Ghost","text_entry":"Mark me."}
{"index":{"_index":"hamlet","_id":12}}
{"line_number":"1.5.3","speaker":"HAMLET","text_entry":"I will."}

# Create a script named `set_is_hamlet` and save it into the cluster
# state. The script (i) adds a field named `is_hamlet` to each
# document, (ii) sets the field to "true" if the document has
# `speaker` equals to "HAMLET", (iii) sets the field to "false"
# otherwise
# Update all documents in `hamlet` by running the `set_is_hamlet`
# script

Pretty convenient the “update_by_query” API, don’t you think? Do you also
know how to use its counterpart for deletion?

# Remove from `hamlet` the documents that have either "KING
# CLAUDIUS" or "LAERTES" as the value of `speaker`

Note the pattern here: the script is stored in the cluster state first and only used afterwards; I had rarely used it this way before.


# First try the logic inline
POST hamlet/_update_by_query
{
  "script": {
    "lang": "painless",
    "source": """
      if (ctx._source.speaker.equals('HAMLET')) {
        ctx._source.is_hamlet = true;
      } else {
        ctx._source.is_hamlet = false;
      }
    """
  }
}

Store the same script in the cluster state (search templates can be stored this way as well):

PUT _scripts/set_is_hamlet
{
  "script": {
    "lang": "painless",
    "source": """
      if (ctx._source.speaker.equals('HAMLET')) {
        ctx._source.is_hamlet = true;
      } else {
        ctx._source.is_hamlet = false;
      }
    """
  }
}

Run the stored script:

POST hamlet/_update_by_query
{
  "script": { "id": "set_is_hamlet" }
}

GET hamlet/_search
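
As an optional check (not required by the exercise), a term query on the new boolean field should match only the lines spoken by HAMLET:

GET hamlet/_count
{
  "query": { "term": { "is_hamlet": true } }
}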

Delete operation


POST hamlet/_delete_by_query
{
  "query": {
    "terms": { "speaker.keyword": ["KING CLAUDIUS", "LAERTES"] }
  }
}
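
A follow-up count (optional sketch) should return 0 hits for those two speakers once the delete has finished:

GET hamlet/_count
{
  "query": {
    "terms": { "speaker.keyword": ["KING CLAUDIUS", "LAERTES"] }
  }
}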

2. exercise02: index template

# ** EXAM OBJECTIVE: INDEXING DATA **
# GOAL: Create index templates that satisfy a given set of
# requirements
# REQUIRED SETUP:
# (i) a running Elasticsearch cluster with at least one node
# and a Kibana instance,
# (ii) the cluster has no index with name `hamlet`,
# (iii) the cluster has no template that applies to indices
# starting by `hamlet`

# Create the index template `hamlet_template`, so that the template
# (i) matches any index that starts by "hamlet_" or "hamlet-",
# (ii) allocates one primary shard and no replicas for each
# matching index
# Create the indices `hamlet2` and `hamlet_test`
# Verify that only `hamlet_test` applies the settings defined in
# `hamlet_template`

Note that a template cannot be partially updated: like creation, an update is a straight full overwrite of the whole template body.


DELETE hamlet*
DELETE _template/hamlet*

PUT _template/hamlet_template
{
  "index_patterns": ["hamlet_*", "hamlet-*"],
  "settings": { "number_of_shards": 1, "number_of_replicas": 0 }
}

PUT hamlet2
PUT hamlet_test

GET _cat/shards/hamlet2?v
GET _cat/shards/hamlet_test?v
# Update `hamlet_template` by defining a mapping for the type
# "_doc", so that (i) the type has three fields, named `speaker`,
# `line_number`, and `text_entry`, (ii) `text_entry` uses an
# "english" analyzer
Updates to an index template are not automatically reflected on the matching
indices that already exist. This is because index templates are only applied
once at index creation time.
# Verify that the updates in `hamlet_template` did not apply to the
# existing indices
# In one request, delete both `hamlet2` and `hamlet_test`

GET _template/hamlet_template

PUT _template/hamlet_template
{
  "index_patterns": ["hamlet_*", "hamlet-*"],
  "settings": {
    "index": { "number_of_shards": "1", "number_of_replicas": "0" }
  },
  "mappings": {
    "properties": {
      "speaker":     { "type": "text" },
      "line_number": { "type": "text" },
      "text_entry":  { "type": "text", "analyzer": "english" }
    }
  }
}

GET hamlet_test

DELETE hamlet2,hamlet_test
# Create the index `hamlet-1` and add some documents by running the
# following _bulk command

PUT hamlet-1/_doc/_bulk
{"index":{"_index":"hamlet-1","_id":0}}
{"line_number":"1.1.1","speaker":"BERNARDO","text_entry":"Whos there?"}
{"index":{"_index":"hamlet-1","_id":1}}
{"line_number":"1.1.2","speaker":"FRANCISCO","text_entry":"Nay, answer me: stand, and unfold yourself."}
{"index":{"_index":"hamlet-1","_id":2}}
{"line_number":"1.1.3","speaker":"BERNARDO","text_entry":"Long live the king!"}
{"index":{"_index":"hamlet-1","_id":3}}
{"line_number":"1.2.1","speaker":"KING CLAUDIUS","text_entry":"Though yet of Hamlet our dear brothers death"}

# Verify that the mapping of `hamlet-1` is consistent with what defined
# in `hamlet_template`
# Update `hamlet_template` so as to reject any document having a
# field that is not defined in the mapping
# Verify that you cannot index the following document in `hamlet-1`

PUT hamlet-1/_doc
{
  "author": "Shakespeare"
}

Note: if you want an update to `hamlet_template` to take effect on `hamlet-1`, the only option is to delete `hamlet-1` and recreate it (or reindex); here the `dynamic: strict` setting is applied directly to the index mapping instead.


PUT hamlet-1/_mapping
{
  "dynamic": "strict"
}

# This should now be rejected with a strict_dynamic_mapping_exception
POST hamlet-1/_doc
{
  "author": "Shakespeare"
}
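
The exercise itself asks for the change at the template level. A sketch of that variant (same body as before, plus `"dynamic": "strict"` in the mappings; it only affects indices created after the update):

PUT _template/hamlet_template
{
  "index_patterns": ["hamlet_*", "hamlet-*"],
  "settings": { "number_of_shards": 1, "number_of_replicas": 0 },
  "mappings": {
    "dynamic": "strict",
    "properties": {
      "speaker":     { "type": "text" },
      "line_number": { "type": "text" },
      "text_entry":  { "type": "text", "analyzer": "english" }
    }
  }
}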
# Update `hamlet_template` so as to enable dynamic mapping again
# Update `hamlet_template` so as to (i) dynamically map to an
# integer any field that starts by "number_", (ii) dynamically
# map to unanalysed text any string field
# Create the index `hamlet-2` and add a document by running the
# following command

POST hamlet-2/_doc/4
{
  "text_entry": "With turbulent and dangerous lunacy?",
  "line_number": "3.1.4",
  "number_act": "3",
  "speaker": "KING CLAUDIUS"
}
# Verify that the mapping of `hamlet-2` is consistent with what
# defined in `hamlet_template`

GET _template/hamlet_template

PUT _template/hamlet_template
{
  "order": 0,
  "index_patterns": ["hamlet_*", "hamlet-*"],
  "settings": {
    "index": { "number_of_shards": "1", "number_of_replicas": "0" }
  },
  "mappings": {
    "dynamic": true,
    "dynamic_templates": [
      {
        "numbers_as_integers": {
          "match": "number_*",
          "mapping": { "type": "integer" }
        }
      },
      {
        "strings_as_keywords": {
          "match_mapping_type": "string",
          "mapping": { "type": "keyword" }
        }
      }
    ],
    "properties": {
      "line_number": { "type": "text" },
      "text_entry":  { "type": "text", "analyzer": "english" },
      "speaker":     { "type": "text" }
    }
  },
  "aliases": {}
}

POST hamlet-2/_doc/4
{
  "text_entry": "With turbulent and dangerous lunacy?",
  "line_number": "3.1.4",
  "number_act": "3",
  "speaker": "KING CLAUDIUS"
}

GET hamlet-2/_mapping

3. exercise03: alias, reindex, pipeline use

# ** EXAM OBJECTIVE: INDEXING DATA **
# GOAL: Create an alias, reindex indices, and create data pipelines
# REQUIRED SETUP:
# (i) a running Elasticsearch cluster with at least one node
# and a Kibana instance,
# (ii) the cluster has no index with name `hamlet`,
# (iii) the cluster has no template that applies to indices
# starting by `hamlet`

As usual, let’s begin by indexing some data.

# Create the indices `hamlet-1` and `hamlet-2`, each with two
# primary shards and no replicas
# Add some documents to `hamlet-1` by running the following command

PUT hamlet-1/_doc/_bulk
{"index":{"_index":"hamlet-1","_id":0}}
{"line_number":"1.1.1","speaker":"BERNARDO","text_entry":"Whos there?"}
{"index":{"_index":"hamlet-1","_id":1}}
{"line_number":"1.1.2","speaker":"FRANCISCO","text_entry":"Nay, answer me: stand, and unfold yourself."}
{"index":{"_index":"hamlet-1","_id":2}}
{"line_number":"1.1.3","speaker":"BERNARDO","text_entry":"Long live the king!"}
{"index":{"_index":"hamlet-1","_id":3}}
{"line_number":"1.2.1","speaker":"KING CLAUDIUS","text_entry":"Though yet of Hamlet our dear brothers death"}

# Add some documents to `hamlet-2` by running the following command

PUT hamlet-2/_doc/_bulk
{"index":{"_index":"hamlet-2","_id":4}}
{"line_number":"2.1.1","speaker":"LORD POLONIUS","text_entry":"Give him this money and these notes, Reynaldo."}
{"index":{"_index":"hamlet-2","_id":5}}
{"line_number":"2.1.2","speaker":"REYNALDO","text_entry":"I will, my lord."}
{"index":{"_index":"hamlet-2","_id":6}}
{"line_number":"2.1.3","speaker":"LORD POLONIUS","text_entry":"You shall do marvellous wisely, good Reynaldo,"}
{"index":{"_index":"hamlet-2","_id":7}}
{"line_number":"2.1.4","speaker":"LORD POLONIUS","text_entry":"Before you visit him, to make inquire"}
# Create the alias `hamlet` that maps both `hamlet-1` and `hamlet-2`
# Verify that the documents grouped by `hamlet` are 8
By default, if your alias includes more than one index, you cannot index
documents using the alias name. But defaults can be overwritten, if you know
how.
# Configure `hamlet-1` to be the write index of the `hamlet` alias
DELETE hamlet*

PUT hamlet-1
{
  "settings": { "number_of_shards": 2, "number_of_replicas": 0 }
}

PUT hamlet-2
{
  "settings": { "number_of_shards": 2, "number_of_replicas": 0 }
}

It had been a while since I last worked with aliases, and I almost slipped up here; stay calm and check the documentation.

POST /_aliases
{
  "actions": [
    {
      "add": {
        "index": "hamlet-1",
        "alias": "hamlet",
        "is_write_index": true
      }
    },
    {
      "add": {
        "index": "hamlet-2",
        "alias": "hamlet"
      }
    }
  ]
}

# Writing through the alias now goes to `hamlet-1`, the write index
PUT hamlet/_doc/1
{
  "message": "you want to be stronger"
}

GET hamlet/_count
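
One wrinkle: the test document above counts against the alias, so the total will be 9 rather than the expected 8. A quick way (my own addition, not part of the exercise) to confirm it landed in the write index and then clean it up:

GET hamlet-1/_doc/1

DELETE hamlet-1/_doc/1

GET hamlet/_count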

# Add a document to `hamlet`, so that the document
# (i) has id "8",
# (ii) has "_doc" type,
# (iii) has a field `text_entry` with value "With turbulent and dangerous lunacy?",
# (iv) has a field `line_number` with value "3.1.4",
# (v) has a field `speaker` with value "KING CLAUDIUS"

# Create a script named `control_reindex_batch` and save it into the
# cluster state. The script checks whether a document has the
# field `reindexBatch`, and (i) in the affirmative case, it increments
# the field value by a script parameter named `increment`, (ii) otherwise,
# the script adds the field to the document setting its value to "1"
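
The first requirement above (indexing the document with id "8" through the alias) has no snippet in the original write-up; a minimal request for it, assuming the `hamlet` alias with `hamlet-1` as its write index from the previous step:

PUT hamlet/_doc/8
{
  "text_entry": "With turbulent and dangerous lunacy?",
  "line_number": "3.1.4",
  "speaker": "KING CLAUDIUS"
}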

Practice these scenarios where the script has to be stored first; the script APIs are covered in the Painless guide.


PUT _scripts/control_reindex_batch
{
  "script": {
    "lang": "painless",
    "source": """
      if (ctx._source.containsKey('reindexBatch')) {
        ctx._source.reindexBatch += params.increment;
      } else {
        ctx._source.reindexBatch = 1;
      }
    """
  }
}

POST hamlet-1/_update_by_query
{
  "script": {
    "id": "control_reindex_batch",
    "params": { "increment": 3 }
  }
}

GET hamlet-1/_search

# Create the index `hamlet-new` with 2 primary shards and no replicas
# Reindex `hamlet` into `hamlet-new`, while satisfying the following
# criteria:
# (i) apply the `control_reindex_batch` script with the `increment`
# parameter set to "1",
# (ii) reindex using two parallel slices
# In one request, add `hamlet-new` to the alias `hamlet` and delete
# the `hamlet` and `hamlet-2` indices
PUT hamlet-new
{
  "settings": { "number_of_shards": 2, "number_of_replicas": 0 }
}

POST _reindex?slices=2
{
  "source": { "index": "hamlet" },
  "dest": { "index": "hamlet-new" },
  "script": {
    "id": "control_reindex_batch",
    "params": { "increment": 1 }
  }
}

GET hamlet-new/_search

POST _aliases
{
  "actions": [
    {
      "add": {
        "index": "hamlet-new",
        "alias": "hamlet"
      }
    },
    {
      "remove": {
        "indices": ["hamlet-1", "hamlet-2"],
        "alias": "hamlet"
      }
    }
  ]
}
# Note: with multiple indices the key is `indices` (plural); for a single index it is `index`

GET hamlet/_search
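
Strictly speaking, the task asks to delete the old indices, not just detach them from the alias. The `_aliases` API has a `remove_index` action for that; a sketch of that variant (my reading of the requirement, not the original solution above):

POST _aliases
{
  "actions": [
    { "add": { "index": "hamlet-new", "alias": "hamlet" } },
    { "remove_index": { "index": "hamlet-1" } },
    { "remove_index": { "index": "hamlet-2" } }
  ]
}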
# Create a pipeline named `split_act_scene_line`. The pipeline
# splits the value of `line_number` using the dots as a
# separator, and stores the split values into three
# new fields named `number_act`, `number_scene`, and
# `number_line`, respectively
# Test the pipeline on the following document

{
  "_source": { "line_number": "1.2.3" }
}
Satisfied with the outcome? Go update your documents, then!
# Update all documents in `hamlet-new` by using the
# `split_act_scene_line` pipeline

The solution combines a split processor with a script processor (plus a remove processor for cleanup).


POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "description": "string split by dot",
    "processors": [
      { "split": { "field": "line_number", "separator": "\\.", "target_field": "temp_arry" } },
      { "script": {
          "lang": "painless",
          "source": """
            ctx.number_act = ctx.temp_arry[0];
            ctx.number_scene = ctx.temp_arry[1];
            ctx.number_line = ctx.temp_arry[2];
          """
      } },
      { "remove": { "field": "temp_arry" } }
    ]
  },
  "docs": [
    { "_source": { "line_number": "1.1.3", "text_entry": "Long live the king!", "reindexBatch": 2, "speaker": "BERNARDO" } }
  ]
}

PUT _ingest/pipeline/split_act_scene_line
{
  "description": "string split by dot",
  "processors": [
    { "split": { "field": "line_number", "separator": "\\.", "target_field": "temp_arry" } },
    { "script": {
        "lang": "painless",
        "source": """
          ctx.number_act = ctx.temp_arry[0];
          ctx.number_scene = ctx.temp_arry[1];
          ctx.number_line = ctx.temp_arry[2];
        """
    } },
    { "remove": { "field": "temp_arry" } }
  ]
}

POST hamlet-new/_update_by_query?pipeline=split_act_scene_line

GET hamlet-new/_search
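
As a final check (optional; assuming the pipeline ran over every document that has a `line_number`), an exists query on one of the new fields gives a quick sense of how many documents were enriched:

GET hamlet-new/_count
{
  "query": { "exists": { "field": "number_scene" } }
}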

That wraps up this post on 04.德国博士练习_04_index_data; I hope the exercises above are helpful to fellow developers!



http://www.chinasem.cn/article/1124329
