Elasticsearch Relevance Scoring

2024-06-04 03:38

Tags: es, scoring, relevance


Algorithm overview

The relevance score measures how closely the text of a document in the index matches the search text.

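Elasticsearch's classic scoring algorithm is term frequency/inverse document frequency, TF/IDF for short. Since version 5.0 the default similarity is BM25, which refines TF/IDF but builds on the same three factors.

TF/IDF scoring has three components: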

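Term frequency

How many times each term of the search text appears in the field text. The more often it appears, the more relevant the document.

For example, for the search request hello world, doc1 scores higher because it contains both terms:

doc1: hello you, and world is very good
doc2: hello, how are you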
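The counting behind this example can be sketched in Python. This is a simplified illustration, not Elasticsearch's analyzer or scoring code: the function names are hypothetical and tokenization is a plain lowercase word split.

```python
import re
from collections import Counter


def term_frequencies(text):
    """Count how often each lowercase word occurs in the text."""
    return Counter(re.findall(r"\w+", text.lower()))


def matched_term_count(query, text):
    """Sum the occurrences in the text of every term in the query."""
    tf = term_frequencies(text)
    return sum(tf[term] for term in re.findall(r"\w+", query.lower()))


doc1 = "hello you, and world is very good"
doc2 = "hello, how are you"

print(matched_term_count("hello world", doc1))  # 2 (hello and world)
print(matched_term_count("hello world", doc2))  # 1 (hello only)
```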

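Inverse document frequency

How common each term of the search text is across all documents in the index. The more documents a term appears in, the less weight a match on that term carries.

For example, search request: hello world

doc1: hello world today is very good
doc2: hello hello world is very good

Both hello and world appear in both documents, so they carry the same IDF weight here. The repeated hello in doc2 raises that document's term frequency; it does not lower its score through IDF. IDF would lower the weight of hello only if hello appeared in more documents of the index than world.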
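A minimal Python sketch of IDF over these two documents follows. It assumes the formula of Lucene's classic TF/IDF similarity, 1 + ln(N / (df + 1)), where N is the number of documents and df the number of documents containing the term; the default BM25 similarity uses a different formula. The function names are hypothetical.

```python
import math


def document_frequency(term, docs):
    """Number of documents that contain the term at least once."""
    return sum(1 for doc in docs if term in doc.lower().split())


def classic_idf(doc_count, doc_freq):
    """Classic TF/IDF form (assumed here): 1 + ln(N / (df + 1))."""
    return 1.0 + math.log(doc_count / (doc_freq + 1))


docs = [
    "hello world today is very good",
    "hello hello world is very good",
]

for term in ("hello", "world", "today"):
    df = document_frequency(term, docs)
    print(term, df, classic_idf(len(docs), df))
```

hello and world both occur in two documents and get the same weight, while today occurs in only one document and gets a higher weight.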

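Field-length norm

The length of the field in which the match occurs. The longer the field, the weaker the relevance.

For example, search request: hello world

doc1: { "title": "hello article", "content": "babaaba (10,000 words)" }
doc2: { "title": "my article hi world", "content": "blablabala (10,000 words)" }

hello and world each appear the same number of times across the index.

doc1 is more relevant because its title field is shorter.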
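The effect of field length can be sketched in Python, assuming the classic TF/IDF norm of 1 / sqrt(number of terms in the field). The function name is hypothetical, and the sketch ignores the lossy compact encoding Lucene applies when it stores norms.

```python
import math


def classic_field_norm(field_text):
    """Classic TF/IDF length norm (assumed here): 1 / sqrt(term count)."""
    term_count = len(field_text.split())
    return 1.0 / math.sqrt(term_count)


print(classic_field_norm("hello article"))        # 2 terms, about 0.707
print(classic_field_norm("my article hi world"))  # 4 terms, 0.5
```

The two-word title gets the larger norm, so a match in it counts for more than the same match in the four-word title.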

Example

GET /article/_search
{
  "query": {
    "multi_match": {
      "query": " red dog",
      "fields": ["title", "content"],
      "tie_breaker": 0.3
    }
  }
}

Result

"hits" : [
  {"_index" : "article", "_type" : "_doc", "_id" : "5", "_score" : 1.5956411, "_source" : {"title" : "red dog", "content" : "I don't like red dog, bug my gridfrind like it"}},
  {"_index" : "article", "_type" : "_doc", "_id" : "3", "_score" : 1.0691401, "_source" : {"title" : "a dog", "content" : "this is a red dog"}},
  {"_index" : "article", "_type" : "_doc", "_id" : "2", "_score" : 0.9317024, "_source" : {"title" : "a dog", "content" : "a red dog and a blue dog is running"}},
  {"_index" : "article", "_type" : "_doc", "_id" : "4", "_score" : 0.33297, "_source" : {"title" : "a dog", "content" : "this is a stupid dog"}}
]

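The document whose title is red dog scores highest. Its title contains both query terms, red and dog (term frequency); among the titles listed, red occurs only in this document, so that match carries a high weight (inverse document frequency); and the title is very short (field-length norm).

The remaining documents are ranked by how well they satisfy these three factors.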

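tie_breaker: by default, a multi-field query uses the score of the single best-matching field as the document's score. With tie_breaker set, the scores of the other matching fields, each multiplied by tie_breaker, are added to the best field's score.
For more detail, see the related article on the effect of tie_breaker.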
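The combination rule can be sketched in Python. The function name and the per-field scores below are hypothetical illustrations, not values taken from the query result.

```python
def best_fields_score(field_scores, tie_breaker=0.0):
    """Best field score plus tie_breaker times the other field scores."""
    if not field_scores:
        return 0.0
    best = max(field_scores)
    others = sum(field_scores) - best
    return best + tie_breaker * others


# Hypothetical per-field scores for one document: title 1.0, content 0.5.
scores = [1.0, 0.5]

print(best_fields_score(scores))                   # 1.0, best field only
print(best_fields_score(scores, tie_breaker=0.3))  # about 1.15
```

With tie_breaker at 0 only the best field counts; with tie_breaker at 1.0 all field scores are simply added together.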

Scoring under a bool query

GET /article/_search
{
  "query": {
    "bool": {
      "should": [
        {"match": {"title": "dog"}},
        {"match": {"title": "bird"}}
      ]
    }
  }
}
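How a bool query combines its should clauses can be sketched in Python: the document score is the sum of the scores of the clauses that match. The function name is hypothetical, and the clause scores are the _score values reported for this query, where each document matches exactly one of the two clauses.

```python
def bool_should_score(clause_scores):
    """Sum the scores of the should clauses that matched.

    A clause that did not match is passed as None and contributes nothing.
    """
    return sum(score for score in clause_scores if score is not None)


# Each list is [score of title:dog, score of title:bird].
print(bool_should_score([None, 1.6360589]))   # only "bird" matched
print(bool_should_score([0.25613075, None]))  # only "dog" matched
```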

Result

"hits" : {
  "total" : {"value" : 5, "relation" : "eq"},
  "max_score" : 1.6360589,
  "hits" : [
    {"_index" : "article", "_type" : "_doc", "_id" : "1", "_score" : 1.6360589, "_source" : {"title" : " a bird ", "content" : " bird can fly "}},
    {"_index" : "article", "_type" : "_doc", "_id" : "3", "_score" : 0.25613075, "_source" : {"title" : "a dog", "content" : "this is a red dog"}},
    {"_index" : "article", "_type" : "_doc", "_id" : "4", "_score" : 0.25613075, "_source" : {"title" : "a dog", "content" : "this is a stupid dog"}},
    {"_index" : "article", "_type" : "_doc", "_id" : "2", "_score" : 0.25613075, "_source" : {"title" : "a dog", "content" : "a red dog and a blue dog is running"}},
    {"_index" : "article", "_type" : "_doc", "_id" : "5", "_score" : 0.18662795, "_source" : {"title" : "red dog", "content" : "I don't like red dog, bug my gridfrind like it"}}
  ]
}