A Simplified Version of the Official Qdrant Quickstart and Tutorials

2024-08-29 04:36


Notes:

  • First published: 2024-08-28
  • Official Qdrant documentation: https://qdrant.tech/documentation/

About

These notes summarize a small portion of the official Qdrant documentation in simplified form; for anything beyond this, please read the official docs.

Deploying Qdrant Locally with Docker

docker pull qdrant/qdrant
docker run -d -p 6333:6333 -p 6334:6334 \
    -v $(pwd)/qdrant_storage:/qdrant/storage:z \
    qdrant/qdrant

With the default configuration, all data is stored in ./qdrant_storage.

Quickstart

Install the qdrant-client package (Python):

pip install qdrant-client

Initialize the client:

from qdrant_client import QdrantClient

client = QdrantClient(url="http://localhost:6333")
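
As an optional sanity check (a minimal sketch continuing from the client created above), you can list the collections to confirm the client can reach the server; a fresh instance has none yet:

# Optional: confirm the client can reach the server
print(client.get_collections())  # expected on a fresh instance: collections=[]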

All vector data in Qdrant is stored in collections. Create a collection named test_collection that uses the dot product as the metric for comparing vectors.

from qdrant_client.models import Distance, VectorParams

client.create_collection(
    collection_name="test_collection",
    vectors_config=VectorParams(size=4, distance=Distance.DOT),
)

Add vectors with payloads. A payload is the data associated with a vector.

from qdrant_client.models import PointStruct

operation_info = client.upsert(
    collection_name="test_collection",
    wait=True,
    points=[
        PointStruct(id=1, vector=[0.05, 0.61, 0.76, 0.74], payload={"city": "Berlin"}),
        PointStruct(id=2, vector=[0.19, 0.81, 0.75, 0.11], payload={"city": "London"}),
        PointStruct(id=3, vector=[0.36, 0.55, 0.47, 0.94], payload={"city": "Moscow"}),
        PointStruct(id=4, vector=[0.18, 0.01, 0.85, 0.80], payload={"city": "New York"}),
        PointStruct(id=5, vector=[0.24, 0.18, 0.22, 0.44], payload={"city": "Beijing"}),
        PointStruct(id=6, vector=[0.35, 0.08, 0.11, 0.44], payload={"city": "Mumbai"}),
    ],
)
print(operation_info)

Run a query:

search_result = client.query_points(
    collection_name="test_collection",
    query=[0.2, 0.1, 0.9, 0.7],
    limit=3,
).points

print(search_result)

Output:

[{"id": 4,"version": 0,"score": 1.362,"payload": null,"vector": null},{"id": 1,"version": 0,"score": 1.273,"payload": null,"vector": null},{"id": 3,"version": 0,"score": 1.208,"payload": null,"vector": null}
]

Add a filter:

from qdrant_client.models import Filter, FieldCondition, MatchValue

search_result = client.query_points(
    collection_name="test_collection",
    query=[0.2, 0.1, 0.9, 0.7],
    query_filter=Filter(
        must=[FieldCondition(key="city", match=MatchValue(value="London"))]
    ),
    with_payload=True,
    limit=3,
).points

print(search_result)

Output:

[{"id": 2,"version": 0,"score": 0.871,"payload": {"city": "London"},"vector": null}
]

Tutorials

Getting Started with Semantic Search

Install the dependency:

pip install sentence-transformers

Import the modules:

from qdrant_client import models, QdrantClient
from sentence_transformers import SentenceTransformer

Use the all-MiniLM-L6-v2 encoder as the embedding model (an embedding model converts raw data into embeddings):

encoder = SentenceTransformer("all-MiniLM-L6-v2")
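
As a quick, purely illustrative check (the sample text is arbitrary), you can encode one sentence and confirm the embedding size, which for all-MiniLM-L6-v2 is 384 and determines the vector size of the collection created below:

sample_vector = encoder.encode("Time travel and the fate of humanity")  # arbitrary sample text
print(sample_vector.shape)                         # (384,)
print(encoder.get_sentence_embedding_dimension())  # 384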

Add a dataset:

documents = [{"name": "The Time Machine","description": "A man travels through time and witnesses the evolution of humanity.","author": "H.G. Wells","year": 1895,},{"name": "Ender's Game","description": "A young boy is trained to become a military leader in a war against an alien race.","author": "Orson Scott Card","year": 1985,},{"name": "Brave New World","description": "A dystopian society where people are genetically engineered and conditioned to conform to a strict social hierarchy.","author": "Aldous Huxley","year": 1932,},{"name": "The Hitchhiker's Guide to the Galaxy","description": "A comedic science fiction series following the misadventures of an unwitting human and his alien friend.","author": "Douglas Adams","year": 1979,},{"name": "Dune","description": "A desert planet is the site of political intrigue and power struggles.","author": "Frank Herbert","year": 1965,},{"name": "Foundation","description": "A mathematician develops a science to predict the future of humanity and works to save civilization from collapse.","author": "Isaac Asimov","year": 1951,},{"name": "Snow Crash","description": "A futuristic world where the internet has evolved into a virtual reality metaverse.","author": "Neal Stephenson","year": 1992,},{"name": "Neuromancer","description": "A hacker is hired to pull off a near-impossible hack and gets pulled into a web of intrigue.","author": "William Gibson","year": 1984,},{"name": "The War of the Worlds","description": "A Martian invasion of Earth throws humanity into chaos.","author": "H.G. Wells","year": 1898,},{"name": "The Hunger Games","description": "A dystopian society where teenagers are forced to fight to the death in a televised spectacle.","author": "Suzanne Collins","year": 2008,},{"name": "The Andromeda Strain","description": "A deadly virus from outer space threatens to wipe out humanity.","author": "Michael Crichton","year": 1969,},{"name": "The Left Hand of Darkness","description": "A human ambassador is sent to a planet where the inhabitants are genderless and can change gender at will.","author": "Ursula K. Le Guin","year": 1969,},{"name": "The Three-Body Problem","description": "Humans encounter an alien civilization that lives in a dying system.","author": "Liu Cixin","year": 2008,},
]

Store the embedding data in memory:

client = QdrantClient(":memory:")

Create a collection:

client.create_collection(collection_name="my_books",vectors_config=models.VectorParams(size=encoder.get_sentence_embedding_dimension(),  # Vector size is defined by used modeldistance=models.Distance.COSINE,),
)

Upload the data:

client.upload_points(collection_name="my_books",points=[models.PointStruct(id=idx, vector=encoder.encode(doc["description"]).tolist(), payload=doc)for idx, doc in enumerate(documents)],
)

Ask the engine a question:

hits = client.query_points(collection_name="my_books",query=encoder.encode("alien invasion").tolist(),limit=3,
).pointsfor hit in hits:print(hit.payload, "score:", hit.score)

Output:

{'name': 'The War of the Worlds', 'description': 'A Martian invasion of Earth throws humanity into chaos.', 'author': 'H.G. Wells', 'year': 1898} score: 0.570093257022374
{'name': "The Hitchhiker's Guide to the Galaxy", 'description': 'A comedic science fiction series following the misadventures of an unwitting human and his alien friend.', 'author': 'Douglas Adams', 'year': 1979} score: 0.5040468703143637
{'name': 'The Three-Body Problem', 'description': 'Humans encounter an alien civilization that lives in a dying system.', 'author': 'Liu Cixin', 'year': 2008} score: 0.45902943411768216

Narrow down the query with a filter:

hits = client.query_points(collection_name="my_books",query=encoder.encode("alien invasion").tolist(),query_filter=models.Filter(must=[models.FieldCondition(key="year", range=models.Range(gte=2000))]),limit=1,
).pointsfor hit in hits:print(hit.payload, "score:", hit.score)

Output:

{'name': 'The Three-Body Problem', 'description': 'Humans encounter an alien civilization that lives in a dying system.', 'author': 'Liu Cixin', 'year': 2008} score: 0.45902943411768216

A Simple Neural Search

Download the sample dataset:

wget https://storage.googleapis.com/generall-shared-data/startups_demo.json

Install SentenceTransformers and the other dependencies:

pip install sentence-transformers numpy pandas tqdm

Import the modules:

from sentence_transformers import SentenceTransformer
import numpy as np
import json
import pandas as pd
from tqdm.notebook import tqdm

Create the sentence encoder:

model = SentenceTransformer("all-MiniLM-L6-v2", device="cuda"
)  # or device="cpu" if you don't have a GPU

Read the data:

df = pd.read_json("./startups_demo.json", lines=True)
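The next step concatenates the alt and description fields, so it can help to peek at the frame first (the column names in the comment reflect the startups_demo.json schema and are indicative only):

print(len(df))              # number of startup records
print(df.columns.tolist())  # expect fields such as "name", "alt", "description", "city"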

Create an embedding vector for each description. Internally, encode splits the input into batches, which speeds up processing considerably.

vectors = model.encode(
    [row.alt + ". " + row.description for row in df.itertuples()],
    show_progress_bar=True,
)
vectors.shape
# > (40474, 384)
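
If you want to tune that batching yourself (an optional variation; batch_size defaults to 32 in sentence-transformers), encode accepts an explicit batch_size:

vectors = model.encode(
    [row.alt + ". " + row.description for row in df.itertuples()],
    batch_size=64,           # number of descriptions encoded per forward pass
    show_progress_bar=True,
)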

Save the vectors to an .npy file:

np.save("startup_vectors.npy", vectors, allow_pickle=False)

Start the Qdrant Docker service:

docker pull qdrant/qdrant
docker run -p 6333:6333 \
    -v $(pwd)/qdrant_storage:/qdrant/storage \
    qdrant/qdrant

Create a Qdrant client:

# Import client library
from qdrant_client import QdrantClient
from qdrant_client.models import VectorParams, Distance

client = QdrantClient("http://localhost:6333")

Create the collection, where 384 is the output dimensionality of the embedding model (all-MiniLM-L6-v2):

if not client.collection_exists("startups"):client.create_collection(collection_name="startups",vectors_config=VectorParams(size=384, distance=Distance.COSINE),)

Load the data:

fd = open("./startups_demo.json")# payload is now an iterator over startup data
payload = map(json.loads, fd)# Load all vectors into memory, numpy array works as iterable for itself.
# Other option would be to use Mmap, if you don't want to load all data into RAM
vectors = np.load("./startup_vectors.npy")

Upload the data to Qdrant:

client.upload_collection(collection_name="startups",vectors=vectors,payload=payload,ids=None,  # Vector ids will be assigned automaticallybatch_size=256,  # How many vectors will be uploaded in a single request?
)

Create a neural_searcher.py file:

from qdrant_client import QdrantClient
from sentence_transformers import SentenceTransformer


class NeuralSearcher:
    def __init__(self, collection_name):
        self.collection_name = collection_name
        # Initialize encoder model
        self.model = SentenceTransformer("all-MiniLM-L6-v2", device="cpu")
        # Initialize Qdrant client
        self.qdrant_client = QdrantClient("http://localhost:6333")

    def search(self, text: str):
        # Convert text query into vector
        vector = self.model.encode(text).tolist()

        # Use `vector` to search for the closest vectors in the collection
        search_result = self.qdrant_client.search(
            collection_name=self.collection_name,
            query_vector=vector,
            query_filter=None,  # If you don't want any filters for now
            limit=5,  # 5 of the closest results is enough
        )
        # `search_result` contains found vector ids with similarity scores
        # along with the stored payload; here we only need the payload
        payloads = [hit.payload for hit in search_result]
        return payloads
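
A quick way to try the class from a Python shell; the query text is just an example, and the payload fields referenced below (name, city) follow the startups_demo.json schema:

from neural_searcher import NeuralSearcher

searcher = NeuralSearcher(collection_name="startups")

for payload in searcher.search("machine learning for healthcare"):
    print(payload.get("name"), "-", payload.get("city"))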

Deploy with FastAPI:

pip install fastapi uvicorn
Then add a city-filtered search method to NeuralSearcher and wire it into a FastAPI application:

from fastapi import FastAPI
from qdrant_client import QdrantClient
from qdrant_client.models import Filter
from sentence_transformers import SentenceTransformer


class NeuralSearcher:
    def __init__(self, collection_name):
        self.collection_name = collection_name
        # Initialize encoder model
        self.model = SentenceTransformer("all-MiniLM-L6-v2", device="cpu")
        # Initialize Qdrant client
        self.qdrant_client = QdrantClient("http://localhost:6333")

    def search(self, text: str):
        # Convert text query into vector
        vector = self.model.encode(text).tolist()

        # Use `vector` to search for the closest vectors in the collection
        search_result = self.qdrant_client.search(
            collection_name=self.collection_name,
            query_vector=vector,
            query_filter=None,  # If you don't want any filters for now
            limit=5,  # 5 of the closest results is enough
        )
        # `search_result` contains found vector ids with similarity scores
        # along with the stored payload; here we only need the payload
        payloads = [hit.payload for hit in search_result]
        return payloads

    def search_in_berlin(self, text: str):
        # Convert text query into vector
        vector = self.model.encode(text).tolist()
        city_of_interest = "Berlin"

        # Define a filter for cities
        city_filter = Filter(
            **{
                "must": [
                    {
                        "key": "city",  # Store city information in a field of the same name
                        "match": {  # This condition checks if the payload field has the requested value
                            "value": city_of_interest
                        },
                    }
                ]
            }
        )

        # Use `vector` to search for the closest vectors in the collection
        search_result = self.qdrant_client.query_points(
            collection_name=self.collection_name,
            query=vector,
            query_filter=city_filter,
            limit=5,
        ).points
        # `search_result` contains found vector ids with similarity scores
        # along with the stored payload; here we only need the payload
        payloads = [hit.payload for hit in search_result]
        return payloads


app = FastAPI()

# Create a neural searcher instance
neural_searcher = NeuralSearcher(collection_name="startups")


@app.get("/api/search")
def search_startup(q: str):
    return {"result": neural_searcher.search(text=q)}


@app.get("/api/search_in_berlin")
def search_startup_filter(q: str):
    return {"result": neural_searcher.search_in_berlin(text=q)}


if __name__ == "__main__":
    import uvicorn

    uvicorn.run(app, host="0.0.0.0", port=8001)
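
Once the app is running (a minimal sketch, assuming it listens on port 8001 as configured above; the query text is just an example), you can call the endpoint from Python with the standard library:

import json
import urllib.parse
import urllib.request

params = urllib.parse.urlencode({"q": "AI for medical imaging"})  # example query text
with urllib.request.urlopen(f"http://localhost:8001/api/search?{params}") as resp:
    print(json.loads(resp.read()))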

If you run this inside a Jupyter notebook, you also need to add:

import nest_asyncio
nest_asyncio.apply()

Install nest_asyncio:

pip install nest_asyncio

Using Qdrant Asynchronously

Qdrant natively supports async usage:

import asyncio

import qdrant_client
from qdrant_client import models


async def main():
    client = qdrant_client.AsyncQdrantClient("localhost")

    # Create a collection
    await client.create_collection(
        collection_name="my_collection",
        vectors_config=models.VectorParams(size=4, distance=models.Distance.COSINE),
    )

    # Insert a vector
    await client.upsert(
        collection_name="my_collection",
        points=[
            models.PointStruct(
                id="5c56c793-69f3-4fbf-87e6-c4bf54c28c26",
                payload={
                    "color": "red",
                },
                vector=[0.9, 0.1, 0.1, 0.5],
            ),
        ],
    )

    # Search for nearest neighbors
    points = (
        await client.query_points(
            collection_name="my_collection",
            query=[0.9, 0.1, 0.1, 0.5],
            limit=2,
        )
    ).points

    # Your async code using AsyncQdrantClient might be put here
    # ...


asyncio.run(main())
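
If your code already runs inside an event loop (a FastAPI endpoint, for instance), you would await the async client directly instead of calling asyncio.run; a minimal sketch with a hypothetical endpoint:

from fastapi import FastAPI
from qdrant_client import AsyncQdrantClient

app = FastAPI()
client = AsyncQdrantClient("localhost")


@app.get("/collections")
async def list_collections():
    # Await the async client directly inside the endpoint coroutine
    response = await client.get_collections()
    return {"collections": [c.name for c in response.collections]}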
