Weaviate

2024-03-28 13:52
Tags: weaviate



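Contents

    • About Weaviate
      • Core features
      • Deployment options
      • Use cases
    • Quickstart (Python)
      • 1. Create a Weaviate database
      • 2. Install
      • 3. Connect to Weaviate
      • 4. Define a data collection
      • 5. Add objects
      • 6. Query
        • 1) Semantic search
        • 2) Semantic search with a filter
    • Example use cases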
      • Similarity search
      • LLMs and search
      • Classification
      • Other use cases


About Weaviate

Weaviate is a cloud-native, open source vector database that is robust, fast, and scalable.

  • Website: https://weaviate.io
  • GitHub: https://github.com/weaviate/weaviate
  • Official documentation: https://weaviate.io/developers/weaviate

Core features

[Figure: Weaviate core features]


Deployment options

Multiple deployment options are available to cater for different users and use cases.

All options offer vectorizer and RAG module integration.

[Figure: Weaviate deployment options]


Use cases

Weaviate is flexible and can be used in many contexts and scenarios.

[Figure: Weaviate use cases]


Quickstart (Python)

Reference: https://weaviate.io/developers/weaviate/quickstart


1. Create a Weaviate database

You can create a free cloud sandbox instance on Weaviate Cloud Services (WCS).

For instructions, see: https://weaviate.io/developers/wcs/quickstart

Copy the API key and the URL from the Details tab in WCS.


2. Install

To use the v4 client (requires Weaviate 1.23.7 or later):

pip install -U weaviate-client

To use the v3 client:

pip install "weaviate-client==3.*"
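Because the rest of this guide gives separate V4 and V3 snippets, it can help to confirm which client generation is actually installed. The following is a minimal sketch using only the Python standard library; the helper names `installed_client_version` and `api_generation` are hypothetical and not part of the Weaviate client.

```python
from importlib import metadata


def installed_client_version(package="weaviate-client"):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def api_generation(version):
    """Map a version string such as "4.5.4" to the matching snippet label."""
    major = int(version.split(".")[0])
    return "V4" if major >= 4 else "V3"


version = installed_client_version()
if version is None:
    print("weaviate-client is not installed")
else:
    print(f"weaviate-client {version}: use the {api_generation(version)} snippets")
```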

3. Connect to Weaviate

Use the API key and URL from step 1, together with an OpenAI inference API key: https://platform.openai.com/signup
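The V4 connection code in this step reads its settings from the environment variables `WCS_CLUSTER_URL`, `WCS_API_KEY` and `OPENAI_APIKEY`. A small pre-flight check, sketched below, can catch a missing value before the client tries to connect. The helper `missing_env_vars` is a hypothetical name introduced for illustration.

```python
import os

# Variable names taken from the V4 connection snippet in this step.
REQUIRED_VARS = ("WCS_CLUSTER_URL", "WCS_API_KEY", "OPENAI_APIKEY")


def missing_env_vars(names=REQUIRED_VARS, env=None):
    """Return the names that are unset or empty in the given environment mapping."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


missing = missing_env_vars()
if missing:
    print("Set these environment variables first:", ", ".join(missing))
else:
    print("All connection settings are present.")
```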


Run the following code:

V4

import weaviate
import weaviate.classes as wvc
import os
import requests
import json

client = weaviate.connect_to_wcs(
    cluster_url=os.getenv("WCS_CLUSTER_URL"),
    auth_credentials=weaviate.auth.AuthApiKey(os.getenv("WCS_API_KEY")),
    headers={
        "X-OpenAI-Api-Key": os.environ["OPENAI_APIKEY"]  # Replace with your inference API key
    }
)

try:
    pass  # Replace with your code. Close client gracefully in the finally block.

finally:
    client.close()  # Close client gracefully

V3

import weaviate
import json

client = weaviate.Client(
    url = "https://some-endpoint.weaviate.network",  # Replace with your endpoint
    auth_client_secret=weaviate.auth.AuthApiKey(api_key="YOUR-WEAVIATE-API-KEY"),  # Replace w/ your Weaviate instance API key
    additional_headers = {
        "X-OpenAI-Api-Key": "YOUR-OPENAI-API-KEY"  # Replace with your inference API key
    }
)

4. Define a data collection

Next, we define a data collection (a “class” in Weaviate) to store objects in.

This is analogous to creating a table in relational (SQL) databases.


The following code:

  • Configures a class object with:
    • Name Question
    • Vectorizer module text2vec-openai
    • Generative module generative-openai
  • Then creates the class.

V4

    questions = client.collections.create(
        name="Question",
        vectorizer_config=wvc.config.Configure.Vectorizer.text2vec_openai(),  # If set to "none" you must always provide vectors yourself. Could be any other "text2vec-*" also.
        generative_config=wvc.config.Configure.Generative.openai()  # Ensure the `generative-openai` module is used for generative queries
    )

V3

class_obj = {
    "class": "Question",
    "vectorizer": "text2vec-openai",  # If set to "none" you must always provide vectors yourself. Could be any other "text2vec-*" also.
    "moduleConfig": {
        "text2vec-openai": {},
        "generative-openai": {}  # Ensure the `generative-openai` module is used for generative queries
    }
}

client.schema.create_class(class_obj)

5. Add objects

You can now add objects to Weaviate. You will use a batch import process for maximum efficiency.

The guide covers using the vectorizer defined for the class to create a vector embedding for each object.


The following code:

  • Loads objects, and
  • Adds objects to the target class (Question) one by one.

V4

    resp = requests.get('https://raw.githubusercontent.com/weaviate-tutorials/quickstart/main/data/jeopardy_tiny.json')
    data = json.loads(resp.text)  # Load data

    question_objs = list()
    for i, d in enumerate(data):
        question_objs.append({
            "answer": d["Answer"],
            "question": d["Question"],
            "category": d["Category"],
        })

    questions = client.collections.get("Question")
    questions.data.insert_many(question_objs)  # This uses batching under the hood

V3

import requests
import json
resp = requests.get('https://raw.githubusercontent.com/weaviate-tutorials/quickstart/main/data/jeopardy_tiny.json')
data = json.loads(resp.text)  # Load data

client.batch.configure(batch_size=100)  # Configure batch
with client.batch as batch:  # Initialize a batch process
    for i, d in enumerate(data):  # Batch import data
        print(f"importing question: {i+1}")
        properties = {
            "answer": d["Answer"],
            "question": d["Question"],
            "category": d["Category"],
        }
        batch.add_data_object(
            data_object=properties,
            class_name="Question"
        )
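To make the idea of a batch import more concrete, the sketch below shows the two client-side steps in plain Python: mapping raw records to the property names used above, and grouping the resulting objects into chunks of a given size. It is only an illustration of the concept, not how the Weaviate client implements batching internally. The helpers `to_properties` and `chunked` and the sample records are made up for this example.

```python
def to_properties(record):
    """Map one raw record to the property names used in the import code."""
    return {
        "answer": record["Answer"],
        "question": record["Question"],
        "category": record["Category"],
    }


def chunked(items, batch_size):
    """Yield consecutive slices of `items`, each with at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# Made-up records that only mimic the shape of the tutorial data.
raw = [{"Answer": f"a{i}", "Question": f"q{i}", "Category": "DEMO"} for i in range(5)]
objects = [to_properties(r) for r in raw]

for number, batch in enumerate(chunked(objects, batch_size=2), start=1):
    print(f"batch {number}: {len(batch)} objects")
```

With five objects and a batch size of 2, the loop reports three batches, the last of which holds the single leftover object.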

6. Query

1) Semantic search

Let’s start with a similarity search. A nearText search looks for objects in Weaviate whose vectors are most similar to the vector for the given input text.
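As a rough illustration of what "most similar vectors" means, the sketch below ranks a few stored vectors by cosine similarity to a query vector. The three-dimensional vectors are invented for the example; real embeddings have far more dimensions, and Weaviate uses an index rather than the brute-force comparison shown here.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(query_vector, stored, limit=2):
    """Return the `limit` names whose vectors are most similar to the query."""
    ranked = sorted(
        stored,
        key=lambda name: cosine_similarity(query_vector, stored[name]),
        reverse=True,
    )
    return ranked[:limit]


# Invented vectors standing in for the embeddings of three stored objects.
stored_vectors = {
    "DNA": [0.9, 0.1, 0.0],
    "Liver": [0.8, 0.3, 0.1],
    "Elephant": [0.2, 0.9, 0.1],
}
query = [1.0, 0.2, 0.0]  # stand-in for the embedding of "biology"

print(nearest(query, stored_vectors, limit=2))
```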

Run the following code to search for objects whose vectors are most similar to that of biology.


V4

import weaviate
import weaviate.classes as wvc
import os

client = weaviate.connect_to_wcs(
    cluster_url=os.getenv("WCS_CLUSTER_URL"),
    auth_credentials=weaviate.auth.AuthApiKey(os.getenv("WCS_API_KEY")),
    headers={
        "X-OpenAI-Api-Key": os.environ["OPENAI_APIKEY"]  # Replace with your inference API key
    }
)

try:
    pass  # Replace with your code. Close client gracefully in the finally block.
    questions = client.collections.get("Question")

    response = questions.query.near_text(
        query="biology",
        limit=2
    )

    print(response.objects[0].properties)  # Inspect the first object

finally:
    client.close()  # Close client gracefully

V3

import weaviate
import json

client = weaviate.Client(
    url = "https://some-endpoint.weaviate.network",  # Replace with your endpoint
    auth_client_secret=weaviate.auth.AuthApiKey(api_key="YOUR-WEAVIATE-API-KEY"),  # Replace w/ your Weaviate instance API key
    additional_headers = {
        "X-OpenAI-Api-Key": "YOUR-OPENAI-API-KEY"  # Replace with your inference API key
    }
)

response = (
    client.query
    .get("Question", ["question", "answer", "category"])
    .with_near_text({"concepts": ["biology"]})
    .with_limit(2)
    .do()
)

print(json.dumps(response, indent=4))

The result looks like this:

{
    "data": {
        "Get": {
            "Question": [
                {
                    "answer": "DNA",
                    "category": "SCIENCE",
                    "question": "In 1953 Watson & Crick built a model of the molecular structure of this, the gene-carrying substance"
                },
                {
                    "answer": "Liver",
                    "category": "SCIENCE",
                    "question": "This organ removes excess glucose from the blood & stores it as glycogen"
                }
            ]
        }
    }
}
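The V3 client returns the result as a nested dictionary, so the matching objects sit under `data`, then `Get`, then the class name. A minimal sketch of pulling them out is shown below. The helper `extract_objects` is a hypothetical name, and the sample dictionary is a shortened copy of the response shown above.

```python
def extract_objects(response, class_name="Question"):
    """Return the list of result objects from a V3-style `Get` response dict."""
    return response.get("data", {}).get("Get", {}).get(class_name, [])


# Shortened copy of the response printed by the V3 query.
sample_response = {
    "data": {
        "Get": {
            "Question": [
                {"answer": "DNA", "category": "SCIENCE"},
                {"answer": "Liver", "category": "SCIENCE"},
            ]
        }
    }
}

for obj in extract_objects(sample_response):
    print(obj["category"], "->", obj["answer"])
```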

2) Semantic search with a filter

You can add Boolean filters to searches. For example, the above search can be modified to only include objects that have a “category” value of “ANIMALS”. Run the following code to see the results:


V4

    questions = client.collections.get("Question")

    response = questions.query.near_text(
        query="biology",
        limit=2,
        filters=wvc.query.Filter.by_property("category").equal("ANIMALS")
    )

    print(response.objects[0].properties)  # Inspect the first object

V3

response = (
    client.query
    .get("Question", ["question", "answer", "category"])
    .with_near_text({"concepts": ["biology"]})
    .with_where({
        "path": ["category"],
        "operator": "Equal",
        "valueText": "ANIMALS"
    })
    .with_limit(2)
    .do()
)

print(json.dumps(response, indent=4))

The result looks like this:

{
    "data": {
        "Get": {
            "Question": [
                {
                    "answer": "Elephant",
                    "category": "ANIMALS",
                    "question": "It's the only living mammal in the order Proboseidea"
                },
                {
                    "answer": "the nose or snout",
                    "category": "ANIMALS",
                    "question": "The gavial looks very much like a crocodile except for this bodily feature"
                }
            ]
        }
    }
}
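The effect of the filter can be emulated locally to see what it selects. The sketch below builds a filter dictionary in the same shape as the V3 `with_where` argument and applies it to a small list of objects. This is only an illustration: Weaviate evaluates the filter on the server, and the exact-match comparison used here is a simplifying assumption. The helpers `equal_text_filter` and `matches` are hypothetical names.

```python
def equal_text_filter(prop, value):
    """Build a where-filter dict in the shape used by the V3 example."""
    return {"path": [prop], "operator": "Equal", "valueText": value}


def matches(obj, where):
    """Check one object against an `Equal` filter on a top-level text property."""
    if where["operator"] != "Equal":
        raise ValueError("this sketch only handles the Equal operator")
    return obj.get(where["path"][0]) == where["valueText"]


# A few objects in the shape of the tutorial data.
objects = [
    {"answer": "DNA", "category": "SCIENCE"},
    {"answer": "Elephant", "category": "ANIMALS"},
    {"answer": "Liver", "category": "SCIENCE"},
]

where = equal_text_filter("category", "ANIMALS")
print([o["answer"] for o in objects if matches(o, where)])
```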

For more, see: https://weaviate.io/developers/weaviate/quickstart


Example use cases

This page illustrates various use cases for vector databases by way of open-source demo projects. You can fork and modify any of them.

If you would like to contribute your own project to this page, please let us know by creating an issue on GitHub.


Similarity search

https://weaviate.io/developers/weaviate/more-resources/example-use-cases#similarity-search

A vector database enables fast, efficient similarity searches on and across any modality, such as text or images, as well as their combinations. A vector database's similarity search capabilities can also be used for other complex use cases, such as recommendation systems in classical machine learning applications.

Title | Description | Modality | Code
Plant search | Semantic search over plants. | Text | Javascript
Wine search | Semantic search over wines. | Text | Python
Book recommender system (Video, Demo) | Find book recommendations based on search query. | Text | TypeScript
Movie recommender system (Blog) | Find similar movies. | Text | Javascript
Multilingual Wikipedia Search | Search through Wikipedia in multiple languages. | Text | TypeScript
Podcast search | Semantic search over podcast episodes. | Text | Python
Video Caption Search | Find the timestamp of the answer to your question in a video. | Text | Python
Facial Recognition | Identify people in images | Image | Python
Image Search over dogs (Blog) | Find images of similar dog breeds based on uploaded image. | Image | Python
Text to image search | Find images most similar to a text query. | Multimodal | Javascript
Text to image and image to image search | Find images most similar to a text or image query. | Multimodal | Python

LLMs and search

https://weaviate.io/developers/weaviate/more-resources/example-use-cases#llms-and-search

Vector databases and LLMs go together like cookies and milk!

Vector databases help to address some of the limitations of large language models (LLMs), such as hallucinations, by retrieving relevant information to provide to the LLM as part of its input.
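The retrieval step can be pictured as building the LLM prompt from search results. The following is a minimal sketch under stated assumptions: the function `build_prompt`, the prompt wording and the `max_items` parameter are all invented for illustration, and the retrieved objects are a stand-in for whatever a vector search would return.

```python
def build_prompt(question, retrieved, max_items=3):
    """Combine retrieved objects and a user question into one LLM prompt string."""
    context_lines = [
        f"- {obj['question']} (answer: {obj['answer']})"
        for obj in retrieved[:max_items]
    ]
    context = "\n".join(context_lines) if context_lines else "(no context found)"
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}"
    )


# Stand-in for objects returned by a vector search.
retrieved = [
    {
        "question": "This organ removes excess glucose from the blood & stores it as glycogen",
        "answer": "Liver",
    },
]

print(build_prompt("Which organ stores glycogen?", retrieved))
```

Keeping the retrieved text in a clearly marked context section makes it easier to instruct the model to stay within that material, which is the mechanism the paragraph above describes for reducing hallucinations.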

Title | Description | Modality | Code
Verba, the golden RAGtriever (Video, Demo) | Retrieval-Augmented Generation (RAG) system to chat with Weaviate documentation and blog posts. | Text | Python
HealthSearch (Blog, Demo) | Recommendation system of health products based on symptoms. | Text | Python
Magic Chat | Search through Magic The Gathering cards | Text | Python
AirBnB Listings (Blog) | Generation of customized advertisements for AirBnB listings with Generative Feedback Loops | Text | Python
Distyll | Summarize text or video content. | Text | Python

Learn more in our LLMs and Search blog post.


Classification

https://weaviate.io/developers/weaviate/more-resources/example-use-cases#classification

Weaviate can leverage its vectorization capabilities to enable automatic, real-time classification of unseen, new concepts based on its semantic understanding.
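One common way to classify with vectors is a k-nearest-neighbour vote: an unseen item takes the label held by most of its closest labelled neighbours. The sketch below illustrates that idea in plain Python. It is not Weaviate's implementation, and the two-dimensional vectors and labels are invented, loosely modelled on a toxic-comment example.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def knn_classify(vector, labeled, k=3):
    """Predict a label by majority vote among the k most similar labeled vectors."""
    ranked = sorted(
        labeled,
        key=lambda item: cosine_similarity(vector, item[0]),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Invented (vector, label) pairs standing in for already classified comments.
labeled_examples = [
    ([0.9, 0.1], "toxic"),
    ([0.8, 0.2], "toxic"),
    ([0.1, 0.9], "non-toxic"),
    ([0.2, 0.8], "non-toxic"),
    ([0.3, 0.7], "non-toxic"),
]

print(knn_classify([0.85, 0.15], labeled_examples, k=3))
```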

Title | Description | Modality | Code
Toxic Comment Classification | Classify whether a comment is toxic or non-toxic. | Text | Python
Audio Genre Classification | Classify the music genre of an audio file. | Image | Python

Other use cases

https://weaviate.io/developers/weaviate/more-resources/example-use-cases#other-use-cases

Weaviate’s modular ecosystem unlocks many other use cases of the Weaviate vector database, such as Named Entity Recognition or spell checking.

Title | Description | Code
Named Entity Recognition (NER) | tbd | Python

2024-03-27 (Wed)
