This article introduces LangChain 0.2: building a query analysis system.
Translated and adapted from: Build a Query Analysis System
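- I. Project overview
- II. Setup
- 1. Install dependencies
- 2. Set environment variables
- 3. Load documents
- 4. Index documents
- III. Retrieval without query analysis
- IV. Query analysis
- 1. Query schema
- 2. Query generation
- V. Retrieval with query analysis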
For this example, we will run retrieval over LangChain YouTube videos.
pip install -qU langchain langchain-community langchain-openai youtube-transcript-api pytube langchain-chroma
We will use OpenAI in this example:
import getpass
import os

os.environ["OPENAI_API_KEY"] = getpass.getpass()

# Optional, uncomment to trace runs with LangSmith. Sign up here: https://smith.langchain.com.
# os.environ["LANGCHAIN_TRACING_V2"] = "true"
# os.environ["LANGCHAIN_API_KEY"] = getpass.getpass()
Let's load the transcripts of a few LangChain videos:
from langchain_community.document_loaders import YoutubeLoader

urls = [
    "https://www.youtube.com/watch?v=HAn9vnJy6S4",
    "https://www.youtube.com/watch?v=dA1cHGACXCo",
    "https://www.youtube.com/watch?v=ZcEMLz27sL4",
    "https://www.youtube.com/watch?v=hvAPnpSfSGo",
    "https://www.youtube.com/watch?v=EhlPDL4QrWY",
    "https://www.youtube.com/watch?v=mmBo8nlu2j0",
    "https://www.youtube.com/watch?v=rQdibOsL1ps",
    "https://www.youtube.com/watch?v=28lC4fqukoc",
    "https://www.youtube.com/watch?v=es-9MgxB-uc",
    "https://www.youtube.com/watch?v=wLRHwKuKvOE",
    "https://www.youtube.com/watch?v=ObIltMaRJvY",
    "https://www.youtube.com/watch?v=DjuXACWYkkU",
    "https://www.youtube.com/watch?v=o7C9ld6Ln-M",
]
docs = []
for url in urls:
    docs.extend(YoutubeLoader.from_youtube_url(url, add_video_info=True).load())
API reference: YoutubeLoader
import datetime

# Add some additional metadata: what year the video was published
for doc in docs:
    doc.metadata["publish_year"] = int(
        datetime.datetime.strptime(
            doc.metadata["publish_date"], "%Y-%m-%d %H:%M:%S"
        ).strftime("%Y")
    )
Here are the titles of the videos we loaded:
[doc.metadata["title"] for doc in docs]
['OpenGPTs',
 'Building a web RAG chatbot: using LangChain, Exa (prev. Metaphor), LangSmith, and Hosted Langserve',
 'Streaming Events: Introducing a new `stream_events` method',
 'LangGraph: Multi-Agent Workflows',
 'Build and Deploy a RAG app with Pinecone Serverless',
 'Auto-Prompt Builder (with Hosted LangServe)',
 'Build a Full Stack RAG App With TypeScript',
 'Getting Started with Multi-Modal LLMs',
 'SQL Research Assistant',
 'Skeleton-of-Thought: Building a New Template from Scratch',
 'Benchmarking RAG over LangChain Docs',
 'Building a Research Assistant from Scratch',
 'LangServe and LangChain Templates Webinar']
The metadata attached to the first video (docs[0].metadata):
{'source': 'HAn9vnJy6S4',
 'title': 'OpenGPTs',
 'description': 'Unknown',
 'view_count': 7210,
 'thumbnail_url': 'https://i.ytimg.com/vi/HAn9vnJy6S4/hq720.jpg',
 'publish_date': '2024-01-31 00:00:00',
 'length': 1530,
 'author': 'LangChain',
 'publish_year': 2024}
And the beginning of that document's transcript:
"hello today I want to talk about open gpts open gpts is a project that we built here at linkchain uh that replicates the GPT store in a few ways so it creates uh end user-facing friendly interface to create different Bots and these Bots can have access to different tools and they can uh be given files to retrieve things over and basically it's a way to create a variety of bots and expose the configuration of these Bots to end users it's all open source um it can be used with open AI it can be us"
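The publish_year step above relies on publish_date having the "%Y-%m-%d %H:%M:%S" shape shown in the metadata. As a minimal sketch, the helper below performs the same conversion on a plain dict. The name publish_year_from is hypothetical (it is not part of LangChain), and returning None for a missing or malformed date is an assumption about how you might want to handle values such as 'Unknown'.

```python
import datetime
from typing import Optional

DATE_FORMAT = "%Y-%m-%d %H:%M:%S"  # format seen in the loader metadata above


def publish_year_from(metadata: dict) -> Optional[int]:
    """Return the publication year, or None if the date is missing or malformed."""
    raw = metadata.get("publish_date")
    if not isinstance(raw, str):
        return None
    try:
        return datetime.datetime.strptime(raw, DATE_FORMAT).year
    except ValueError:
        # Assumption: treat unparseable dates as "year unknown" instead of failing.
        return None


sample = {"title": "OpenGPTs", "publish_date": "2024-01-31 00:00:00"}
print(publish_year_from(sample))
```

Documents whose year comes back as None would simply never match a year filter later on, which may or may not be what you want.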
from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(chunk_size=2000)
chunked_docs = text_splitter.split_documents(docs)
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
vectorstore = Chroma.from_documents(
    chunked_docs,
    embeddings,
)
API reference: OpenAIEmbeddings | RecursiveCharacterTextSplitter
search_results = vectorstore.similarity_search("how do I build a RAG agent")
print(search_results[0].metadata["title"])
print(search_results[0].page_content[:500])
Build and Deploy a RAG app with Pinecone Serverless
hi this is Lance from the Lang chain team and today we're going to be building and deploying a rag app using pine con serval list from scratch so we're going to kind of walk through all the code required to do this and I'll use these slides as kind of a guide to kind of lay the the ground work um so first what is rag so under capoy has this pretty nice visualization that shows LMS as a kernel of a new kind of operating system and of course one of the core components of our operating system is th
search_results = vectorstore.similarity_search("videos on RAG published in 2023")
print(search_results[0].page_content[:500])
hardcoded that it will always do a retrieval step here the assistant decides whether to do a retrieval step or not sometimes this is good sometimes this is bad sometimes it you don't need to do a retrieval step when I said hi it didn't need to call it tool um but other times you know the the llm might mess up and not realize that it needs to do a retrieval step and so the rag bot will always do a retrieval step so it's more focused there because this is also a simpler architecture so it's always
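Our first result is from 2024 (even though we asked for videos from 2023) and is not very relevant to the input. Because we are only searching against document contents, there is no way to filter the results on any document attributes.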
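To illustrate that limitation, here is a toy sketch that uses only the standard library. It replaces embeddings with bag-of-words cosine similarity, and the three chunks are made-up stand-ins for transcript chunks, so nothing here reflects how Chroma or OpenAI embeddings actually score text. The point is only that a year mentioned in the query is treated as one more token, whereas a structured filter is applied before ranking.

```python
import math
from collections import Counter
from typing import Dict, List, Optional

# Toy corpus: hypothetical stand-ins for transcript chunks, not real data.
CHUNKS = [
    {"text": "rag bot always does a retrieval step", "publish_year": 2024},
    {"text": "building a rag app with templates", "publish_year": 2023},
    {"text": "multi modal llms with images", "publish_year": 2023},
]


def bag_of_words(text: str) -> Counter:
    """Very rough substitute for an embedding: token counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[token] * b[token] for token in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def search(query: str, where: Optional[Dict[str, int]] = None, k: int = 2) -> List[dict]:
    """Rank chunks by text similarity; `where` restricts candidates by exact metadata match."""
    q = bag_of_words(query)
    candidates = [
        chunk
        for chunk in CHUNKS
        if where is None or all(chunk.get(key) == value for key, value in where.items())
    ]
    ranked = sorted(candidates, key=lambda c: cosine(q, bag_of_words(c["text"])), reverse=True)
    return ranked[:k]


# The year in the query is just another token, so a 2024 chunk can still rank first.
unfiltered = search("rag retrieval 2023")
# Passing the year as a structured filter guarantees only 2023 chunks come back.
filtered = search("rag retrieval", where={"publish_year": 2023})
print([c["publish_year"] for c in unfiltered])
print([c["publish_year"] for c in filtered])
```

This is the motivation for query analysis: the year has to be pulled out of the question and passed to the store as metadata, not left inside the search text.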
from typing import Optional

from langchain_core.pydantic_v1 import BaseModel, Field


class Search(BaseModel):
    """Search over a database of tutorial videos about a software library."""

    query: str = Field(
        ...,
        description="Similarity search query applied to video transcripts.",
    )
    publish_year: Optional[int] = Field(None, description="Year video was published")
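To convert user questions into structured queries, we will use OpenAI's tool-calling API. Specifically, we will use the new ChatModel.with_structured_output() method, which handles passing the schema to the model and parsing the output.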
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI

system = """You are an expert at converting user questions into database queries. \
You have access to a database of tutorial videos about a software library for building LLM-powered applications. \
Given a question, return a list of database queries optimized to retrieve the most relevant results.

If there are acronyms or words you are not familiar with, do not try to rephrase them."""
prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system),
        ("human", "{question}"),
    ]
)
llm = ChatOpenAI(model="gpt-3.5-turbo-0125", temperature=0)
structured_llm = llm.with_structured_output(Search)
query_analyzer = {"question": RunnablePassthrough()} | prompt | structured_llm
API reference: ChatPromptTemplate | RunnablePassthrough | ChatOpenAI
/Users/bagatur/langchain/libs/core/langchain_core/_api/beta_decorator.py:86: LangChainBetaWarning: The function `with_structured_output` is in beta. It is actively being worked on, so the API may change.
  warn_beta(
query_analyzer.invoke("how do I build a RAG agent")
# -> Search(query='build RAG agent', publish_year=None)
query_analyzer.invoke("videos on RAG published in 2023")
# -> Search(query='RAG', publish_year=2023)
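Conceptually, structured output comes down to two steps: describe the schema to the model, then parse the arguments it returns into a typed object. The sketch below imitates only the parsing half with the standard library. SearchArgs, parse_search, and the sample JSON payload are all hypothetical, and this is not a description of how LangChain implements with_structured_output internally.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class SearchArgs:
    """Hypothetical stand-in for the Search schema."""

    query: str
    publish_year: Optional[int] = None


def parse_search(raw_arguments: str) -> SearchArgs:
    """Validate JSON tool-call arguments and build a SearchArgs instance."""
    data = json.loads(raw_arguments)
    if not isinstance(data, dict):
        raise ValueError("arguments must be a JSON object")
    query = data.get("query")
    if not isinstance(query, str) or not query:
        raise ValueError("'query' must be a non-empty string")
    year = data.get("publish_year")
    # bool is a subclass of int in Python, so reject it explicitly.
    if year is not None and (isinstance(year, bool) or not isinstance(year, int)):
        raise ValueError("'publish_year' must be an integer or null")
    return SearchArgs(query=query, publish_year=year)


# Assumed example of what a model might return as tool-call arguments.
parsed = parse_search('{"query": "RAG", "publish_year": 2023}')
print(parsed)
```

Keeping publish_year optional mirrors the schema: a question with no time constraint should yield a query with no filter.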
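Structured output forces the LLM to call one (and only one) tool, which means we will always have exactly one optimized query to look up. Note that this is not always the case: see the other guides for how to handle situations where no optimized query, or more than one, is returned.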
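For the cases where zero or several optimized queries come back, one possible approach is to run each query and merge the results. The sketch below is only an illustration under stated assumptions: Query, retrieve_all, and fake_search are hypothetical names, the documents are plain dicts with a made-up "id" field, and the other guides mentioned above may handle this differently.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class Query:
    """Hypothetical stand-in for one optimized query."""

    query: str
    publish_year: Optional[int] = None


def retrieve_all(
    queries: List[Query],
    search_fn: Callable[[Query], List[Dict[str, str]]],
) -> List[Dict[str, str]]:
    """Run every query and merge results, keeping the first copy of each document id."""
    merged: Dict[str, Dict[str, str]] = {}
    for q in queries:
        for doc in search_fn(q):
            merged.setdefault(doc["id"], doc)
    return list(merged.values())


def fake_search(q: Query) -> List[Dict[str, str]]:
    """Placeholder for a real vector store lookup."""
    index = {
        "RAG": [{"id": "a", "title": "RAG app"}, {"id": "b", "title": "RAG benchmark"}],
        "agents": [{"id": "b", "title": "RAG benchmark"}, {"id": "c", "title": "Agents"}],
    }
    return index.get(q.query, [])


# No queries: nothing to retrieve, so the caller can fall back to a plain answer.
print(retrieve_all([], fake_search))
# Two queries with an overlapping hit: document "b" is returned once.
print([d["id"] for d in retrieve_all([Query("RAG"), Query("agents")], fake_search)])
```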
from typing import List

from langchain_core.documents import Document
API reference: Document
def retrieval(search: Search) -> List[Document]:
    if search.publish_year is not None:
        # This is syntax specific to Chroma,
        # the vector database we are using.
        _filter = {"publish_year": {"$eq": search.publish_year}}
    else:
        _filter = None
    return vectorstore.similarity_search(search.query, filter=_filter)
retrieval_chain = query_analyzer | retrieval
results = retrieval_chain.invoke("RAG tutorial published in 2023")
[(doc.metadata["title"], doc.metadata["publish_date"]) for doc in results]
[('Getting Started with Multi-Modal LLMs', '2023-12-20 00:00:00'),
 ('LangServe and LangChain Templates Webinar', '2023-11-02 00:00:00'),
 ('Getting Started with Multi-Modal LLMs', '2023-12-20 00:00:00'),
 ('Building a Research Assistant from Scratch', '2023-11-16 00:00:00')]
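Because the index stores transcript chunks and not whole videos, the same video can appear more than once in the results, as it does above. A small post-processing sketch, using the (title, publish_date) pairs copied from that output, checks that every hit respects the 2023 filter and collapses the list to unique titles. The helper name unique_titles is hypothetical.

```python
from typing import List, Tuple

# (title, publish_date) pairs copied from the output above.
rows: List[Tuple[str, str]] = [
    ("Getting Started with Multi-Modal LLMs", "2023-12-20 00:00:00"),
    ("LangServe and LangChain Templates Webinar", "2023-11-02 00:00:00"),
    ("Getting Started with Multi-Modal LLMs", "2023-12-20 00:00:00"),
    ("Building a Research Assistant from Scratch", "2023-11-16 00:00:00"),
]


def unique_titles(pairs: List[Tuple[str, str]]) -> List[str]:
    """Return titles in first-seen order, dropping repeats from multiple chunks."""
    seen = set()
    ordered = []
    for title, _date in pairs:
        if title not in seen:
            seen.add(title)
            ordered.append(title)
    return ordered


# Every returned chunk should come from a video published in 2023.
all_from_2023 = all(date.startswith("2023-") for _title, date in rows)
print(all_from_2023)
print(unique_titles(rows))
```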