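>#LangChain: A Complete Example

Published: 2023/10/5 17:08

![](https://img.tnblog.net/arcimg/hb/dcbb080269134178b1048148e5315d8f.png)

[TOC]

## Introduction

tn2>This is the final lesson in this minimal `LangChain` introductory series. Using what we learned in the previous nine lessons, we will build an LLM application with a complete feature set. The application is built on the `LangChain` framework, uses the contents of a `PDF` file as its knowledge base, and lets users ask questions that are answered from that file's contents.<br/> We use `LangChain`'s QA chain together with `Chroma` to implement semantic search over the PDF document. The sample code uses the [AWS Serverless Developer Guide](https://docs.aws.amazon.com/pdfs/serverless/latest/devguide/serverless-core.pdf), an 84-page PDF document.

### Install the required `Python` packages

```shell
!pip install -q langchain==0.0.235 openai chromadb pymupdf tiktoken
```

### Set up the OpenAI environment

```python
import os

# Replace the placeholder with your own key; never publish a real key.
os.environ['OPENAI_API_KEY'] = 'YOUR_OPENAI_API_KEY'
```

### Download the AWS Serverless Developer Guide PDF

```python
# Notebook shell command: fetch the PDF into the working directory.
!wget https://docs.aws.amazon.com/pdfs/serverless/latest/devguide/serverless-core.pdf

PDF_NAME = 'serverless-core.pdf'
```

### Load the PDF file

```python
from langchain.document_loaders import PyMuPDFLoader

# PyMuPDFLoader returns one Document per PDF page.
docs = PyMuPDFLoader(PDF_NAME).load()

print(f'There are {len(docs)} document(s) in {PDF_NAME}.')
print(f'There are {len(docs[0].page_content)} characters in the first page of your document.')
```

tn2>Split the document and store the vector data of the text embeddings.

```python
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import Chroma

# Split pages into chunks of up to 1000 characters, with 200 characters of overlap.
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
split_docs = text_splitter.split_documents(docs)

# Embed every chunk with OpenAI and store the vectors in a Chroma collection.
embeddings = OpenAIEmbeddings()
vectorstore = Chroma.from_documents(split_docs, embeddings, collection_name="serverless_guide")
```

### Create a QA chain based on OpenAI

```python
from langchain.llms import OpenAI
from langchain.chains.question_answering import load_qa_chain

# temperature=0 keeps the answers as deterministic as possible.
llm = OpenAI(temperature=0)
# The "stuff" chain type puts all supplied documents into a single prompt.
chain = load_qa_chain(llm, chain_type="stuff")
```

### Run a similarity search based on the question

```python
query = "What is the use case of AWS Serverless?"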
# Retrieve the 3 chunks most similar to the query.
similar_docs = vectorstore.similarity_search(query, k=3, include_metadata=True)
```

![](https://img.tnblog.net/arcimg/hb/530917a432df46b09a30dda3a8926948.png)

### Answer the question with the QA chain, based on the relevant documents

```python
chain.run(input_documents=similar_docs, question=query)
```

![](https://img.tnblog.net/arcimg/hb/45db9f106a524c1998407f170180213a.png)
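
The last two steps, similarity search followed by the QA chain, can be wrapped into one reusable function so that further questions can be asked against the same PDF. The following is only a minimal sketch: `answer_question` is a hypothetical helper name, not part of `LangChain`, and it assumes the `vectorstore` and `chain` objects created in the earlier steps are passed in.

```python
def answer_question(vectorstore, chain, query, k=3):
    """Retrieve the k chunks most similar to `query`, then answer from them.

    `answer_question` is a hypothetical helper, not a LangChain API.
    `vectorstore` and `chain` are assumed to be the objects built earlier.
    """
    # Step 1: semantic search over the embedded PDF chunks.
    similar_docs = vectorstore.similarity_search(query, k=k)
    # Step 2: pass the retrieved chunks and the question to the QA chain.
    answer = chain.run(input_documents=similar_docs, question=query)
    # Return the chunks as well, so the caller can inspect the sources.
    return answer, similar_docs
```

For example, `answer, sources = answer_question(vectorstore, chain, query)` should reproduce the two steps above in a single call. Returning the retrieved chunks alongside the answer makes it easier to check which pages of the PDF the answer was based on.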