Chapter 4 Building RAG Applications

4.1 Connect LLM to LangChain

LangChain provides an efficient development framework for developing custom applications based on LLM, allowing developers to quickly activate the powerful capabilities of LLM and build LLM applications. LangChain also supports a variety of large models and has built-in calling interfaces for large models such as OpenAI and LLAMA. However, LangChain does not have all large models built-in. It provides strong scalability by allowing users to customize LLM types.

4.1.1 Call ChatGPT based on LangChain

LangChain provides encapsulation of a variety of large models. The interface based on LangChain can easily call ChatGPT and integrate it into personal applications built with LangChain as the basic framework. Here we briefly describe how to use the LangChain interface to call ChatGPT.

Note that calling ChatGPT based on the LangChain interface also requires configuring your personal key. The configuration method is the same as above.

fromlangchain.chat_modelsimportOpenAIdialogue modelChatOpenAI. Except for OpenAI,langchain.chat_modelsOther dialogue models are also integrated, see Langchain官方文档 for more details.

import os
from dotenv import load_dotenv, find_dotenv

# 读取本地/项目的环境变量。

# find_dotenv()寻找并定位.env文件的路径
# load_dotenv()读取该.env文件，并将其中的环境变量加载到当前的运行环境中  
# 如果你设置的是全局的环境变量，这行代码则没有任何作用。
_ = load_dotenv(find_dotenv())

# 获取环境变量 OPENAI_API_KEY
openai_api_key = os.environ['OPENAI_API_KEY']

If langchain-openai is not installed, please run the following code first!

from langchain_openai import ChatOpenAI

Next you need to instantiate a ChatOpenAI class. You can pass in hyperparameters when instantiating to control the answer, for exampletemperatureparameter.

# 这里我们将参数temperature设置为0.0，从而减少生成答案的随机性。
# 如果你想要每次得到不一样的有新意的答案，可以尝试调整该参数。
llm = ChatOpenAI(temperature=0.0)
llm

ChatOpenAI(client=<openai.resources.chat.completions.completions.Completions object at 0x117efa8f0>, async_client=<openai.resources.chat.completions.completions.AsyncCompletions object at 0x117f14e20>, root_client=<openai.OpenAI object at 0x1157c7d30>, root_async_client=<openai.AsyncOpenAI object at 0x117efa950>, temperature=0.0, model_kwargs={}, openai_api_key=SecretStr('**********'))

The cell above assumes that your OpenAI API key is set in an environment variable, if you wish to specify the API key manually, use the following code:

# llm = ChatOpenAI(temperature=0, openai_api_key="YOUR_API_KEY")

As you can see, the ChatGPT-3.5 model is called by default. In addition, several commonly used hyperparameter settings include:

· model_name: The model to be used, the default is ‘gpt-3.5-turbo’, and the parameter settings are consistent with the OpenAI native interface parameter settings.

· temperature: temperature coefficient, the value is the same as the native interface.

· openai_api_key: OpenAI API key. If you do not use environment variables to set the API Key, you can also set it during instantiation.

· openai_proxy: Set the proxy. If you do not use environment variables to set the proxy, you can also set it at instantiation time.

· Streaming: Whether to use streaming, that is, output the model answer verbatim. The default is False, which will not be described here.

· max_tokens: The maximum number of tokens output by the model. The meaning and value are the same as above.

When we initialize theLLMAfter that, we can try to use it! Let’s ask “Please introduce yourself!”

output = llm.invoke("请你自我介绍一下自己！")

output

AIMessage(content='Hello, I am an intelligent assistant dedicated to providing you with various services and help. I can answer your questions, provide information and suggestions, and help you solve problems. If you have any needs, please feel free to tell me and I will try my best to help you. Thank you for choosing me as your assistant! If you have any questions, please feel free to ask me.', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 109, 'prompt_tokens': 20, 'total_tokens': 129, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-3.5-turbo-0125', 'system_fingerprint': None, 'finish_reason': 'stop', 'logprobs': None}, id='run-c611b32a-4adf-47af-9b97-6dda68a117e1-0', usage_metadata={'input_tokens': 20, 'output_tokens': 109, 'total_tokens': 129})

When we develop large model applications, in most cases the user's input is not passed directly to the LLM. Typically they add user input to a larger text called提示模板, this text provides additional context about the specific task at hand. PromptTemplates helps solve this problem! They bundle all logic from user input to fully formatted prompts. This can be started very simply - for example, the tip for generating the string above is:

We need to construct a personalized Template first:

# 这里我们要求模型对给定文本进行中文翻译
prompt = """请你将由三个反引号分割的文本翻译成英文！\
text: ```{text}```
"""

Next, let's take a look at the completed prompt template:

text = "我带着比身体重的行李，\
游入尼罗河底，\
经过几道闪电 看到一堆光圈，\
不确定是不是这里。\
"
prompt.format(text=text)

'Please translate the text separated by three backticks into English! text:我带着比身体重的行李，游入尼罗河底，经过几道闪电看到一堆光圈，不确定是不是这里。\n'

We know that the interface of the chat model is based on messages, not raw text. PromptTemplates can also be used to generate message lists. In this example,promptIt not only contains the input content information, but also contains eachmessageinformation (role, position in the list, etc.). Typically, aChatPromptTemplateis aChatMessageTemplatelist. eachChatMessageTemplateContains instructions for formatting this chat message (its role as well as its content).

Let's look at an example together:

from langchain_core.prompts import ChatPromptTemplate

template = "你是一个翻译助手，可以帮助我将 {input_language} 翻译成 {output_language}."
human_template = "{text}"

chat_prompt = ChatPromptTemplate([
    ("system", template),
    ("human", human_template),
])

text = "我带着比身体重的行李，\
游入尼罗河底，\
经过几道闪电 看到一堆光圈，\
不确定是不是这里。\
"
messages  = chat_prompt.invoke({"input_language": "中文", "output_language": "英文", "text": text})
messages

[SystemMessage(content='You are a translation assistant who can help me translate Chinese into English.', additional_kwargs={}, response_metadata={}), HumanMessage(content='I swam into the bottom of the Nile River with luggage that was heavier than my body. After several lightning bolts, I saw a bunch of circles of light, not sure if they were here.', additional_kwargs={}, response_metadata={})]

Next let us call the definedllmandmessagesTo output the answer:

output  = llm.invoke(messages)
output

AIMessage(content='I carried luggage heavier than my body and dived into the bottom of the Nile River. After passing through several flashes of lightning, I saw a pile of halos, not sure if this is the place.', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 43, 'prompt_tokens': 95, 'total_tokens': 138, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-3.5-turbo-0125', 'system_fingerprint': None, 'finish_reason': 'stop', 'logprobs': None}, id='run-c58ae91f-7c0a-4f20-ad7f-8d6421f3a9aa-0', usage_metadata={'input_tokens': 95, 'output_tokens': 43, 'total_tokens': 138})

OutputParsers convert the raw output of the language model into a format that can be used downstream. There are several main types of OutputParsers, including:

Convert LLM text to structured information (e.g. JSON)
Convert ChatMessage to string
Convert extra information returned by calls other than messages (such as OpenAI function calls) to strings

Finally, we pass the model output tooutput_parser, it is aBaseOutputParser, which means it accepts a String or a BaseMessage as input. StrOutputParser is particularly simple to convert any input into a string.

from langchain_core.output_parsers import StrOutputParser

output_parser = StrOutputParser()
output_parser.invoke(output)

'I carried luggage heavier than my body and dived into the bottom of the Nile River. After passing through several flashes of lightning, I saw a pile of halos, not sure if this is the place.'

As can be seen from the above results, we successfully used the output parser toChatMessageThe output of type resolves to字符串

We can now combine all of this into a chain. This chain will take the input variables, pass those variables to the prompt template to create the prompt, pass the prompt to the language model, and then pass the output through the (optional) output parser. Next we will use LCEL syntax to quickly implement a chain. Let’s see it in action!

chain = chat_prompt | llm | output_parser
chain.invoke({"input_language":"中文", "output_language":"英文","text": text})

'I carried luggage heavier than my body weight and dived into the bottom of the Nile River. After passing through several flashes of lightning, I saw a pile of halos, not sure if this is the place.'

Let’s test another example:

text = 'I carried luggage heavier than my body and dived into the bottom of the Nile River. After passing through several flashes of lightning, I saw a pile of halos, not sure if this is the place.'
chain.invoke({"input_language": "英文", "output_language": "中文","text": text})

'I dived to the bottom of the Nile carrying luggage heavier than my body. After passing through a few bolts of lightning, I saw a bunch of rings and wasn't sure if this was the destination. '

What is LCEL? LCEL (LangChain Expression Language, Langchain's expression language), LCEL is a new syntax and an important addition to the LangChain toolkit. It has many advantages, making it easier and more convenient for us to deal with LangChain and agents.

LCEL provides asynchronous, batch and stream processing support so that code can be quickly ported across different servers.
LCEL has backup measures to solve the problem of LLM format output.
LCEL increases the parallelism of LLM and improves efficiency.
LCEL has built-in logging, which helps understand the operation of complex chains and agents even if the agent becomes complex.

Usage examples:

chain = prompt | model | output_parser

In the code above we use LCEL to piece together the different components into a chain where user input is passed to the prompt template, then the prompt template output is passed to the model, and then the model output is passed to the output parser. The notation of | is similar to the Unix pipe operator, which links different components together, using the output of one component as the input of the next component.

4.1.2 Use LangChain to call Baidu Wenxinyiyan

We can also call the Baidu Wenxin large model through the LangChain framework to integrate the Wenxin model into our application framework.


from dotenv import find_dotenv, load_dotenv
import os

# 读取本地/项目的环境变量。

# find_dotenv()寻找并定位.env文件的路径
# load_dotenv()读取该.env文件，并将其中的环境变量加载到当前的运行环境中
# 如果你设置的是全局的环境变量，这行代码则没有任何作用。
_ = load_dotenv(find_dotenv())

# 获取环境变量 API_KEY
QIANFAN_AK = os.environ["QIANFAN_AK"]
QIANFAN_SK = os.environ["QIANFAN_SK"]

from langchain_community.llms.baidu_qianfan_endpoint import QianfanLLMEndpoint

llm = QianfanLLMEndpoint(streaming=True)
res = llm("你好，请你自我介绍一下！")
print(res)

[WARNING][2025-03-05 19:41:13.652] redis_rate_limiter.py:21 [t:8258539328]: No redis installed, RedisRateLimiter unavailable. Ignore this warning if you don't need to use qianfan SDK in distribution environment
/var/folders/yd/c4q_f88j1l70g7_jcb6pdnb80000gn/T/ipykernel_51550/1209153611.py:4: LangChainDeprecationWarning: The method `BaseLLM.__call__` was deprecated in langchain-core 0.1.7 and will be removed in 1.0. Use :meth:`~invoke` instead.

res = llm("Hello, please introduce yourself!") [ERROR][2025-03-05 19:41:13.835] base.py:134 [t:8258539328]: http request url https://qianfan.baidubce.com/wenxinworkshop/service/list failed with http status code 403 error code from baidu: IamSignatureInvalid error message from baidu: IamSignatureInvalid, cause: Could not find credential. request headers: {'User-Agent': 'python-requests/2.32.3', 'Accept-Encoding': 'gzip, deflate, zstd', 'Accept': '/', 'Connection': 'keep-alive', 'Content-Type': 'application/json', 'Host': 'qianfan.baidubce.com', 'request-source': 'qianfan_py_sdk_v0.4.12.3', 'x-bce-date': '2025-03-05T11:41:13Z', 'Authorization': 'bce-auth-v1//2025-03-05T11:41:13Z/300/x-bce-date;host;request-source;content-type/cc383f75803c577d6486841dc228aea994102a4b70bd5ff76f27d12bdb7af133', 'Content-Length': '2'} request body: '{}' response headers: {'Content-Length': '0', 'Date': 'Wed, 05 Mar 2025 11:41:13 GMT', 'X-Bce-Error-Code': 'IamSignatureInvalid', 'X-Bce-Error-Message': 'IamSignatureInvalid, cause: Could not find credential.', 'X-Bce-Exception-Point': 'Gateway', 'X-Bce-Gateway-Region': 'BJ', 'X-Bce-Request-Id': '17059251-201a-4a1f-8fbc-220df83ff184', 'Content-Type': 'text/plain; charset=utf-8'} response body: b'' [WARNING][2025-03-05 19:41:13.835] base.py:1083 [t:8258539328]: fetch_supported_models failed: http request url https://qianfan.baidubce.com/wenxinworkshop/service/list failed with http status code 403 error code from baidu: IamSignatureInvalid error message from baidu: IamSignatureInvalid, cause: Could not find credential. request headers: {'User-Agent': 'python-requests/2.32.3', 'Accept-Encoding': 'gzip, deflate, zstd', 'Accept': '/', 'Connection': 'keep-alive', 'Content-Type': 'application/json', 'Host': 'qianfan.baidubce.com', 'request-source': 'qianfan_py_sdk_v0.4.12.3', 'x-bce-date': '2025-03-05T11:41:13Z', 'Authorization': 'bce-auth-v1//2025-03-05T11:41:13Z/300/x-bce-date;host;request-source;content-type/cc383f75803c577d6486841dc228aea994102a4b70bd5ff76f27d12bdb7af133', 'Content-Length': '2'} request body: '{}' response headers: {'Content-Length': '0', 'Date': 'Wed, 05 Mar 2025 11:41:13 GMT', 'X-Bce-Error-Code': 'IamSignatureInvalid', 'X-Bce-Error-Message': 'IamSignatureInvalid, cause: Could not find credential.', 'X-Bce-Exception-Point': 'Gateway', 'X-Bce-Gateway-Region': 'BJ', 'X-Bce-Request-Id': '17059251-201a-4a1f-8fbc-220df83ff184', 'Content-Type': 'text/plain; charset=utf-8'} response body: b'' [ERROR][2025-03-05 19:41:14.033] base.py:134 [t:8258539328]: http request url https://qianfan.baidubce.com/wenxinworkshop/service/list failed with http status code 403 error code from baidu: IamSignatureInvalid error message from baidu: IamSignatureInvalid, cause: Could not find credential. request headers: {'User-Agent': 'python-requests/2.32.3', 'Accept-Encoding': 'gzip, deflate, zstd', 'Accept': '/', 'Connection': 'keep-alive', 'Content-Type': 'application/json', 'Host': 'qianfan.baidubce.com', 'request-source': 'qianfan_py_sdk_v0.4.12.3', 'x-bce-date': '2025-03-05T11:41:13Z', 'Authorization': 'bce-auth-v1//2025-03-05T11:41:13Z/300/x-bce-date;host;request-source;content-type/cc383f75803c577d6486841dc228aea994102a4b70bd5ff76f27d12bdb7af133', 'Content-Length': '2'} request body: '{}' response headers: {'Content-Length': '0', 'Date': 'Wed, 05 Mar 2025 11:41:14 GMT', 'X-Bce-Error-Code': 'IamSignatureInvalid', 'X-Bce-Error-Message': 'IamSignatureInvalid, cause: Could not find credential.', 'X-Bce-Exception-Point': 'Gateway', 'X-Bce-Gateway-Region': 'BJ', 'X-Bce-Request-Id': '09088182-5e6e-4725-bd6c-e1f476287b34', 'Content-Type': 'text/plain; charset=utf-8'} response body: b'' [WARNING][2025-03-05 19:41:14.034] base.py:1083 [t:8258539328]: fetch_supported_models failed: http request url https://qianfan.baidubce.com/wenxinworkshop/service/list failed with http status code 403 error code from baidu: IamSignatureInvalid error message from baidu: IamSignatureInvalid, cause: Could not find credential. request headers: {'User-Agent': 'python-requests/2.32.3', 'Accept-Encoding': 'gzip, deflate, zstd', 'Accept': '/', 'Connection': 'keep-alive', 'Content-Type': 'application/json', 'Host': 'qianfan.baidubce.com', 'request-source': 'qianfan_py_sdk_v0.4.12.3', 'x-bce-date': '2025-03-05T11:41:13Z', 'Authorization': 'bce-auth-v1//2025-03-05T11:41:13Z/300/x-bce-date;host;request-source;content-type/cc383f75803c577d6486841dc228aea994102a4b70bd5ff76f27d12bdb7af133', 'Content-Length': '2'} request body: '{}' response headers: {'Content-Length': '0', 'Date': 'Wed, 05 Mar 2025 11:41:14 GMT', 'X-Bce-Error-Code': 'IamSignatureInvalid', 'X-Bce-Error-Message': 'IamSignatureInvalid, cause: Could not find credential.', 'X-Bce-Exception-Point': 'Gateway', 'X-Bce-Gateway-Region': 'BJ', 'X-Bce-Request-Id': '09088182-5e6e-4725-bd6c-e1f476287b34', 'Content-Type': 'text/plain; charset=utf-8'} response body: b'' [ERROR][2025-03-05 19:41:14.216] base.py:134 [t:8258539328]: http request url https://qianfan.baidubce.com/wenxinworkshop/service/list failed with http status code 403 error code from baidu: IamSignatureInvalid error message from baidu: IamSignatureInvalid, cause: Could not find credential. request headers: {'User-Agent': 'python-requests/2.32.3', 'Accept-Encoding': 'gzip, deflate, zstd', 'Accept': '/', 'Connection': 'keep-alive', 'Content-Type': 'application/json', 'Host': 'qianfan.baidubce.com', 'request-source': 'qianfan_py_sdk_v0.4.12.3', 'x-bce-date': '2025-03-05T11:41:14Z', 'Authorization': 'bce-auth-v1//2025-03-05T11:41:14Z/300/x-bce-date;host;request-source;content-type/34d382a0332f9213819d512ca7cd9bf264d3126e0454764341daa2ed7c9bf1bb', 'Content-Length': '2'} request body: '{}' response headers: {'Content-Length': '0', 'Date': 'Wed, 05 Mar 2025 11:41:14 GMT', 'X-Bce-Error-Code': 'IamSignatureInvalid', 'X-Bce-Error-Message': 'IamSignatureInvalid, cause: Could not find credential.', 'X-Bce-Exception-Point': 'Gateway', 'X-Bce-Gateway-Region': 'BJ', 'X-Bce-Request-Id': '8d60a700-e5c1-42af-93cc-02e817421476', 'Content-Type': 'text/plain; charset=utf-8'} response body: b'' [WARNING][2025-03-05 19:41:14.217] base.py:1083 [t:8258539328]: fetch_supported_models failed: http request url https://qianfan.baidubce.com/wenxinworkshop/service/list failed with http status code 403 error code from baidu: IamSignatureInvalid error message from baidu: IamSignatureInvalid, cause: Could not find credential. request headers: {'User-Agent': 'python-requests/2.32.3', 'Accept-Encoding': 'gzip, deflate, zstd', 'Accept': '/', 'Connection': 'keep-alive', 'Content-Type': 'application/json', 'Host': 'qianfan.baidubce.com', 'request-source': 'qianfan_py_sdk_v0.4.12.3', 'x-bce-date': '2025-03-05T11:41:14Z', 'Authorization': 'bce-auth-v1//2025-03-05T11:41:14Z/300/x-bce-date;host;request-source;content-type/34d382a0332f9213819d512ca7cd9bf264d3126e0454764341daa2ed7c9bf1bb', 'Content-Length': '2'} request body: '{}' response headers: {'Content-Length': '0', 'Date': 'Wed, 05 Mar 2025 11:41:14 GMT', 'X-Bce-Error-Code': 'IamSignatureInvalid', 'X-Bce-Error-Message': 'IamSignatureInvalid, cause: Could not find credential.', 'X-Bce-Exception-Point': 'Gateway', 'X-Bce-Gateway-Region': 'BJ', 'X-Bce-Request-Id': '8d60a700-e5c1-42af-93cc-02e817421476', 'Content-Type': 'text/plain; charset=utf-8'} response body: b'' [INFO][2025-03-05 19:41:14.219] oauth.py:277 [t:8258539328]: trying to refresh token for ak 6hM0ZG*** [INFO][2025-03-05 19:41:14.340] oauth.py:304 [t:8258539328]: successfully refresh token

Hello! I am an artificial intelligence language model, and my name is Wen Xinyiyan. I am able to interact with people in natural language and provide a variety of information and services. If you have any questions or need help, please feel free to let me know and I will try my best to help you.

4.1.3 iFlytek Spark

We can also call the iFlytek Spark model through the LangChain framework. For more information, please refer to SparkLLM

We hope to store the secret key directly in the .env file like calling ChatGPT and load it into an environment variable, thereby hiding the specific details of the secret key and ensuring security. Therefore, we need to configure it in the .env fileIFLYTEK_SPARK_APP_ID、 IFLYTEK_SPARK_API_KEYandIFLYTEK_SPARK_API_SECRET, and loaded using the following code:

from dotenv import find_dotenv, load_dotenv
import os

# 读取本地/项目的环境变量。

# find_dotenv()寻找并定位.env文件的路径
# load_dotenv()读取该.env文件，并将其中的环境变量加载到当前的运行环境中
# 如果你设置的是全局的环境变量，这行代码则没有任何作用。
_ = load_dotenv(find_dotenv())

# 获取环境变量 API_KEY
IFLYTEK_SPARK_APP_ID = os.environ["IFLYTEK_SPARK_APP_ID"]
IFLYTEK_SPARK_API_KEY = os.environ["IFLYTEK_SPARK_API_KEY"]
IFLYTEK_SPARK_API_SECRET = os.environ["IFLYTEK_SPARK_API_SECRET"]

In addition, each model of Spark corresponds tospark_api_urlandspark_llm_domainThey are all different, you can refer to 接口说明 to select the call.

from langchain_community.llms.sparkllm import SparkLLM

# Load the model
llm = SparkLLM(
    model='Spark4.0 Ultra',
    app_id=IFLYTEK_SPARK_APP_ID,
    api_key=IFLYTEK_SPARK_API_KEY,
    api_secret=IFLYTEK_SPARK_API_SECRET,
    spark_api_url="wss://spark-api.xf-yun.com/v4.0/chat",
    spark_llm_domain="4.0Ultra"
    )

res = llm.invoke("你好，请你自我介绍一下！")
print(res)

Hello, I am a cognitive intelligence model developed by iFlytek. My name is iFlytek Spark Cognitive Model. I can communicate naturally with humans, answer questions, and efficiently complete cognitive intelligence needs in various fields.

Therefore, we can add the Spark large model to the LangChain architecture to realize the call of the Wenxin large model in the application.

4.1.4 Use LangChain to call GLM

We can also call the smart spectrum AI large model through the LangChain framework to connect it to our application framework. Since the ChatGLM provided in langchain is no longer available, we need to customize a LLM.

If you are using the Zhipuai GLM API, you need to download our encapsulated code [zhipuai_llm.py] to the same directory of this Notebook before you can run the following code to use GLM in LangChain.

According to the official announcement of Zhipu, the following models will be deprecated. After these models are deprecated, they will be automatically routed to new models. Users are requested to update your model coding to the latest version before the deprecation date to ensure a smooth transition of services. For more model-related information, please visit model

Model encoding	Deprecation date	Point to model
chatglm_pro	December 31, 2024	glm-4
chatglm_std	December 31, 2024	glm-3-turbo
chatglm_lite	December 31, 2024	glm-3-turbo

from zhipuai_llm import ZhipuaiLLM
from dotenv import find_dotenv, load_dotenv
import os

# 读取本地/项目的环境变量。

# find_dotenv()寻找并定位.env文件的路径
# load_dotenv()读取该.env文件，并将其中的环境变量加载到当前的运行环境中
# 如果你设置的是全局的环境变量，这行代码则没有任何作用。
_ = load_dotenv(find_dotenv())

# 获取环境变量 API_KEY
api_key = os.environ["ZHIPUAI_API_KEY"] #填写控制台中获取的 APIKey 信息

zhipuai_model = ZhipuaiLLM(model_name="glm-4-plus", temperature=0.1, api_key=api_key)

zhipuai_model.invoke("你好，请你自我介绍一下！")

AIMessage(content='Hello! I am the artificial intelligence assistant ChatGLM, which is developed based on the language model trained by ChatGLM in 2024. My task is to provide appropriate responses and support to users' questions and requests.', additional_kwargs={}, response_metadata={'time_in_seconds': 1.87}, id='run-4e509a7e-9859-4acb-9418-23245fa5b7a7-0', usage_metadata={'input_tokens': 11, 'output_tokens': 42, 'total_tokens': 53})

4.2 Build a search question and answer chain

existC3 搭建数据库Chapter, we have introduced how to build a vector knowledge base based on our own local knowledge documents. In the following content, we will use the built vector database to recall the query query, combine the recall results with the query to build a prompt, and input it into the large model for question and answer.

4.2.1 Load vector database

First, we load the vector database we built in the previous chapter. Note that you need to use the same Emedding here as when building.

import sys
sys.path.append("../C3 搭建知识库") # 将父目录放入系统路径中

# 使用智谱 Embedding API，注意，需要将上一章实现的封装代码下载到本地
from zhipuai_embedding import ZhipuAIEmbeddings

from langchain.vectorstores.chroma import Chroma

Load your API_KEY from environment variable

from dotenv import load_dotenv, find_dotenv
import os

_ = load_dotenv(find_dotenv())    # read local .env file
zhipuai_api_key = os.environ['ZHIPUAI_API_KEY']

Load the vector database, which contains the Embedding of multiple documents under ../../data_base/knowledge_db

# 定义 Embeddings
embedding = ZhipuAIEmbeddings()

# 向量数据库持久化路径
persist_directory = '../../data_base/vector_db/chroma'

# 加载数据库
vectordb = Chroma(
    persist_directory=persist_directory,  # 允许我们将persist_directory目录保存到磁盘上
    embedding_function=embedding
)

/var/folders/yd/c4q_f88j1l70g7_jcb6pdnb80000gn/T/ipykernel_17468/4214663976.py:8: LangChainDeprecationWarning: The class `Chroma` was deprecated in LangChain 0.2.9 and will be removed in 1.0. An updated version of the class exists in the :class:`~langchain-chroma package and should be used instead. To use it run `pip install -U :class:`~langchain-chroma` and import as `from :class:`~langchain_chroma import Chroma``.
  vectordb = Chroma(

print(f"向量库中存储的数量：{vectordb._collection.count()}")

Number stored in vector library: 1004

We can test the loaded vector database byas_retrieverMethod constructs a vector database into a retriever. We use a question query for vector retrieval. The following code will search based on similarity in the vector database and return the top k most similar documents.

question = "什么是prompt engineering?"
retriever = vectordb.as_retriever(search_kwargs={"k": 3})
docs = retriever.invoke(question)
print(f"检索到的内容数：{len(docs)}")

Number of items retrieved: 3

Print the retrieved content

for i, doc in enumerate(docs):
    print(f"检索到的第{i}个内容: \n {doc.page_content}", end="\n-----------------------------------------------------\n")

The 0th content retrieved: Specifically, a first version of Prompt is written first, and then gradually improved through multiple rounds of adjustments until satisfactory results are produced. For more complex applications, iterative training can be performed on multiple samples to evaluate the average performance of Prompt. After the application becomes more mature, it is necessary to conduct detailed optimization by evaluating Prompt performance on multiple sample sets. Because this requires higher computing resources.

In short, the core of Prompt engineers is to master the iterative development and optimization skills of Prompt, rather than requiring 100% perfection from the beginning. The correct way to design Prompt is to finally find a reliable and applicable Prompt form through constant adjustment and trial and error.

Readers can practice the examples given in this chapter on Jupyter Notebook, modify Prompt and observe different outputs to gain a deeper understanding of the iterative optimization process of Prompt. This will provide good practical preparation for further development of complex language model applications.

English version

Product manual ----------------------------------------------------- The first content retrieved: Chapter 1 Introduction

Welcome to the Prompt Engineering for Developers section. The content of this section is based on the "Prompt Engineering for Developer" course taught by Andrew Ng. The "Prompt Engineering for Developer" course is taught by Mr. Ng Enda in cooperation with Mr. Isa Fulford, a member of the OpenAI technical team. Mr. Isa has developed the popular ChatGPT search plug-in and has made great contributions in teaching the application of LLM (Large Language Model) technology in products. She also co-wrote the OpenAI cookbook that teaches people to use Prompt. We hope that through studying this module, we can share with you the best practices and techniques for developing LLM applications using prompt words. ----------------------------------------------------- The 2nd content retrieved: Chapter 2 Prompt Principles

How to use Prompt to give full play to the performance of LLM? First of all, we need to know the principles of designing Prompt. They are the basic concepts that every developer must know when designing Prompt. This chapter discusses two key principles for designing effective prompts: writing clear, specific instructions and giving the model enough time to think. Mastering these two points is particularly important for creating reliable language model interactions.

First, Prompt needs to clearly express the requirements and provide sufficient context so that the language model accurately understands our intentions, just like explaining the human world to an alien in detail. Too simple Prompt often makes it difficult for the model to grasp the specific tasks to be completed.

Secondly, it is also critical to allow the language model enough time to reason. Just like when humans solve problems, hasty conclusions often lead to mistakes. Therefore, Prompt should add the requirement of step-by-step reasoning and allow sufficient thinking time for the model, so that the generated results will be more accurate and reliable.

If Prompt is optimized on both points, the language model can maximize its potential and complete complex reasoning and generation tasks. Mastering these Prompt design principles is an important step for developers to successfully apply language models.

Principle 1: Write clear and specific instructions

4.2.2 Create retrieval chain

We can use LangChain's LCEL (LangChain Expression Language, LangChain Expression Language) to build workflow. LCEL can support asynchronous (ainvoke), streaming (stream), batch processing (batch) and other operating modes, and can also use LangSmith for seamless tracking.

Next we define a simple retrieval chain using the retriever just defined.

from langchain_core.runnables import RunnableLambda
def combine_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

combiner = RunnableLambda(combine_docs)
retrieval_chain = retriever | combiner

retrieval_chain.invoke("南瓜书是什么？")

'Preface\n "Machine Learning" (Xigua Book) by Teacher Zhou Zhihua is one of the classic introductory textbooks in the field of machine learning. In order to enable as many readers as possible\n to understand machine learning through Xigua Book, Teacher Zhou Therefore, the derivation details of some formulas are not detailed in the book, but this may be "unfriendly" to readers who want to delve into the details of formula derivation\n. This book aims to analyze the more difficult to understand formulas in the Xigua book and add specific derivation details to some formulas. "\nAfter reading this, you may wonder why the previous paragraph is here. We added quotation marks because this was just our initial reverie. Later we learned that the real reason why Teacher Zhou omitted these derivation details was that he believed that "sophomore students with a solid foundation in science and engineering mathematics should have no difficulty with the derivation details in Xigua Shu\n. The key points are all in the book, and the omitted details should be able to make up for them in their heads or do exercises." So... this pumpkin book can only be regarded as the notes that I\nother math bastards took down when they were studying on their own. I hope it can help everyone become a qualified "sophomore student with a solid foundation in mathematics in science and engineering." \nInstructions for use\n• All contents of the Pumpkin Book are expressed based on the content of the Watermelon Book as pre-knowledge, so the best way to use the Pumpkin Book is to use the Watermelon Book\n as the main line. When you encounter formulas that you cannot derive or understand, please refer to the Pumpkin Book;\n• For beginners who are new to machine learning, it is strongly not recommended to go into the formulas in Chapters 1 and 2 of the Watermelon Book. Just go through it briefly and wait until you learn the latest version of PDF\n\n Access address: https://github.com/datawhalechina/pumpkin-book/releases\n编委会\n主编：Sm1les、archwalker、jbb0523\n编委：juxiao、Majingmin、MrBigFan、shanry、Ye980226\n封面设计：构思-Sm1les、创作-林王茂盛\n致谢\n特别感谢awyd234、feijuan、Ggmatch、Heitao5200、huaqing89、LongJH、LilRachel、LeoLRH、Nono17、\nspareribs、sunchaothu、StevenLzq Contributions to the Pumpkin Book from its earliest days. \nScan the QR code below and reply with the keyword "Pumpkin Book" to join the "Pumpkin Book Readers Exchange Group"\nCopyright Statement\nThis work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. \n\n• For newcomers to machine learning, it is strongly not recommended to delve into the formulas in Chapters 1 and 2 of Xigua Book. Just go through it briefly. You can come back to it when you have learned it\n;\n• We will strive (zhi) for the analysis and derivation of each formula (neng) It is explained from the perspective of the basics of undergraduate mathematics, so super-curricular mathematical knowledge\nWe usually give it in the form of appendices and references. Interested students can continue to study in depth along the information we provide;\n• If there is no formula you want to check in the Pumpkin Book, or you find an error somewhere in the Pumpkin Book, please do not hesitate to go to our GitHub\nIssues (Address: https://github.com/datawhalechina/pumpkin-book/issues）进行反馈，在对应版块\n提交你希望补充的公式编号或者勘误信息，我们通常会在24 We will reply to you within 24 hours, more than 24 hours) If there is no reply within an hour\nYou can contact us via WeChat (WeChat ID: at-Sm1les);\nSupporting video tutorial: https://www.bilibili.com/video/BV1Mh411e7VU\n在线阅读地址：https://datawhalechina.github.io/pumpkin-book（仅供第1 version)'

LCEL requires that all constituent elements areRunnableType, as we saw earlierChatModel、PromptTemplateetc. are all inherited fromRunnablekind. aboveretrieval_chainis determined by the retrieverretrieverand combinercombinercomposed of|Symbol concatenation, data is passed from left to right, that is, the problem is firstretrieverSearch to get the search results, and thencombinerfurther processed and output.

4.2.3 Create LLM

Here, we call OpenAI’s API to create an LLM. Of course, you can also use other LLM’s APIs to create it.

import os 
OPENAI_API_KEY = os.environ["OPENAI_API_KEY"]

from langchain_openai import ChatOpenAI
llm = ChatOpenAI(model_name="gpt-4o", temperature=0)

llm.invoke("请你自我介绍一下自己！").content

'sure! I am an artificial intelligence assistant developed by OpenAI called ChatGPT. I specialize in processing and generating natural language text to help answer questions, provide information, assist with problem solving, and conduct a variety of conversations. I have no personal experiences or emotions, but I will try to provide accurate and helpful answers. If you have any questions or need help, feel free to ask me! '

4.2.4 Constructing a search question and answer chain

from langchain_core.prompts import PromptTemplate
from langchain_core.runnables import RunnablePassthrough, RunnableParallel
from langchain_core.output_parsers import StrOutputParser

template = """使用以下上下文来回答最后的问题。如果你不知道答案，就说你不知道，不要试图编造答
案。最多使用三句话。尽量使答案简明扼要。请你在回答的最后说“谢谢你的提问！”。
{context}
问题: {input}
"""
# 将template通过 PromptTemplate 转为可以在LCEL中使用的类型
prompt = PromptTemplate(template=template)

qa_chain = (
    RunnableParallel({"context": retrieval_chain, "input": RunnablePassthrough()})
    | prompt
    | llm
    | StrOutputParser()
)

In the above code, we regard the retrieval chain just defined as a sub-chain aspromptofcontext, then useRunnablePassthroughStore user's questions aspromptofinput. And because these two operations are parallel, we useRunnableParallelto run them in parallel.

Search question and answer chain effect test

question_1 = "什么是南瓜书？"
question_2 = "Prompt Engineering for Developer是谁写的？"

result = qa_chain.invoke(question_1)
print("大模型+知识库后回答 question_1 的结果：")
print(result)

The result of answering question_1 after large model + knowledge base: Pumpkin Book is a book that analyzes the more difficult to understand formulas in "Machine Learning" (Watermelon Book) and supplements the derivation details. It aims to help readers better understand the mathematical derivation in machine learning. It uses the content of the Xigua Book as pre-knowledge and is suitable for reference when encountering derivation difficulties. The goal of the Pumpkin Book is to help readers become "sophomore students with a solid foundation in science, engineering, and mathematics." Thank you for your question!

result = qa_chain.invoke(question_2)
print("大模型+知识库后回答 question_2 的结果：")
print(result)

The result of answering question_2 after large model + knowledge base: The "Prompt Engineering for Developer" course is taught by Mr. Andrew Ng in collaboration with Isa Fulford, a member of the OpenAI technical team. Thank you for your question!

The effect of the large model answering by itself

llm.invoke(question_1).content

'The Pumpkin Book usually refers to the book "Deep Learning: Algorithms and Implementation" because the cover of the book is orange and looks like a pumpkin. This book, written by Li Mu, Aston Zhang, Zachary C. Lipton, and Alexander J. Smola, mainly introduces the basic knowledge and practical methods of deep learning. The book covers the basic concepts, commonly used models and algorithms of deep learning, and provides a large number of code examples to help readers understand and implement deep learning technology. Due to its detailed and practical content, the Pumpkin Book is widely popular among learners in the field of deep learning. '

llm.invoke(question_2).content

'Prompt Engineering for Developers' is a book co-authored by Isa Fulford and Andrew Ng. This book aims to help developers better understand and apply prompt engineering technology to improve the efficiency and effectiveness of interacting with large language models (such as GPT-3). '

⭐ Through the above two questions, we found that LLM did not answer very well for some knowledge in recent years and non-common knowledge professional questions. And that, coupled with our local knowledge, can help LLM come up with better answers. In addition, it also helps alleviate the "illusion" problem of large models.

4.2.5 Add chat records to the search chain

Now that we have achieved this by uploading local knowledge documents and then saving them to the vector knowledge base, by combining the query questions with the recall results of the vector knowledge base and inputting them into the LLM, we have obtained a much better result than letting the LLM answer directly. When interacting with language models, you may have noticed a key problem - they don't remember your previous communications. This creates a big challenge when we build some applications (such as chatbots), making the conversation seem to lack real continuity. How to solve this problem?

Transfer chat history

In this section we will use LangChainChatPromptTemplate, that is, embedding previous conversations into the language model to give it the ability to continue conversations.ChatPromptTemplateChat message history can be received, which will be passed to the chatbot along with the questions when answering them, adding them to the context.

from langchain_core.prompts import ChatPromptTemplate

# 问答链的系统prompt
system_prompt = (
    "你是一个问答任务的助手。 "
    "请使用检索到的上下文片段回答这个问题。 "
    "如果你不知道答案就说不知道。 "
    "请使用简洁的话语回答用户。"
    "\n\n"
    "{context}"
)
# 制定prompt template
qa_prompt = ChatPromptTemplate(
    [
        ("system", system_prompt),
        ("placeholder", "{chat_history}"),
        ("human", "{input}"),
    ]
)

# 无历史记录
messages = qa_prompt.invoke(
    {
        "input": "南瓜书是什么？",
        "chat_history": [],
        "context": ""
    }
)
for message in messages.messages:
    print(message.content)

You are an assistant on a question and answer task. Please answer this question using the retrieved context fragment. If you don't know the answer just say no. Please use concise words to answer users.

What is a pumpkin book?

# 有历史记录
messages = qa_prompt.invoke(
    {
        "input": "你可以介绍一下他吗？",
        "chat_history": [
            ("human", "西瓜书是什么？"),
            ("ai", "西瓜书是指周志华老师的《机器学习》一书，是机器学习领域的经典入门教材之一。"),
        ],
        "context": ""
    }
)
for message in messages.messages:
    print(message.content)

What is the Watermelon Book? The Xigua Book refers to the book "Machine Learning" by teacher Zhou Zhihua, which is one of the classic introductory textbooks in the field of machine learning. Can you introduce him?

4.2.6 Retrieval chain with information compression

Because the question and answer chain we are building has the function of supporting multiple rounds of dialogue, compared with the question and answer chain of a single round of dialogue, it will face more problems like the output results above, that is, the user's latest dialogue semantics are incomplete, and it is difficult to retrieve relevant information when using the user question query vector database. Like "Can you introduce him?" above, it actually means "Can you introduce Mr. Zhou Zhihua?" In order to solve this problem, we will adopt information compression method and let llm improve the user's problem based on historical records.

from langchain_core.runnables import RunnableBranch

# 压缩问题的系统 prompt
condense_question_system_template = (
    "请根据聊天记录完善用户最新的问题，"
    "如果用户最新的问题不需要完善则返回用户的问题。"
    )
# 构造 压缩问题的 prompt template
condense_question_prompt = ChatPromptTemplate([
        ("system", condense_question_system_template),
        ("placeholder", "{chat_history}"),
        ("human", "{input}"),
    ])
# 构造检索文档的链
# RunnableBranch 会根据条件选择要运行的分支
retrieve_docs = RunnableBranch(
    # 分支 1: 若聊天记录中没有 chat_history 则直接使用用户问题查询向量数据库
    (lambda x: not x.get("chat_history", False), (lambda x: x["input"]) | retriever, ),
    # 分支 2 : 若聊天记录中有 chat_history 则先让 llm 根据聊天记录完善问题再查询向量数据库
    condense_question_prompt | llm | StrOutputParser() | retriever,
)

Supports search Q&A chain of chat records

Here we use the question and answer template defined beforeqa_promptConstruct a question and answer chain, and we passRunnablePassthrough.assignSave the intermediate query results as"context", save the final result as"answer". Because query results are stored as"context", so we integrate the function of query resultscombine_docsCorresponding changes must also be made.

# 重新定义 combine_docs
def combine_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs["context"]) # 将 docs 改为 docs["context"]
# 定义问答链
qa_chain = (
    RunnablePassthrough.assign(context=combine_docs) # 使用 combine_docs 函数整合 qa_prompt 中的 context
    | qa_prompt # 问答模板
    | llm
    | StrOutputParser() # 规定输出的格式为 str
)
# 定义带有历史记录的问答链
qa_history_chain = RunnablePassthrough.assign(
    context = (lambda x: x) | retrieve_docs # 将查询结果存为 content
    ).assign(answer=qa_chain) # 将最终结果存为 answer

Test retrieval question and answer chain

# 不带聊天记录
qa_history_chain.invoke({
    "input": "西瓜书是什么？",
    "chat_history": []
})

{'input': 'What is the Watermelon Book? ', 'chat_history': [], 'context': [Document(metadata={'author': '', 'creationDate': "D:20230303170709-00'00'", 'creator': 'LaTeX with hyperref', 'file_path': '../../data_base/knowledge_db/pumkin_book/pumpkin_book.pdf', 'format': 'PDF 1.5', 'keywords': '', 'modDate': '', 'page': 1, 'producer': 'xdvipdfmx (20200315)', 'source': '../../data_base/knowledge_db/pumkin_book/pumpkin_book.pdf', 'subject': '', 'title': '', 'total_pages': 196, 'trapped': ''}, page_content='Foreword\n"Teacher Zhou Zhihua's "Machine Learning" (Xigua Book) is one of the classic introductory textbooks in the field of machine learning. In order to enable as many readers as possible\n to understand machine learning through Xigua Book, Teacher Zhou Therefore, the derivation details of some formulas are not detailed in the book, but this may be "unfriendly" to readers who want to delve into the details of formula derivation\n. This book aims to analyze the more difficult to understand formulas in the Xigua book and add specific derivation details to some formulas. "\nAfter reading this, you may wonder why the quotation marks are added to the previous paragraph. , because this was just our initial reverie, but later we learned that the real reason why Teacher Zhou omitted these derivation details was that he believed that "sophomore students with a solid foundation in science and engineering mathematics should have no difficulty with the derivation details in the Xigua book. The key points are all in the book, and the omitted details should be able to make up for it in their heads or practice." So... This Pumpkin Book can only be regarded as the notes that I, a math bastard, took down during my self-study. I hope it can help everyone become a qualified "sophomore student with a solid foundation in science and engineering mathematics." \nInstructions for use\n• All contents of the Pumpkin Book are expressed with the content of the Watermelon Book as prerequisite knowledge, so the best way to use the Pumpkin Book is to use the Watermelon Book as the main line, and refer to the Pumpkin Book when you encounter formulas that you cannot derive or understand;\n• For beginners who are new to machine learning, it is strongly not recommended to study the formulas in Chapters 1 and 2 of Xigua Book. You can simply go through it and wait until you learn it'), Document(metadata={'author': '', 'creationDate': "D:20230303170709-00'00'", 'creator': 'LaTeX with hyperref', 'file_path': '../../data_base/knowledge_db/pumkin_book/pumpkin_book.pdf', 'format': 'PDF 1.5', 'keywords': '', 'modDate': '', 'page': 161, 'producer': 'xdvipdfmx (20200315)', 'source': '../../data_base/knowledge_db/pumkin_book/pumpkin_book.pdf', 'subject': '', 'title': '', 'total_pages': 196, 'trapped': ''}, page_content='For the analysis of concepts such as "error", "loss" and "risk", please refer to the notes in Chapter 2, Section 2.1 of "Xigua Book"\n→→\nSupporting video tutorial: https://www.bilibili.com/video/BV1Mh411e7VU\n←←'), Document(metadata={'author': '', 'creationDate': "D:20230303170709-00'00'", 'creator': 'LaTeX with hyperref', 'file_path': '../../data_base/knowledge_db/pumkin_book/pumpkin_book.pdf', 'format': 'PDF 1.5', 'keywords': '', 'modDate': '', 'page': 1, 'producer': 'xdvipdfmx (20200315)', 'source': '../../data_base/knowledge_db/pumkin_book/pumpkin_book.pdf', 'subject': '', 'title': '', 'total_pages': 196, 'trapped': ''}, page_content='• For beginners who are new to machine learning, it is strongly not recommended to study the formulas in Chapters 1 and 2 of Xigua Book. Just go through them briefly. You can come back to them when you have learned\n a little bit;\n• We will strive (zhi) for the analysis and derivation of each formula (neng) It is explained from the perspective of the basics of undergraduate mathematics, so super-curricular mathematical knowledge\nWe usually give it in the form of appendices and references. Interested students can continue to study in depth along the information we provide;\n• If there is no formula you want to check in the Pumpkin Book, or you find an error somewhere in the Pumpkin Book, please do not hesitate to go to our GitHub\nIssues (Address: https://github.com/datawhalechina/pumpkin-book/issues）进行反馈，在对应版块\n提交你希望补充的公式编号或者勘误信息，我们通常会在24 We will reply to you within 24 hours, more than 24 hours) If there is no reply within an hour\nYou can contact us via WeChat (WeChat ID: at-Sm1les);\nSupporting video tutorial: https://www.bilibili.com/video/BV1Mh411e7VU\n在线阅读地址：https://datawhalechina.github.io/pumpkin-book（仅供第1 version)')], 'answer': 'Xigua Book refers to the book "Machine Learning" by teacher Zhou Zhihua. It is one of the classic introductory textbooks in the field of machine learning. '}

# 带聊天记录
qa_history_chain.invoke({
    "input": "南瓜书跟它有什么关系？",
    "chat_history": [
        ("human", "西瓜书是什么？"),
        ("ai", "西瓜书是指周志华老师的《机器学习》一书，是机器学习领域的经典入门教材之一。"),
    ]
})

{'input': 'What does the pumpkin book have to do with it? ', 'chat_history': [('human', 'What is the Watermelon Book?'), ('ai', 'Xigua Book refers to the book "Machine Learning" by Teacher Zhou Zhihua, which is one of the classic introductory textbooks in the field of machine learning.')], 'context': [Document(metadata={'author': '', 'creationDate': "D:20230303170709-00'00'", 'creator': 'LaTeX with hyperref', 'file_path': '../../data_base/knowledge_db/pumkin_book/pumpkin_book.pdf', 'format': 'PDF 1.5', 'keywords': '', 'modDate': '', 'page': 1, 'producer': 'xdvipdfmx (20200315)', 'source': '../../data_base/knowledge_db/pumkin_book/pumpkin_book.pdf', 'subject': '', 'title': '', 'total_pages': 196, 'trapped': ''}, page_content='Foreword\n"Teacher Zhou Zhihua's "Machine Learning" (Xigua Book) is one of the classic introductory textbooks in the field of machine learning. In order to enable as many readers as possible\n to understand machine learning through Xigua Book, Teacher Zhou Therefore, the derivation details of some formulas are not detailed in the book, but this may be "unfriendly" to readers who want to delve into the details of formula derivation\n. This book aims to analyze the more difficult to understand formulas in the Xigua book and add specific derivation details to some formulas. "\nAfter reading this, you may wonder why the quotation marks are added to the previous paragraph. , because this was just our initial reverie, but later we learned that the real reason why Teacher Zhou omitted these derivation details was that he believed that "sophomore students with a solid foundation in science and engineering mathematics should have no difficulty with the derivation details in the Xigua book. The key points are all in the book, and the omitted details should be able to make up for it in their heads or practice." So... This Pumpkin Book can only be regarded as the notes that I, a math bastard, took down during my self-study. I hope it can help everyone become a qualified "sophomore student with a solid foundation in science and engineering mathematics." \nInstructions for use\n• All contents of the Pumpkin Book are expressed with the content of the Watermelon Book as prerequisite knowledge, so the best way to use the Pumpkin Book is to use the Watermelon Book as the main line, and refer to the Pumpkin Book when you encounter formulas that you cannot derive or understand;\n• For beginners who are new to machine learning, it is strongly not recommended to study the formulas in Chapters 1 and 2 of Xigua Book. You can simply go through it and wait until you learn it'), Document(metadata={'author': '', 'creationDate': "D:20230303170709-00'00'", 'creator': 'LaTeX with hyperref', 'file_path': '../../data_base/knowledge_db/pumkin_book/pumpkin_book.pdf', 'format': 'PDF 1.5', 'keywords': '', 'modDate': '', 'page': 13, 'producer': 'xdvipdfmx (20200315)', 'source': '../../data_base/knowledge_db/pumkin_book/pumpkin_book.pdf', 'subject': '', 'title': '', 'total_pages': 196, 'trapped': ''}, page_content='→→\nWelcome to purchase the paper version of the pumpkin book "Detailed Explanation of Machine Learning Formulas" on major e-commerce platforms\n←←\nNo. 1 Chapter\nIntroduction\nThis chapter is the beginning of the "Xigua Book". It mainly explains what machine learning is and the related mathematical symbols of machine learning, which will pave the way for the subsequent content. It does not involve complex algorithm theory, so when reading this chapter, you only need to patiently sort out all the concepts and mathematical symbols. In addition, it is recommended to read the West before reading this chapter. The "Main Symbol Table" on the front page of the "Water Melon Book" can answer most of the doubts about mathematical symbols that arise during the reading of "The Water Melon Book". \n This chapter is also the beginning of this book. The author will elaborate on the original intention of writing this book. This book aims to accompany readers to read "Water Melon Book" from the perspective of an "experienced person" and try to help readers eliminate reading problems. As long as the reader has studied "Advanced Mathematics", "Linear Algebra\n" and "Probability Theory and Mathematical Statistics", which are three compulsory mathematics courses in universities, they can understand the explanation and derivation of the formulas in Xigua's book. At the same time, they can also appreciate the "beauty of mathematics" produced by the collision of these three mathematics courses in machine learning. .\n1.1\nIntroduction\nThis section focuses on conceptual understanding. Here is a supplementary explanation of "algorithm" and "model". "Algorithm" refers to the specific method of learning "model" from data, such as linear regression, logarithmic probability regression, decision tree, etc. that will be described in subsequent chapters. '), Document(metadata={'author': '', 'creationDate': "D:20230303170709-00'00'", 'creator': 'LaTeX with hyperref', 'file_path': '../../data_base/knowledge_db/pumkin_book/pumpkin_book.pdf', 'format': 'PDF 1.5', 'keywords': '', 'modDate': '', 'page': 1, 'producer': 'xdvipdfmx (20200315)', 'source': '../../data_base/knowledge_db/pumkin_book/pumpkin_book.pdf', 'subject': '', 'title': '', 'total_pages': 196, 'trapped': ''}, page_content='• For beginners who are new to machine learning, it is strongly not recommended to study the formulas in Chapters 1 and 2 of Xigua Book. Just go through them briefly. You can come back to them when you have learned\n a little bit;\n• We will strive (zhi) for the analysis and derivation of each formula (neng) It is explained from the perspective of the basics of undergraduate mathematics, so super-curricular mathematical knowledge\nWe usually give it in the form of appendices and references. Interested students can continue to study in depth along the information we provide;\n• If there is no formula you want to check in the Pumpkin Book, or you find an error somewhere in the Pumpkin Book, please do not hesitate to go to our GitHub\nIssues (Address: https://github.com/datawhalechina/pumpkin-book/issues）进行反馈，在对应版块\n提交你希望补充的公式编号或者勘误信息，我们通常会在24 We will reply to you within 24 hours, more than 24 hours) If there is no reply within an hour\nYou can contact us via WeChat (WeChat ID: at-Sm1les);\nSupporting video tutorial: https://www.bilibili.com/video/BV1Mh411e7VU\n在线阅读地址：https://datawhalechina.github.io/pumpkin-book（仅供第1 version)')], 'answer': 'The Pumpkin Book is a book that provides detailed analysis and derivation of formulas that are difficult to understand in the Watermelon Book. It uses the content of Xigua Book as pre-knowledge to help readers better understand and learn the content of Xigua Book. '}

It can be seen that LLM accurately determines what "it" is, which means that we have successfully conveyed historical information to it. In addition, the recalled content also has answers to the questions, proving that our information compression strategy also works. This ability to correlate previous and previous questions and compress and retrieve information can greatly enhance the continuity and intelligence of the question and answer system.

4.3 Deploy Knowledge Base Assistant

Now that we have a basic understanding of knowledge bases and LLM, it’s time to blend them neatly and create a visually rich interface. Such an interface is not only easier to operate, but also easier to share with others.

Streamlit is a fast and convenient way to demonstrate machine learning models directly in Python through a friendly web interface. In this course, we'll learn how to use it to build user interfaces for generative AI applications. After building a machine learning model, if you want to build a demo to show others, maybe to get feedback and drive improvements to the system, or just because you think the system is cool and want to demonstrate it: Streamlit allows you to quickly achieve this through a Python interface program without writing any front-end, web or JavaScript code.

4.3.1 Introduction to Streamlit

Streamlitis an open source Python library for quickly creating data applications. It is designed to allow data scientists to easily transform data analysis and machine learning models into interactive web applications without requiring in-depth knowledge of web development. The difference from regular web frameworks, such as Flask/django, is that it does not require you to write any client code (HTML/CSS/JS). You only need to write ordinary Python modules to create a beautiful and highly interactive interface in a short time, thereby quickly generating data analysis or machine learning results. On the other hand, unlike those tools that can only be generated by dragging and dropping, you still have complete control over the code.

Streamlit provides a simple yet powerful set of basic modules for building data applications:

st.write(): This is one of the most basic modules used to render text, images, tables, etc. in the application.
st.title(), st.header(), st.subheader(): These modules are used to add titles, subtitles, and grouped titles to organize the layout of the application.
st.text(), st.markdown(): used to add text content, supporting Markdown syntax.
st.image(): used to add images to the application.
st.dataframe(): used to render Pandas data frame.
st.table(): used to render simple data tables.
st.pyplot(), st.altair_chart(), st.plotly_chart(): used to render charts drawn by Matplotlib, Altair or Plotly.
st.selectbox(), st.multiselect(), st.slider(), st.text_input(): used to add interactive widgets that allow users to select, enter, or slide in the application.
st.button(), st.checkbox(), st.radio(): used to add buttons, checkboxes and radio buttons to trigger specific actions.

These basic modules make it easy to build interactive data applications with Streamlit, and can be combined and customized as needed. For more information, see 官方文档

4.3.2 Building the application

First, create a new Python file and save it streamlit_app.py in the root of your working directory

Import the necessary Python libraries.

import streamlit as st
from langchain_openai import ChatOpenAI
import os
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnableBranch, RunnablePassthrough
import sys
sys.path.append("notebook/C3 搭建知识库") # 将父目录放入系统路径中
from zhipuai_embedding import ZhipuAIEmbeddings
from langchain_community.vectorstores import Chroma

Definitionget_retrieverfunction that returns a retriever

def get_retriever():
    # 定义 Embeddings
    embedding = ZhipuAIEmbeddings()
    # 向量数据库持久化路径
    persist_directory = 'data_base/vector_db/chroma'
    # 加载数据库
    vectordb = Chroma(
        persist_directory=persist_directory,
        embedding_function=embedding
    )
    return vectordb.as_retriever()

Definitioncombine_docsFunction that processes the text returned by the retriever

def combine_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs["context"])

Definitionget_qa_history_chainfunction, which can return a search question and answer chain

def get_qa_history_chain():
    retriever = get_retriever()
    llm = ChatOpenAI(model_name="gpt-4o", temperature=0)
    condense_question_system_template = (
        "请根据聊天记录总结用户最近的问题，"
        "如果没有多余的聊天记录则返回用户的问题。"
    )
    condense_question_prompt = ChatPromptTemplate([
            ("system", condense_question_system_template),
            ("placeholder", "{chat_history}"),
            ("human", "{input}"),
        ])

    retrieve_docs = RunnableBranch(
        (lambda x: not x.get("chat_history", False), (lambda x: x["input"]) | retriever, ),
        condense_question_prompt | llm | StrOutputParser() | retriever,
    )

    system_prompt = (
        "你是一个问答任务的助手。 "
        "请使用检索到的上下文片段回答这个问题。 "
        "如果你不知道答案就说不知道。 "
        "请使用简洁的话语回答用户。"
        "\n\n"
        "{context}"
    )
    qa_prompt = ChatPromptTemplate.from_messages(
        [
            ("system", system_prompt),
            ("placeholder", "{chat_history}"),
            ("human", "{input}"),
        ]
    )
    qa_chain = (
        RunnablePassthrough().assign(context=combine_docs)
        | qa_prompt
        | llm
        | StrOutputParser()
    )

    qa_history_chain = RunnablePassthrough().assign(
        context = retrieve_docs, 
        ).assign(answer=qa_chain)
    return qa_history_chain

Definitiongen_responseFunction, which accepts the retrieval question and answer chain, user input and chat history, and returns the chain output in a streaming format

def gen_response(chain, input, chat_history):
    response = chain.stream({
        "input": input,
        "chat_history": chat_history
    })
    for res in response:
        if "answer" in res.keys():
            yield res["answer"]

Define the main function, which formulates the display effect and logic

def main():
    st.markdown('### 🦜🔗 动手学大模型应用开发')
    # st.session_state可以存储用户与应用交互期间的状态与数据
    # 存储对话历史
    if "messages" not in st.session_state:
        st.session_state.messages = []
    # 存储检索问答链
    if "qa_history_chain" not in st.session_state:
        st.session_state.qa_history_chain = get_qa_history_chain()
    # 建立容器 高度为500 px
    messages = st.container(height=550)
    # 显示整个对话历史
    for message in st.session_state.messages: # 遍历对话历史
            with messages.chat_message(message[0]): # messages指在容器下显示，chat_message显示用户及ai头像
                st.write(message[1]) # 打印内容
    if prompt := st.chat_input("Say something"):
        # 将用户输入添加到对话历史中
        st.session_state.messages.append(("human", prompt))
        # 显示当前用户输入
        with messages.chat_message("human"):
            st.write(prompt)
        # 生成回复
        answer = gen_response(
            chain=st.session_state.qa_history_chain,
            input=prompt,
            chat_history=st.session_state.messages
        )
        # 流式输出
        with messages.chat_message("ai"):
            output = st.write_stream(answer)
        # 将输出存入st.session_state.messages
        st.session_state.messages.append(("ai", output))

4.3.3 Deploy the application

Run locally:streamlit run "notebook/C4 构建 RAG 应用/streamlit_app.py" Remote Deployment: To deploy your application to Streamlit Cloud, follow these steps:

Create a GitHub repository for the application. Your repository should contain two files:

your-repository/
├── streamlit_app.py
└── requirements.txt
Go to Streamlit Community Cloud, click in the workspaceNew appbutton and specify the repository, branch, and master file path. Alternatively, you can customize your application's URL by selecting a custom subdomain
ClickDeploy!button

Your application will now be deployed to the Streamlit Community Cloud and accessible from anywhere in the world! 🌎

Our project deployment is basically completed at this point. It has been simplified for the convenience of demonstration. There are still many places that can be further optimized. We look forward to learners making various magic changes!

Optimization direction:

Added the function of uploading local documents and establishing vector database in the interface
Added buttons for multiple LLM and embedding method selections
Add button to modify parameters
More......

#Chapter 4 Building RAG Applications

#4.1 Connect LLM to LangChain

#4.1.1 Call ChatGPT based on LangChain

#4.1.2 Use LangChain to call Baidu Wenxinyiyan

#4.1.3 iFlytek Spark

#4.1.4 Use LangChain to call GLM

#4.2 Build a search question and answer chain

#4.2.1 Load vector database

#Principle 1: Write clear and specific instructions

#4.2.2 Create retrieval chain

#4.2.3 Create LLM

#4.2.4 Constructing a search question and answer chain

#4.2.5 Add chat records to the search chain

#4.2.6 Retrieval chain with information compression

#4.3 Deploy Knowledge Base Assistant

#4.3.1 Introduction to Streamlit

#4.3.2 Building the application

#4.3.3 Deploy the application