Appendix 1 Custom LLM based on LangChain

LangChain provides an efficient development framework for developing custom applications based on LLM, allowing developers to quickly activate the powerful capabilities of LLM and build LLM applications. LangChain also supports a variety of large models and has built-in calling interfaces for large models such as OpenAI and LLAMA. However, LangChain does not have all large models built-in. It provides strong scalability by allowing users to customize LLM types.

In this section, we take Zhipu as an example to describe how to customize LLM based on LangChain so that the applications we build based on LangChain can support domestic platforms.

This part involves relatively more technical details of LangChain and large model calls. If you have the energy, you can learn to deploy it. If you don’t have the energy, you can directly use the subsequent code to support the calls.

To implement a custom LLM, you need to define a custom class that inherits from LangChain's LLM base class, and then define two functions: ① _generate method, receives a series of messages and other parameters, and returns output; ② _stream method, receives a series of messages and other parameters, and returns the results in a streaming format.

First we import the required third-party libraries:

from typing import Any, Dict, Iterator, List, Optional
from zhipuai import ZhipuAI
from langchain_core.callbacks import (
    CallbackManagerForLLMRun,
)
from langchain_core.language_models import BaseChatModel
from langchain_core.messages import (
    AIMessage,
    AIMessageChunk,
    BaseMessage,
    SystemMessage,
    ChatMessage,
    HumanMessage
)
from langchain_core.messages.ai import UsageMetadata
from langchain_core.outputs import ChatGeneration, ChatGenerationChunk, ChatResult
import time

Since the message type of LangChain isHumanMessage、AIMessageformat is different from the dictionary format received by general models, so we need to first define a function that converts LangChain's format into a dictionary.

def _convert_message_to_dict(message: BaseMessage) -> dict:
    """把LangChain的消息格式转为智谱支持的格式
    Args:
        message: The LangChain message.
    Returns:
        The dictionary.
    """
    message_dict: Dict[str, Any] = {"content": message.content}
    if (name := message.name or message.additional_kwargs.get("name")) is not None:
        message_dict["name"] = name

    # populate role and additional message data
    if isinstance(message, ChatMessage):
        message_dict["role"] = message.role
    elif isinstance(message, HumanMessage):
        message_dict["role"] = "user"
    elif isinstance(message, AIMessage):
        message_dict["role"] = "assistant"
    elif isinstance(message, SystemMessage):
        message_dict["role"] = "system"
    else:
        raise TypeError(f"Got unknown type {message}")
    return message_dict

Next we define a custom LLM class that inherits from the LLM class:

# 继承自 LangChain 的 BaseChatModel 类
class ZhipuaiLLM(BaseChatModel):
    """自定义Zhipuai聊天模型。
    """
    model_name: str = None
    temperature: Optional[float] = None
    max_tokens: Optional[int] = None
    timeout: Optional[int] = None
    stop: Optional[List[str]] = None
    max_retries: int = 3
    api_key: str | None = None

The above initialization covers the parameters we usually use, and more parameters can be added according to actual needs and the API of Zhipu.

Next we implement the _generate method:

    def _generate(
        self,
        messages: List[BaseMessage],
        stop: Optional[List[str]] = None,
        run_manager: Optional[CallbackManagerForLLMRun] = None,
        **kwargs: Any,
    ) -> ChatResult:
        """通过调用智谱API从而响应输入。

        Args:
            messages: 由messages列表组成的prompt
            stop: 在模型生成的回答中有该字符串列表中的元素则停止响应
            run_manager: 一个为LLM提供回调的运行管理器
        """
        # 列表推导式 将 messages 的元素逐个转为智谱的格式
        messages = [_convert_message_to_dict(message) for message in messages]
        # 定义推理的开始时间
        start_time = time.time()
        # 调用 ZhipuAI 对处理消息
        response = ZhipuAI(api_key=self.api_key).chat.completions.create(
            model=self.model_name,
            temperature=self.temperature,
            max_tokens=self.max_tokens,
            timeout=self.timeout,
            stop=stop,
            messages=messages
        )
        # 计算运行时间 由现在时间 time.time() 减去 开始时间start_time得到
        time_in_seconds = time.time() - start_time
        # 将返回的消息封装并返回
        message = AIMessage(
            content=response.choices[0].message.content, # 响应的结果
            additional_kwargs={}, # 额外信息
            response_metadata={
                "time_in_seconds": round(time_in_seconds, 3), # 响应源数据 这里是运行时间 也可以添加其他信息
            },
            # 本次推理消耗的token
            usage_metadata={
                "input_tokens": response.usage.prompt_tokens, # 输入token
                "output_tokens": response.usage.completion_tokens, # 输出token
                "total_tokens": response.usage.total_tokens, # 全部token
            },
        )
        generation = ChatGeneration(message=message)
        return ChatResult(generations=[generation])

Next we implement another core method _stream. The previously commented code will no longer be commented this time:

    def _stream(
        self,
        messages: List[BaseMessage],
        stop: Optional[List[str]] = None,
        run_manager: Optional[CallbackManagerForLLMRun] = None,
        **kwargs: Any,
    ) -> Iterator[ChatGenerationChunk]:
        """通过调用智谱API返回流式输出。

        Args:
            messages: 由messages列表组成的prompt
            stop: 在模型生成的回答中有该字符串列表中的元素则停止响应
            run_manager: 一个为LLM提供回调的运行管理器
        """
        messages = [_convert_message_to_dict(message) for message in messages]
        response = ZhipuAI().chat.completions.create(
            model=self.model_name,
            stream=True, # 将stream 设置为 True 返回的是迭代器，可以通过for循环取值
            temperature=self.temperature,
            max_tokens=self.max_tokens,
            timeout=self.timeout,
            stop=stop,
            messages=messages
        )
        start_time = time.time()
        # 使用for循环存取结果
        for res in response:
            if res.usage: # 如果 res.usage 存在则存储token使用情况
                usage_metadata = UsageMetadata(
                    {
                        "input_tokens": res.usage.prompt_tokens,
                        "output_tokens": res.usage.completion_tokens,
                        "total_tokens": res.usage.total_tokens,
                    }
                )
            # 封装每次返回的chunk
            chunk = ChatGenerationChunk(
                message=AIMessageChunk(content=res.choices[0].delta.content)
            )

            if run_manager:
                # This is optional in newer versions of LangChain
                # The on_llm_new_token will be called automatically
                run_manager.on_llm_new_token(res.choices[0].delta.content, chunk=chunk)
            # 使用yield返回 结果是一个生成器 同样可以使用for循环调用
            yield chunk
        time_in_sec = time.time() - start_time
        # Let's add some other information (e.g., response metadata)
        # 最终返回运行时间
        chunk = ChatGenerationChunk(
            message=AIMessageChunk(content="", response_metadata={"time_in_sec": round(time_in_sec, 3)}, usage_metadata=usage_metadata)
        )
        if run_manager:
            # This is optional in newer versions of LangChain
            # The on_llm_new_token will be called automatically
            run_manager.on_llm_new_token("", chunk=chunk)
        yield chunk

Then we also need to define the description method of the model:

    @property
    def _llm_type(self) -> str:
        """获取此聊天模型使用的语言模型类型。"""
        return self.model_name

    @property
    def _identifying_params(self) -> Dict[str, Any]:
        """返回一个标识参数的字典。

        该信息由LangChain回调系统使用，用于跟踪目的，使监视llm成为可能。
        """
        return {
            "model_name": self.model_name,
        }

Through the above steps, we can define the calling method of wisdom spectrum based on LangChain. We encapsulate this code in the zhipu_llm.py file.

#Appendix 1 Custom LLM based on LangChain

Appendix 1 Custom LLM based on LangChain