Chapter 2 Develop applications using LLM API
2.1 Basic concepts
2.1.1 Prompt
Prompt was originally a task-specific input template designed by NLP (natural language processing) researchers for downstream tasks, similar to a task (such as classification, clustering, etc.) that corresponds to a prompt. After ChatGPT was launched and gained a large number of applications, Prompt began to be promoted as all inputs to large models. That is, every time we access the large model, the input is a Prompt, and the result returned to us by the large model is called Completion.
For example, in the following example, the question we asked ChatGPT "What does Prompt in NLP mean?" is our question, which is actually our Prompt this time; and the return result of ChatGPT is this Completion. That is, for the ChatGPT model, the Completion corresponding to the Prompt is as shown in the figure below.

In the future, we will use Prompt to replace the input to LLM, and use Completion to replace the output of LLM. At the same time, we will combine specific cases to introduce how to design Prompt to give full play to the capabilities of LLM.
2.1.2 Temperature
LLM generation is random, and the final result is generated by selecting prediction results with different prediction probabilities at the top level of the model. We can generally control the randomness and creativity of LLM generated results by controlling the temperature parameter.
Temperature generally takes a value between 0 and 1. When the value is lower and closer to 0, the randomness of the prediction will be lower, producing more conservative and predictable text, and less likely to generate unexpected or unusual words. When values are higher, closer to 1, predictions are more random, all words are more likely to be selected, more creative and diverse text is produced, and unusual or unexpected words are more likely to be generated.
For example, when we use ChatGPT to ask it to think of a master's thesis topic for us about cross-language models, when we set temperature to 0, the model responds as:
Title: Research on machine translation performance optimization based on cross-language models
Abstract: With the development of globalization, the demand for cross-language communication is growing day by day. As an automated translation technology, machine translation facilitates cross-language communication. However, due to the differences between languages, the performance of machine translation systems varies between different language pairs. This paper aims to explore how to optimize the performance of machine translation and improve translation quality and accuracy by studying cross-language models.
Keywords: cross-language model, machine translation, performance optimization, translation quality, accuracy
-
Introduction 1.1 Research background and significance 1.2 Research status at home and abroad 1.3 Research objectives and content
-
Principles and methods of cross-language models 2.1 Traditional machine translation methods 2.2 Basic principles of cross-language models 2.3 Common methods and techniques for cross-language models
-
Application of cross-language models in machine translation 3.1 Advantages and challenges of cross-language models in machine translation 3.2 Comparison of the performance of cross-language models on different language pairs 3.3 Analysis of application cases of cross-language models in machine translation in specific fields
-
Performance optimization methods for cross-language models 4.1 Data preprocessing and feature selection 4.2 Model parameter tuning and training strategy 4.3 Combination application of cross-language models and other technologies
-
Experimental design and result analysis 5.1 Experimental design and data set introduction 5.2 Analysis and comparison of experimental results 5.3 Feasibility and reliability verification of experimental results
-
Conclusion and outlook 6.1 Summary of research work 6.2 Existing problems and shortcomings 6.3 Further research directions and prospects
References
Note: The above is only an example of topic selection. The specific topic selection should be adjusted and modified according to personal interests and research background.
When we set temperature to 0.95, the model replies:
The following are several master's thesis topic suggestions on cross-language models:
-
Comparison and optimization of cross-language text classification models - By comparing the performance of different cross-language models, analyzing their adaptability and accuracy between different languages, and trying to optimize existing models to improve the effect of cross-language text classification.
-
Machine translation performance analysis based on cross-language models - By comparing the performance of different cross-language models in machine translation tasks, we study the shortcomings and advantages between different languages, and explore how to improve the performance of cross-language machine translation in a targeted manner.
-
Construction and application of cross-language sentiment analysis models - use cross-language models to build sentiment analysis models, compare the accuracy and consistency of the models in different languages, analyze the differences in emotional expressions in different languages, and explore the application of sentiment analysis in multi-language scenarios.
-
Cross-language model training under multi-task learning - explore the training methods and effects of cross-language models under multi-task learning, improve the generalization ability and performance of cross-language models by training multiple tasks at the same time, and verify the impact of multi-task learning on cross-language models.
-
Construction and evaluation of cross-language question answering system - Construct a cross-language question answering system, realize automatic parsing of questions and generation of answers through cross-language models, and evaluate the effectiveness and usability of the system in multi-language environments.
-
Application of pre-trained cross-language models to speech recognition - apply pre-trained cross-language models to speech recognition tasks, study its impact on speech recognition accuracy and multi-lingual adaptability, and compare the processing capabilities of different models for multi-lingual speech data.
Choose a topic that suits your interests and research direction, and conduct in-depth discussions and confirmation with your instructor.
For different problems and application scenarios, we may need to set different temperatures. For example, in the personal knowledge base assistant project built in this tutorial, we generally set temperature to 0 to ensure the assistant's stable use of knowledge base content and avoid erroneous content and model illusions; in scenarios such as product intelligent customer service and scientific research paper writing, we also need stability more than creativity; but in scenarios such as personalized AI and creative marketing copywriting generation, we need creativity more, and are therefore more inclined to set temperature to a higher value.
2.1.3 System Prompt
System Prompt is an emerging concept that has gradually been widely used as the ChatGPT API is opened. In fact, it is not reflected in the training of the large model itself, but is a strategy set by the large model server to improve user experience.
Specifically, when using the ChatGPT API, you can set two types of prompts: one is System Prompt, which will permanently affect the model's response throughout the session and is of higher importance than ordinary prompts; the other is User Prompt, which is more biased towards the prompts we usually mention, that is, inputs that require the model to respond.
We generally set up System Prompt to make some initial settings for the model. For example, we can set the personality we want it to have in System Prompt, such as a personal knowledge base assistant, etc. There is generally only one System Prompt in a session. After setting the model's personality or initial settings through System Prompt, we can give the instructions that the model needs to follow through User Prompt. For example, when we need a humorous personal knowledge base assistant and ask this assistant what I have to do today, we can construct the following prompt:
Through the above Prompt structure, we can let the model answer the user's questions in a humorous style.
2.2 Using LLM API
This chapter mainly introduces the API application guidelines for four major language models (ChatGPT, Wenxinyiyan, iFlytek Spark, and GLM) and the native API calling method of the Python version. Readers can choose an API that they can apply for according to the actual situation to read and learn.
- ChatGPT: Recommended for readers who can surf the Internet scientifically;
- A word from Wenxin: There is currently no activity to give away tokens to new users, and it is recommended for users who already have Wenxin tokens and paying users;
- iFlytek Spark: New users will be given tokens, and free users are recommended;
- GLM: New users will be given tokens as a gift, and free users are recommended to use it.
If you need to use LLM in LangChain, you can refer to the calling method in LLM 接入 LangChain.
2.2.1 Using ChatGPT
ChatGPT, released in November 2022, is a representative product of the Large Language Model (LLM) that is currently popular in the industry. At the end of 2022, it was ChatGPT’s amazing performance that triggered the LLM craze. To date, GPT-4 released by OpenAI still represents the upper limit of LLM performance, and ChatGPT is still the LLM product with the largest number of users, the most popular use, and the most development potential. In fact, in the eyes of outsiders, ChatGPT is the name of LLM.
In addition to releasing free web-side products, OpenAI also provides a variety of ChatGPT APIs, allowing developers to call ChatGPT through Python or Request requests and embed the powerful capabilities of LLM into their own services. The main models available include ChatGPT-3.5 and GPT-4, and each model also has multiple context versions. For example, ChatGPT-3.5 has the original 4K context length model and the 16K context length model gpt-turbo-16k-0613.
API Application Guidelines
The OpenAI API calling service is paid, and each developer needs to first obtain and configure the OpenAI API key before they can access ChatGPT in the application they build. In this section we will briefly describe how to obtain and configure the OpenAI API key.
Before obtaining the OpenAI API key, we need to register an account at OpenAI 官网. It is assumed here that we already have an OpenAI account and log in at OpenAI 官网. After logging in, as shown below:
we chooseAPI, then click in the left sidebarAPI keys, as shown in the figure below:
ClickCreate new secret keybutton to create an OpenAI API key. We will copy the created OpenAI API key in this form.OPENAI_API_KEY="sk-..."save to.envfile and place.envThe file is saved in the project root directory.
The following is read.envFile code:
Call OpenAI API
To call ChatGPT, you need to use ChatCompletion API. This API provides calls to the ChatGPT series models, including ChatGPT-3.5, GPT-4, etc.
The ChatCompletion API calling method is as follows:
Calling this API will return a ChatCompletion object, which includes attributes such as answer text, creation time, and id. What we generally need is the answer text, which is the content information in the answer object.
Here we introduce in detail several parameters commonly used when calling API:
· model, that is, the called model, the general values include "gpt-3.5-turbo" (ChatGPT-3.5), "gpt-3.5-turbo-16k-0613" (ChatGPT-3.5 16K version), "gpt-4" (ChatGPT-4), "gpt-4o" (ChatGPT-4o). Note that the cost of different models is different.
· messages, our prompt. The messages of ChatCompletion need to be passed in a list, which includes prompts of multiple different roles. The roles we can choose generally include system: the system prompt mentioned above; user: the prompt entered by the user; assistant: assistant, usually the model's historical reply, as a reference content provided to the model.
· temperature, temperature. That is the Temperature coefficient mentioned above.
· max_tokens, the maximum number of tokens, that is, the maximum number of tokens output by the model. OpenAI calculates the number of tokens by combining the total number of Prompt and Completion tokens. It is required that the total number of tokens cannot exceed the model upper limit (for example, the default model token upper limit is 4096). Therefore, if the input prompt is long, you need to set a larger max_token value, otherwise an error exceeding the limit length will be reported.
OpenAI provides sufficient customization space, allowing us to improve the model answer effect by customizing the prompt. The following is a simple function that encapsulates the OpenAI interface, allowing us to directly pass in the prompt and obtain the output of the model:
'Hello! Is there anything I can do to help you? '
In the above function, we encapsulate the details of messages and only use user prompt to implement the call. In simple scenarios, this function is sufficient for usage.
2.2.2 Use Wen Xin Yi Yan
文心一言, a Chinese large model launched by Baidu on March 27, 2023, is currently a representative product of domestic large language models. Limited by differences in the quality of Chinese corpus and bottlenecks in domestic computing resources and computing technology, Wenxinyiyan still has a certain gap from ChatGPT in overall performance, but it has demonstrated superior performance in the Chinese context. The implementation scenarios that Wen Xinyiyan is considering include multi-modal generation, literary creation and other business scenarios. The goal is to catch up with ChatGPT in the Chinese context. Of course, Baidu still has a long way to go to truly defeat ChatGPT; but in China, where generative AI supervision is relatively strict, as the first batch of generative AI applications allowed to be open to the public, Wen Xinyiyan still has certain commercial advantages over ChatGPT, which cannot be used publicly.
Baidu also provides Wen Xinyiyan’s API interface. While launching large models, it also launched文心千帆The enterprise-level large language model service platform includes Baidu's entire large language model development work chain. For small and medium-sized enterprises or traditional enterprises that do not have the ability to actually implement large models, considering Wenxin Qianfan is a feasible option. Of course, this tutorial only covers calling the Wenxin Yiyan API through the Wenxin Qianfan platform, and will not discuss other enterprise-level services.
GET KEY
Baidu Smart Cloud Qianfan large model platform provides 千帆 SDK in multiple languages. Developers can use the SDK to quickly develop functions and improve development efficiency.
Before using Qianfan SDK, you need to obtain the Wenxin Yiyan calling key. You need to configure your own key in the code to call the model. Let's take Python SDK as an example to introduce the process of calling Wenxin model through Qianfan SDK.
First, you need to have a Baidu account that has been authenticated by real-name. Each account can create several applications, and each application will correspond to an API_Key and Secret_Key.

Enter 文心千帆服务平台 and click the above应用接入button to create an application that calls the Wenxin large model.

Then click去创建button to enter the application creation interface:

Simply enter basic information, select the default configuration, and create an application.

After the creation is completed, we can see the created application in the consoleAPI Key、Secret Key。
**It should be noted that Qianfan currently only has three services, Prompt模板, Yi-34B-Chat and Fuyu-8B公有云在线调用体验服务, which are free to call. If you want to experience other model services, you need to activate the paid service of the corresponding model at 计费管理 to experience it. **
What we get hereAPI Key、Secret KeyFill in to.envDocumentaryQIANFAN_AKandQIANFAN_SKparameter. If you are using security authentication parameter verification, you need to check it on the 百度智能云控制台-用户账户-安全认证 page.Access Key、Secret Key, and fill in the obtained parameters accordingly.envDocumentaryQIANFAN_ACCESS_KEY、QIANFAN_SECRET_KEY。

Then execute the following code to load the key into the environment variable.
Calling Wenxin Qianfan API
Baidu Wenxin also supports configuring the prompts of the user and assistant member roles in the messages field of the incoming parameter. However, unlike OpenAI's prompt format, the model personality is passed in through another parameter, the system field, rather than in the messages field.
Next we use SDK to encapsulate aget_completionfunction for subsequent use.
Remind readers again: If there is no free or purchased credit in the account, execute the following code to call WenxinERNIE-Bot, the following error will be reported:error code: 17, err msg: Open api daily request limit reached。
Click 模型服务 to view a list of all models supported by Qianfan.
If you are a free user, when using the above function, you can specify a free model in the input parameters (for exampleYi-34B-Chat) and then run:
'Hello! My name is Yi, and I am an intelligent assistant developed by Zero One Wan. I was trained by Zero One Wan's research team through a large amount of text data, and learned various patterns and associations of language, so that I can generate text, answer questions, and conduct conversations. My goal is to help users obtain information, answer questions, and provide various text-related help. I am an artificial intelligence with no feelings or consciousness, but I can imitate human communication and provide useful information based on what I learned during training. If you have any questions or need help, please feel free to let me know! '
If you have Wenxin series modelsERNIE-Botusage quota, you can directly run the following function:
'Hi! I'm Little Whale, your personal assistant. I'm here to help you with questions, provide information and advice to make your life easier and more enjoyable! '
Baidu Qianfan provides a variety of model interfaces for calling. Among them, the one we used aboveERNIE-BotThe model's dialogue chat interface is often referred to as the Baidu Wenxin large model. Here is a brief introduction to the common parameters of the Wenxin large model interface:
· messages, which is the calling prompt. Wenxin's messages configuration is somewhat different from ChatGPT. It does not support the max_token parameter. The maximum number of tokens is controlled by the model. The total length of the content, functions, and system fields in messages cannot exceed 20480 characters, and cannot exceed 5120 tokens. Otherwise, the model will forget the previous messages one by one. Wenxin's messages have the following requirements: ① One member is a single-round conversation, and multiple members are multi-round conversations; ② The last message is the current conversation, and the previous messages are historical conversations; ③ The number of members must be an odd number, and the roles in the message must be user and assistant in order. Note: What is introduced here is the character number and tokens limit of the ERNIE-Bot model. The parameter limits vary from model to model. Please check the parameter description of the corresponding model on the Wenxin Qianfan official website.
· stream, whether to use streaming.
· Temperature, temperature coefficient, defaults to 0.8. Wenxin’s temperature parameter requires a range of (0, 1.0] and cannot be set to 0.
2.2.3 Using iFlytek Spark
iFlytek Spark Cognitive Large Model, a Chinese large model launched by iFlytek in May 2023, is also one of the representative products of domestic large models. Similarly, limited by the Chinese context and computing resources, there are still differences between Spark and ChatGPT in terms of user experience. However, as a large domestic Chinese model that is on par with Wenxin, it is still worth looking forward to and trying. Compared with Baidu, which has significant resources and technological advantages, if iFlytek wants to break out of the tight encirclement and become the leader in domestic large models, it needs to make full use of its relative advantages. At least for now, Spark has not fallen behind.
API Application Guidelines
The iFlytek Spark platform provides free quota for Spark3.5 Max, Spark4.0 Ultra and other models. We can receive free tokens quota on the platform, click免费领取:


After receiving the free trial package, click to enter the console and create the application. After the creation is completed, you can see the results we obtained.APPID、APISecretandAPIKeyGot:

Call via SDK
First execute the following code to load the key into the environment variable.
Then we use the SDK to encapsulate aget_completionfunction for subsequent use.
'Hello! Nice to meet you here. If you have any questions or need help, you can ask me at any time and I will try my best to answer you. '
2.2.4 Using GLM
Zhipu AI is a company transformed from the technical achievements of the Computer Science Department of Tsinghua University. It is committed to creating a new generation of cognitive intelligence general models. The company jointly developed the bilingual 100-billion-level ultra-large-scale pre-training model GLM-130B, and built a high-precision universal knowledge graph to form a cognitive engine driven by two wheels of data and knowledge. ChatGLM (chatglm.cn) was built based on this model.
ChatGLM series models, including ChatGLM-130B, ChatGLM-6B and ChatGLM2-6B (upgraded version of ChatGLM-6B) models, support relatively complex natural language instructions and can solve difficult reasoning problems. Among them, the ChatGLM-6B model has been downloaded from Huggingface for more than 3 million times (statistics as of June 24, 2023). The model has ranked first in the Hugging Face (HF) global large model download list for 12 consecutive days, and has had a great impact on the open source community at home and abroad.
API Application Guidelines
First enter 智谱AI开放平台, enter your mobile phone number and verification code to register:

Newly registered users can receive a free experience package of 20 million tokens. After personal real-name authentication, more tokens will be given away. Zhipu AI provides 体验入口 of two different models, GLM-4-Plus and GLM-4-Flash, which can be clicked立即体验Button experience directly.

If you need to use an API key to build an application, you need to click the key-shaped button in the upper right corner of the console to enter our personal API management list. In this interface, you can see the application name and application name corresponding to the API we obtained.API key.

we can click添加新的 API keyAnd enter the corresponding name to generate a new API key.
Calling GLM API
Zhipu AI provides SDK and native HTTP to implement model API calls. It is recommended to use SDK to get a better programming experience.
First we need to configure the key information and add the previously obtainedAPI keyset to.envin the fileZHIPUAI_API_KEYparameters, and then run the following code to load the configuration information.
The calling parameters of Zhipu are similar to others. You also need to pass in a messages list, which includes role and prompt. We package it as followsget_completionfunction for subsequent use.
'Hello 👋! I am the artificial intelligence assistant Zhipu Qingyan (ChatGLM). Nice to meet you. You are welcome to ask me any questions. '
Here is a brief introduction to the parameters passed into zhipuai:
-
messages (list), when calling the dialogue model, input the current dialogue information list as a prompt to the model; pass parameters in the form of key-value pairs {"role": "user", "content": "Hello"}; when the total length exceeds the maximum input limit of the model, it will be automatically truncated and needs to be sorted from oldest to newest. -
temperature (float), sampling temperature, controls the randomness of the output, must be a positive number, the value range is: (0.0, 1.0), cannot be equal to 0, the default value is 0.95. A larger value will make the output more random and creative; a smaller value will make the output more stable or certain. -
top_p (float), another method of temperature sampling is called nuclear sampling. The value range is: (0.0, 1.0) open interval, cannot be equal to 0 or 1, the default value is 0.7. The model considers outcomes with top_p probability mass tokens. For example: 0.1 means that the model decoder only considers tokens from the candidate set with a probability of 10% -
request_id (string), the parameter is passed by the user and must ensure uniqueness; it is used to distinguish the unique identifier of each request. If the user does not pass it, the platform will generate it by default. -
It is recommended that you adjust the top_p or temperature parameters according to the application scenario, but do not adjust both parameters at the same time
2.3 Prompt Engineering
In the LLM era, the word prompt is familiar to every user and developer. So what exactly is prompt? To put it simply, prompt is the name for the user's input when interacting with the large model. That is, the input we give to the large model is called Prompt, and the output returned by the large model is generally called Completion.

For a large language model (LLM) that has strong natural language understanding and generation capabilities and can handle diverse tasks, a good Prompt design greatly determines the upper and lower limits of its capabilities. How to use Prompt to give full play to the performance of LLM? First of all, we need to know the principles of designing Prompt. They are the basic concepts that every developer must know when designing Prompt. This section discusses two key principles for designing efficient prompts: write clear, specific instructions and give the model enough time to think. Mastering these two points is particularly important for creating reliable language model interactions.
Prompt needs to clearly express the requirements and provide sufficient context so that the language model can accurately understand our intentions. This does not mean that prompts must be very short and concise. Too brief prompts often make it difficult for the model to grasp the specific tasks to be completed, while longer and more complex prompts can provide richer context and details, allowing the model to more accurately grasp the required operations and response methods, and give more expected responses.
So, remember to express your prompt in clear, detailed language, “Adding more context helps the model understand you better.”。
Starting from this principle, we provide several tips for designing prompts.
2.3.1 Use delimiters to clearly represent different parts of the input
When writing prompts, we can use various punctuation marks as "separators" to distinguish different parts of text. Separators are like walls in Prompt, separating different instructions, contexts, and inputs to avoid accidental confusion. You can choose to use```,""",< >,
在以下的例子中,我们给出一段话并要求 LLM 进行总结,在该示例中我们使用 ```as separator:
- First, let us call OpenAI’s API, encapsulate a conversation function, and use the gpt-3.5-turbo model.
Note: If you are using other model APIs, please refer to [Section 2] to modify the followingget_completionfunction.
- Use delimiters
3. 不使用分隔符
⚠️使用分隔符尤其需要注意的是要防止
提示词注入(Prompt Rejection)。什么是提示词注入?就是用户输入的文本可能包含与你的预设 Prompt 相冲突的内容,如果不加分隔,这些输入就可能“注入”并操纵语言模型,轻则导致模型产生毫无关联的不正确的输出,严重的话可能造成应用的安全风险。 接下来让我用一个例子来说明到底什么是提示词注入:
寻求结构化的输出
有时候我们需要语言模型给我们一些结构化的输出,而不仅仅是连续的文本。什么是结构化输出呢?就是按照某种格式组织的内容,例如 JSON、HTML 等。这种输出非常适合在代码中进一步解析和处理,例如,您可以在 Python 中将其读入字典或列表中。
在以下示例中,我们要求 LLM 生成三本书的标题、作者和类别,并要求 LLM 以 JSON 的格式返回给我们,为便于解析,我们指定了 JSON 的键名。
"title": "The Other Side of the Galaxy", "author": "李明宇", "genre": "science fiction" }, { "book_id": "002", "title": "Mystery of the Ancient City", "author": "王晓峰", "genre": "suspense" }, { "book_id": "003", "title": "Spiritual Journey", "author": "Chen Jing", "genre": "psychology" } ] ```
要求模型检查是否满足条件
如果任务包含不一定能满足的假设(条件),我们可以告诉模型先检查这些假设,如果不满足,则会指 出并停止执行后续的完整流程。您还可以考虑可能出现的边缘情况及模型的应对,以避免意外的结果或 错误发生。
在如下示例中,我们将分别给模型两段文本,分别是制作茶的步骤以及一段没有明确步骤的文本。我们 将要求模型判断其是否包含一系列指令,如果包含则按照给定格式重新编写指令,不包含则回答“未提供 步骤”。
上述示例中,模型可以很好地识别一系列的指令并进行输出。在接下来一个示例中,我们将提供给模型 没有预期指令的输入,模型将判断未提供步骤。
提供少量示例
"Few-shot" prompting(少样本提示),即在要求模型执行实际任务之前,给模型提供一两个参考样例,让模型了解我们的要求和期望的输出样式。
例如,在以下的样例中,我们先给了一个 {<学生>:<圣贤>} 对话样例,然后要求模型用同样的隐喻风格回答关于“孝顺”的问题,可以看到 LLM 回答的风格和示例里<圣贤>的文言文式回复风格是十分一致的。这就是一个 Few-shot 学习示例,能够帮助模型快速学到我们要的语气和风格。
利用少样本样例,我们可以轻松“预热”语言模型,让它为新的任务做好准备。这是一个让模型快速上手新 任务的有效策略。
2.3.2 原则二:给模型时间去思考
在设计 Prompt 时,给予语言模型充足的推理时间非常重要。语言模型与人类一样,需要时间来思考并解决复杂问题。如果让语言模型匆忙给出结论,其结果很可能不准确。例如,若要语言模型推断一本书的主题,仅提供简单的书名和一句简介是不足够的。这就像让一个人在极短时间内解决困难的数学题,错误在所难免。
相反,我们应通过 Prompt 引导语言模型进行深入思考。可以要求其先列出对问题的各种看法,说明推理依据,然后再得出最终结论。在 Prompt 中添加逐步推理的要求,能让语言模型投入更多时间逻辑思维,输出结果也将更可靠准确。
综上所述,给予语言模型充足的推理时间,是 Prompt Engineering 中一个非常重要的设计原则。这将大大提高语言模型处理复杂问题的效果,也是构建高质量 Prompt 的关键之处。开发者应注意给模型留出思考空间,以发挥语言模型的最大潜力。
从该原则出发,我们也提供几个设计 Prompt 的技巧:
指定完成任务所需的步骤
接下来我们将通过给定一个复杂任务,给出完成该任务的一系列步骤,来展示这一策略的效果。
首先我们描述了杰克和吉尔的故事,并给出提示词执行以下操作:
- 首先,用一句话概括三个反引号限定的文本。
- 第二,将摘要翻译成英语。
- 第三,在英语摘要中列出每个名称。
- 第四,输出包含以下键的 JSON 对象:英语摘要和人名个数。要求输出以换行符分隔。
指导模型在下结论之前找出一个自己的解法
在设计 Prompt 时,我们还可以通过明确指导语言模型进行自主思考,来获得更好的效果。 举个例子,假设我们要语言模型判断一个数学问题的解答是否正确。仅仅提供问题和解答是不够的,语 言模型可能会匆忙做出错误判断。
相反,我们可以在 Prompt 中先要求语言模型自己尝试解决这个问题,思考出自己的解法,然后再与提 供的解答进行对比,判断正确性。这种先让语言模型自主思考的方式,能帮助它更深入理解问题,做出 更准确的判断。
接下来我们会给出一个问题和一份来自学生的解答,要求模型判断解答是否正确:
但是注意,学生的解决方案实际上是错误的。(维护费用项100x应为10x,总费用450x应为360x)。我们可以通过指导模型先自行找出一个解法来解决这个问题。
在接下来这个 Prompt 中,我们要求模型先自行解决这个问题,再根据自己的解法与学生的解法进行对比,从而判断学生的解法是否正确。同时,我们给定了输出的格式要求。通过拆分任务、明确步骤,让 模型有更多时间思考,有时可以获得更准确的结果。
⚠️ 在开发与应用语言模型时,需要注意它们可能生成虚假信息的风险。尽管模型经过大规模预训练,掌握 了丰富知识,但它实际上并没有完全记住所见的信息,难以准确判断自己的知识边界,可能做出错误推断。若让语言模型描述一个不存在的产品,它可能会自行构造出似是而非的细节。这被称为
“幻觉” (Hallucination),是语言模型的一大缺陷。
如下示例展示了大模型的幻觉。我们要求给我们一些研究LLM长度外推的论文,包括论文标题、主要内容和链接:
The following are some research papers on large language model (LLM) length extrapolation, including the title of the paper and a brief introduction to the main content:
Main content: This paper explores how the performance of language models changes as model size, dataset size, and computational effort increase. The study found that the performance of the model follows a certain power law relationship, which provides a theoretical basis for understanding and predicting the behavior of larger-scale models. Link: arXiv
Main Content: This paper introduces GPT-3, a language model with 175 billion parameters, and demonstrates its powerful capabilities in few-shot learning tasks. The study also touches on the impact of model size on performance, providing insights into model scaling. Link: arXiv
Main content: This paper investigates optimal strategies for training large-scale language models under a given computational budget. The authors propose a new computational budget allocation method to improve model training efficiency and performance. Link: arXiv
Main content: The paper analyzes the performance of pre-trained language models at different scales, studies the relationship between model size, data volume and computing resources, and provides empirical results to support theoretical inferences. Link: arXiv
These papers provide theoretical and empirical research on scaling large language models, helping to understand how to effectively scale models to improve performance. Please note that accessing these links may require scientific Internet tools.
The paper information given by the model looks very correct, but if you open the link, you will find that some links display 404 or point to the wrong paper. In other words, the information or links in the paper are fabricated by the model.
The hallucination problem of language models is related to the reliability and security of applications. Developers must be aware of this flaw and take measures such as Prompt optimization and external knowledge to mitigate it in order to develop more reliable language model applications. This will also be one of the important directions for future language model evolution.

