title: Practical project three: Semantic search and question answering system | Daoman PythonAI
description: Based on vector retrieval and a RAG architecture, this tutorial builds an enterprise FAQ semantic search and question-answering system that accepts natural-language questions and matches them to accurate answers. It covers the complete implementation of the core technologies: semantic search, vector databases, and RAG-based question answering.
keywords: [Semantic search, question answering system, RAG, vector retrieval, Sentence-BERT, FAISS, vector database, knowledge base question answering, natural language processing, semantic understanding]
Practical project three: Semantic search and question answering system
Project Overview
In scenarios such as intelligent customer service and knowledge-base Q&A, semantic search and question-answering systems are rapidly replacing traditional keyword matching. With vector retrieval, the system can match queries that mean the same thing but use different words (for example, "I forgot my password" and "how do I reset my password"). This significantly improves the hit rate and the user experience.
This tutorial focuses on a lightweight enterprise FAQ Q&A scenario and walks you step by step through building an out-of-the-box semantic Q&A system. The project goals are:
- ✅ Accepts natural-language questions in Chinese
- ✅ Retrieval accuracy above 85%
- ✅ Response time under 300 ms per query
- ✅ No GPU required: deployable on an ordinary CPU server
Core architecture and technology stack
Streamlined architecture
The system consists of four layers, each with a clear responsibility:
- User interaction layer: built with Streamlit, which lets you create a polished interface without front-end experience.
- API gateway layer: FastAPI provides high-performance RESTful endpoints with automatically generated documentation.
- RAG engine layer: retrieve first, then generate, using the retrieved knowledge to help answer the question.
- Storage layer: a FAISS vector index that holds the semantic vectors of all FAQs.
Selection criteria and recommendations
This project follows three principles: free and open source, lightweight, and easy to deploy. The components are selected as follows:
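We choose `IndexFlatIP` rather than `IndexFlatL2`. Once the vectors are L2-normalized, the inner product equals cosine similarity. The scores can therefore be used directly as similarities (higher means closer) and compared against a threshold, with no need to convert distances into similarities.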
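The NumPy sketch below shows why the two index types agree here. It uses random test vectors and does not need FAISS. It checks that the inner product of normalized vectors equals cosine similarity. It also checks that for unit vectors the squared L2 distance is 2 - 2(a·b), so ranking by largest inner product gives the same order as ranking by smallest L2 distance. `l2_normalize` is an illustrative helper that mirrors what `faiss.normalize_L2` does in place.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit length (faiss.normalize_L2 does this in place)."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.clip(norms, 1e-12, None)


rng = np.random.default_rng(0)
docs_raw = rng.normal(size=(5, 8))
query_raw = rng.normal(size=(1, 8))

# Cosine similarity computed directly from the raw (unnormalized) vectors
cosine = (docs_raw @ query_raw.T).ravel() / (
    np.linalg.norm(docs_raw, axis=1) * np.linalg.norm(query_raw)
)

# Inner product after normalization: what IndexFlatIP scores
docs, query = l2_normalize(docs_raw), l2_normalize(query_raw)
inner = (docs @ query.T).ravel()

# Squared L2 distance between unit vectors: what IndexFlatL2 scores
sq_l2 = ((docs - query) ** 2).sum(axis=1)
```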
Data and vectorized indexing
1. Data preparation
In its simplest form, FAQ data consists of question-answer pairs. To improve retrieval, we add a `search_text` field that concatenates the original question, common paraphrases, keywords, and other information.
Example data structure:
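Below is a minimal sketch of one record, as an illustration rather than the article's exact schema. Only `question`, `answer`, and `search_text` come from the text above. `id`, `similar_questions`, `keywords`, and `category` are hypothetical fields, and the sample content is made up. `build_search_text` is a hypothetical helper that joins the pieces in a fixed order.

```python
from typing import TypedDict


class FAQ(TypedDict):
    id: int
    question: str
    answer: str
    similar_questions: list[str]  # hypothetical field: known paraphrases
    keywords: list[str]           # hypothetical field
    category: str                 # hypothetical field
    search_text: str              # the text that actually gets embedded


def build_search_text(question: str, similar_questions: list[str],
                      keywords: list[str], category: str = "", sep: str = " | ") -> str:
    """Join question, paraphrases, keywords, and category into one string for embedding."""
    parts = [question, *similar_questions, *keywords]
    if category:
        parts.append(category)
    # Drop empty strings and exact duplicates while keeping the original order
    seen, unique = set(), []
    for part in (p.strip() for p in parts):
        if part and part not in seen:
            seen.add(part)
            unique.append(part)
    return sep.join(unique)


faq: FAQ = {
    "id": 1,
    "question": "How do I reset my password?",
    "answer": "Open Settings > Account > Reset password and follow the emailed link.",
    "similar_questions": ["I forgot my password", "Can't log in, how do I change my password?"],
    "keywords": ["password", "reset", "login"],
    "category": "Account",
    "search_text": "",
}
faq["search_text"] = build_search_text(
    faq["question"], faq["similar_questions"], faq["keywords"], faq["category"]
)
```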
Tip: The quality of `search_text` directly affects retrieval recall. We recommend regularly adding high-frequency paraphrases mined from real user logs.
2. Vectorization and index construction
The following code loads the model, encodes the data in batches, normalizes the vectors, and builds and saves the FAISS index.
After running, you will get two local files:
- faq_index.faiss: the vector index
- faq_data.pkl: the original FAQ data
When retrieving later, just load these two files.
Semantic retrieval and RAG implementation
1. Semantic retrieval module
We wrap the retrieval logic in a class so the rest of the system can call it easily.
`threshold` is the similarity threshold. We recommend starting at 0.3 and tuning it against real queries. Results scoring below it are semantically too far from the question and are discarded.
2. Lightweight RAG engine
Once relevant FAQs have been retrieved, how do we produce the final answer? Two modes are provided:
- Rule mode (default): return the standard answer of the most similar FAQ directly. It is fully offline and free.
- LLM-enhanced mode: pass the retrieved FAQs as context to a large language model, which generates a more natural answer. This usually reads better but depends on an external API.
In production, you can make `threshold` and `use_llm` configurable items so the mode can be switched dynamically.
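Below is a sketch of the two modes, with hypothetical names throughout. `retrieve` is any function that returns hit dicts with `question`, `answer`, and `score` keys. `llm` is any function that takes a prompt string and returns text; wrap whichever LLM client you use. The prompt wording is illustrative only.

```python
from typing import Callable, Optional

NO_ANSWER = "Sorry, no matching FAQ was found. Please rephrase your question."


class RAGEngine:
    """Retrieve first, then answer: rule mode returns the top FAQ answer, LLM mode rewrites it."""

    def __init__(self, retrieve: Callable[[str], list[dict]],
                 llm: Optional[Callable[[str], str]] = None):
        self.retrieve = retrieve  # question -> hits with question/answer/score
        self.llm = llm            # hypothetical: prompt -> generated text

    @staticmethod
    def build_prompt(question: str, hits: list[dict]) -> str:
        # Number each retrieved FAQ so the model can ground its answer in them
        context = "\n\n".join(
            f"[{i}] Q: {h['question']}\nA: {h['answer']}" for i, h in enumerate(hits, 1)
        )
        return (
            "Answer the user's question using only the FAQ excerpts below. "
            "If they do not contain the answer, say so.\n\n"
            f"{context}\n\nUser question: {question}\nAnswer:"
        )

    def answer(self, question: str, use_llm: bool = False) -> dict:
        hits = self.retrieve(question)
        if not hits:
            return {"answer": NO_ANSWER, "sources": [], "mode": "none"}
        if use_llm and self.llm is not None:
            return {"answer": self.llm(self.build_prompt(question, hits)), "sources": hits, "mode": "llm"}
        # Rule mode (also the fallback when no LLM is configured): fully offline
        return {"answer": hits[0]["answer"], "sources": hits, "mode": "rule"}
```

If `use_llm` is requested but no `llm` is configured, the engine falls back to rule mode. This keeps the offline default safe.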
Rapid deployment solution
1. Backend FastAPI core
The backend keeps only the core functionality and omits logging, authentication, and caching code so you can get started quickly.
Start the backend with Uvicorn on port 8000, then visit http://localhost:8000/docs to see the automatically generated Swagger documentation and test the API directly in the browser.
2. Front-end Streamlit core
The front-end code is minimal but fully functional: enter a question, toggle LLM enhancement, and view the reference sources.
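Here is a minimal sketch of such a front end, assuming `streamlit` and `requests` are installed. It also assumes the backend exposes `POST /ask`, accepting a JSON body with `question` and `use_llm` and returning `answer` and `sources`. The URL, field names, and helper names are hypothetical. Streamlit is imported inside `main()` so the plain helpers can be reused and tested without it.

```python
import requests

API_URL = "http://localhost:8000/ask"  # hypothetical endpoint; match your backend


def ask_backend(question: str, use_llm: bool, url: str = API_URL, timeout: float = 10.0) -> dict:
    """POST the question to the backend and return its JSON reply."""
    resp = requests.post(url, json={"question": question, "use_llm": use_llm}, timeout=timeout)
    resp.raise_for_status()
    return resp.json()


def format_source(src: dict) -> str:
    """One display line per retrieved FAQ: similarity score, then the matched question."""
    return f"{src.get('score', 0.0):.2f} | {src.get('question', '')}"


def main() -> None:
    import streamlit as st  # imported here so the helpers above work without Streamlit

    st.title("FAQ Semantic Q&A")
    question = st.text_input("Ask a question in natural language")
    use_llm = st.checkbox("LLM enhancement", value=False)
    if st.button("Ask") and question.strip():
        with st.spinner("Searching..."):
            result = ask_backend(question.strip(), use_llm)
        st.markdown(result["answer"])
        with st.expander("Reference sources"):
            for src in result.get("sources", []):
                st.write(format_source(src))


if __name__ == "__main__":
    main()
```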
Start the front end with `streamlit run`, passing the path to the script.
3. One-click start command
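One way to start both services with a single command is a small Python launcher. This is a hypothetical sketch. The targets `api:app` and `app.py` are placeholder names. It relies on `python -m uvicorn` and `python -m streamlit run`, both standard ways to invoke those tools.

```python
import subprocess
import sys


def build_commands(api_target: str = "api:app", ui_script: str = "app.py",
                   port: int = 8000) -> list[list[str]]:
    """Commands for the backend (Uvicorn) and the front end (Streamlit); names are placeholders."""
    return [
        [sys.executable, "-m", "uvicorn", api_target, "--host", "0.0.0.0", "--port", str(port)],
        [sys.executable, "-m", "streamlit", "run", ui_script],
    ]


def launch(commands: list[list[str]]) -> None:
    """Start every command, wait for them, and terminate any survivors on exit or Ctrl+C."""
    procs = [subprocess.Popen(cmd) for cmd in commands]
    try:
        for proc in procs:
            proc.wait()
    except KeyboardInterrupt:
        pass
    finally:
        for proc in procs:
            if proc.poll() is None:
                proc.terminate()
```

`launch(build_commands())` starts the backend on port 8000 and the Streamlit app (on its default port, 8501). Press Ctrl+C to stop both.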
Then open your browser and ask the FAQ knowledge base questions in natural language through the Streamlit interface.
Summary of optimization points
- Vectorization optimization
  - Merge similar questions, categories, and tags into the `search_text` field to enrich its semantic information.
  - Prefer batch encoding over encoding items one at a time.
  - If a GPU is available, pass `device="cuda"` when initializing the model; encoding can be several times faster.
- Retrieval optimization
  - With fewer than 100,000 entries, `IndexFlatIP` is both simple and exact.
  - For larger datasets, switch to `IndexIVFFlat`, trading a small amount of accuracy for faster search.
  - A reasonable similarity threshold (e.g., 0.3) effectively filters out irrelevant results.
  - When needed, introduce hybrid retrieval: combine semantic similarity with simple exact keyword matching to improve recall on long-tail questions.
- Performance optimization
  - Use a local LRU cache for the embeddings and final results of high-frequency queries to avoid repeated computation.
  - When running multiple instances, use Redis as a centralized cache.
  - In production, use Uvicorn's multi-worker mode (for example, `--workers 4`) or run behind Gunicorn to improve concurrency.
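Below is a minimal sketch of a local LRU cache. It uses only the standard library's `functools.lru_cache` around a query encoder. `make_cached_encoder` is a hypothetical helper. It caches per-query embeddings for online traffic, which typically arrives one question at a time, so it does not conflict with batch encoding at indexing time. Caching final answers would work the same way, keyed on the question text and the `use_llm` flag.

```python
from functools import lru_cache
from typing import Callable

import numpy as np


def make_cached_encoder(encode: Callable[[list[str]], np.ndarray], maxsize: int = 1024):
    """Wrap an encoder so repeated query strings reuse their embedding."""

    @lru_cache(maxsize=maxsize)
    def _encode_one(text: str) -> np.ndarray:
        vec = np.array(encode([text]), dtype="float32")[0]
        vec.setflags(write=False)  # read-only, so callers cannot corrupt the cached copy
        return vec

    def cached_encode(texts: list[str]) -> np.ndarray:
        # np.stack builds a new writable array, safe for in-place normalize_L2
        return np.stack([_encode_one(t) for t in texts])

    cached_encode.cache_info = _encode_one.cache_info  # expose hit/miss statistics
    return cached_encode
```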
This project spans data processing, vectorization, and index construction, through retrieval and question answering, to front-end and back-end deployment. Together these form a complete minimum viable product of a semantic question-answering system. You can use it as a starting point for corporate FAQs, product consultation, online course Q&A, and similar scenarios. As your needs grow, add features such as intent recognition, multi-turn dialogue, and automatic knowledge-base updates.

