title: Interview Questions in Essence: Database description: In Python interviews, language features are a must. Understanding these underlying mechanisms will not only help you cope with interviews, but also avoid writing bugs that are difficult to troubleshoot in actual development.
Beginning: Database is the "stepping stone" for back-end interviews
Python back-end development - whether it is using Flask/django to write web services or crawler interfaces to connect storage - it is completely inseparable from the database. According to incomplete statistics, hard-core database interview questions can account for 30%-40% of back-end interviews.
This article has compiled 7 core knowledge points that are most commonly tested and most likely to be pitted when writing code, with a practical perspective and no dry definitions:
1. Database transaction (Transaction)
Must memorize for interviews + avoid pitfalls in practice
The core is to memorize the ACID four characteristics, but don’t just recite them, add a sentence: "It is safer to use ORM (such as django ORM, SQLAlchemy) to open transactions in Python, and avoid manual missed commits/rollbacks."
Dismantling of the four characteristics (human version)
2. Database index and B+ Tree
Soul torture: Why does MySQL InnoDB prefer B+ Tree?
Binary trees, B-Tree, and hash indexes will not work. The core reason revolves around "The database is stored on disk, and disk IO is a performance killer".
First step on the pitfalls of other indexes
- Binary search tree / BST: Extreme cases (such as storing 1-10000 in order) will degenerate into a linked list, and the query time complexity is O(n)
- Red-Black Tree: Although it is balanced, the height is too high (it takes about 14 layers to store 10,000 pieces of data) and the number of disk IOs is high.
- B-Tree: Internal nodes also store data, each node can hold few indexes, and the tree is not chunky enough; range queries require cross-layer traversal, which is slow.
- Hash index: only suitable for equivalent queries (
WHERE id=1), range, sorting, and fuzzy query are all covered
B+ Tree’s three killer moves (corresponding to MySQL high-frequency test sites)
- Disk IO cost is extremely low: All internal nodes are empty shell indexes. One disk page (InnoDB default 16KB) can hold hundreds or thousands of indexes. The tree is so short that it only has 3-5 layers. One query can require up to 5 disk IOs.
- Query efficiency is absolutely stable: All data is in leaf nodes, and the path length from the root to any leaf is exactly the same.
- Range/sort query beats everything: The leaf nodes are strung together with a two-way linked list. Just scan the linked list directly without jumping back to the root node to check again.
Clustered index vs non-clustered index (Python developers often confuse table backing!)
3. Redis core principles
Soul Torture Three Series: Why is single thread so fast? What data structure is used? How to persist without losing data?
Redis is the absolute leader of the Python backend caching three swordsmen (Redis / Memcached / local cache), and it is a must-take for interviews!
Why can a single thread defeat multi-threaded Memcached?
Stop saying "Redis is completely single-threaded" - Redis 6.0+ background tasks (persistence, cleaning up expired keys) are multi-threaded, but the core of command processing (memory operations + IO multiplexing) is still single-threaded, which is the reason why it is fast:
- Pure memory operation: Reading and writing are all done in memory, which is more than 100,000 times faster than disk IO
- IO multiplexing (epoll/kqueue): One thread can monitor thousands of client connections without opening a thread for each connection (avoiding context switching and lock competition)
- The core of command processing is lock-free: single-threaded operation of memory, no need to lock or unlock at all, so the efficiency soars
High-frequency data structures (don’t just talk about String!)
Persistence strategy (RDB vs AOF vs hybrid, the mainstream chooses hybrid!)
Pure memory operation has a fatal disadvantage: all data will be lost when power is turned off/restart, so it must be persisted to disk:
- RDB (snapshot): timing (such as
save 900 1) Compress all the data in the memory into binary files and save them to the disk
- ✅ Extremely fast recovery (suitable for backup/migration)
- ❌ It is easy to lose data (at most, all data within the save interval will be lost)
- AOF (Append Log): Append each write command ** to the text file ** line by line (similar to MySQL's Binlog)
- ✅ Extremely high data security (configuration
appendfsync alwaysAlmost never lost) - ❌ Files are getting larger and larger, and recovery is getting slower and slower
- Hybrid persistence (Redis 4.0+, Python backend is recommended to be enabled by default): RDB head (full amount) + AOF tail (incremental), which has the advantages of fast recovery of RDB and the advantages of AOF data security.
4. Pessimistic locking vs optimistic locking
Scenario is king: read more and write less and choose optimism, write more and read less and choose pessimism!
Python development often encounters the problem of "multiple people changing the same inventory at the same time", and it is necessary to lock it at this time.
Pessimistic lock (lock first, use later, stable but slow)
- ✅ There will never be oversold conditions
- ❌ The lock takes a long time and the concurrency performance is poor (only the person who gets the lock can operate it)
- 🎯 Applicable scenarios: ticket grabbing, flash sale (more writing, less reading, extremely fierce concurrency competition)
Optimistic locking (assuming no one grabs it, check again when submitting, fast but conflicting)
- ✅ No locking, good concurrency performance
- ❌ When the conflict rate is high, frequent retries are required, but it will be slower.
- 🎯 Applicable scenarios: modifying user information, likes (read more and write less, low probability of conflict)
5. MVCC (Multiple Version Concurrency Control)
InnoDB’s secret weapon: non-blocking read to resolve “read-write” conflicts!
If there are only locks, "read-write" operations will also block each other, and the concurrency performance will be very poor. MVCC is designed to solve this problem.
Principle of human version
InnoDB adds two implicit columns to each row of data (no need to manually create them):
- Creation Timestamp: Record the transaction ID that creates/modifies this row of data
- Delete Timestamp: Record the transaction ID of deleting this row of data (NULL if not deleted)
At the same time, use Undo Log (rollback log) to save the historical version linked list of the data.
Two kinds of read operations (Python development should pay attention!)
- Snapshot read (ordinary SELECT, used in 99% of cases): No locking, reading the historical version in the Undo Log, not blocking write operations, nor being blocked by write operations
- Current reading (INSERT/UPDATE/DELETE, or locked SELECT): What is being read is the latest version and needs to be locked.
6. Storage engine: MyISAM vs InnoDB
For the Python backend, choose InnoDB without any thought! unless…
After MySQL 5.5, the default is InnoDB, but MyISAM is occasionally seen in old projects. Just memorize the comparison table clearly.
7. Elasticsearch (ES) search depth analysis
When MySQL's fuzzy search is so slow that it explodes, try ES!
ES is a distributed full-text retrieval server based on Lucene. It is commonly used in Python development for e-commerce product search, log analysis, and real-time monitoring.
Core killer move: Inverted Index (Inverted Index)
MySQL fuzzy search (LIKE "%手机%") requires a full table scan. ES relies on inverted index to increase the search speed by several orders of magnitude:
For example, if you search for "Apple mobile phone", you can directly check the document ID list of "Apple" and "mobile phone" in the inverted index and find the intersection ([1]), which takes less than 1 second.
🚀 Follow-up suggestions for avoiding pitfalls
This chapter skips Redis and ES without going into too much detail. In the Python back-end interview, Redis's cache avalanche, penetration, and breakdown are "must-take questions among the must-take questions." I can write you a separate "Guide to avoid pitfalls of the three Redis brothers" with Python code examples. Do you want it?

