title: Interview Questions in Essence: Database description: In Python interviews, language features are a must. Understanding these underlying mechanisms will not only help you cope with interviews, but also avoid writing bugs that are difficult to troubleshoot in actual development.

Beginning: Database is the "stepping stone" for back-end interviews

Python back-end development - whether it is using Flask/django to write web services or crawler interfaces to connect storage - it is completely inseparable from the database. According to incomplete statistics, hard-core database interview questions can account for 30%-40% of back-end interviews.

This article has compiled 7 core knowledge points that are most commonly tested and most likely to be pitted when writing code, with a practical perspective and no dry definitions:


1. Database transaction (Transaction)

Must memorize for interviews + avoid pitfalls in practice

The core is to memorize the ACID four characteristics, but don’t just recite them, add a sentence: "It is safer to use ORM (such as django ORM, SQLAlchemy) to open transactions in Python, and avoid manual missed commits/rollbacks."

Dismantling of the four characteristics (human version)

CharacteristicsExplanationPractical examples
A (Atomicity) atomicityA transaction is an "indivisible package". Either all the SQL in it is executed, or it is as if it never happened. RollbackTransfer money with a bank card: A minus 100→B plus 100. If any step fails (such as B card abnormality), A's money must be returned to the account
C (Consistency) consistencyBefore and after the transaction is executed, the data logic must be reasonable (not the technical state, but the business/constraint state)Or transfer: the total balance of A and B must remain unchanged
Isolation
D (Durability) PersistenceOnce the transaction is explicitly submitted (not temporarily saved), the data will be permanently stored on the disk and will not be lost even if the database is restarted or the power is turned offSuccessful submission = the money is actually transferred, and the result will not be affected even if the banking system collapses

2. Database index and B+ Tree

Soul torture: Why does MySQL InnoDB prefer B+ Tree?

Binary trees, B-Tree, and hash indexes will not work. The core reason revolves around "The database is stored on disk, and disk IO is a performance killer".

First step on the pitfalls of other indexes

  1. Binary search tree / BST: Extreme cases (such as storing 1-10000 in order) will degenerate into a linked list, and the query time complexity is O(n)
  2. Red-Black Tree: Although it is balanced, the height is too high (it takes about 14 layers to store 10,000 pieces of data) and the number of disk IOs is high.
  3. B-Tree: Internal nodes also store data, each node can hold few indexes, and the tree is not chunky enough; range queries require cross-layer traversal, which is slow.
  4. Hash index: only suitable for equivalent queries (WHERE id=1), range, sorting, and fuzzy query are all covered

B+ Tree’s three killer moves (corresponding to MySQL high-frequency test sites)

# 简化版 B+ Tree 结构(3阶)
          [ 5, 9 ]          → 内部节点(只存索引,不存数据!)
        /   |    \
   [1,3] → [5,7,9] → [10,12]  → 叶子节点(有序存全量索引 + 行数据指针/主键值)
                             ↖双向链表↗
  1. Disk IO cost is extremely low: All internal nodes are empty shell indexes. One disk page (InnoDB default 16KB) can hold hundreds or thousands of indexes. The tree is so short that it only has 3-5 layers. One query can require up to 5 disk IOs.
  2. Query efficiency is absolutely stable: All data is in leaf nodes, and the path length from the root to any leaf is exactly the same.
  3. Range/sort query beats everything: The leaf nodes are strung together with a two-way linked list. Just scan the linked list directly without jumping back to the root node to check again.

Clustered index vs non-clustered index (Python developers often confuse table backing!)

# 伪代码模拟 SQLAlchemy 定义
class User(Base):
    __tablename__ = "users"
    id = Column(Integer, primary_key=True)  # 自动生成聚集索引(InnoDB 必须有)
    name = Column(String(20), index=True)   # 普通非聚集索引
    age = Column(Integer)

# 1. 走聚集索引(主键查询,最快)
# SELECT * FROM users WHERE id=1;
# → 直接在聚集索引的叶子节点找到行数据,1步到位

# 2. 走非聚集索引(回表查询,稍慢)
# SELECT * FROM users WHERE name="张三";
# → 先在 name 索引的叶子节点找到主键值 1
# → 再拿着主键值 1 去聚集索引查行数据,2步到位

3. Redis core principles

Soul Torture Three Series: Why is single thread so fast? What data structure is used? How to persist without losing data?

Redis is the absolute leader of the Python backend caching three swordsmen (Redis / Memcached / local cache), and it is a must-take for interviews!

Why can a single thread defeat multi-threaded Memcached?

Stop saying "Redis is completely single-threaded" - Redis 6.0+ background tasks (persistence, cleaning up expired keys) are multi-threaded, but the core of command processing (memory operations + IO multiplexing) is still single-threaded, which is the reason why it is fast:

  1. Pure memory operation: Reading and writing are all done in memory, which is more than 100,000 times faster than disk IO
  2. IO multiplexing (epoll/kqueue): One thread can monitor thousands of client connections without opening a thread for each connection (avoiding context switching and lock competition)
  3. The core of command processing is lock-free: single-threaded operation of memory, no need to lock or unlock at all, so the efficiency soars

High-frequency data structures (don’t just talk about String!)

Redis commandCorresponding underlying structureCommon scenarios for Python development
SET/GETSimple dynamic string (SDS)Storing Token, caching popular articles, distributed locks
HSET/HGET/HGETALLCompressed list/hash tableStore user information and shopping cart
LPUSH/RPUSH/LPOP/RPOPCompressed list / doubly linked listMessage queue, real-time ranking list (short list only)
SADD/SISMEMBER/SINTERInteger collection/hash tableDuplication removal, common attention, lottery user pool
ZADD/ZRANGE/ZREVRANGECompression list/skip listReal-time hot list, delay queue

Persistence strategy (RDB vs AOF vs hybrid, the mainstream chooses hybrid!)

Pure memory operation has a fatal disadvantage: all data will be lost when power is turned off/restart, so it must be persisted to disk:

  1. RDB (snapshot): timing (such assave 900 1) Compress all the data in the memory into binary files and save them to the disk
  • ✅ Extremely fast recovery (suitable for backup/migration)
  • ❌ It is easy to lose data (at most, all data within the save interval will be lost)
  1. AOF (Append Log): Append each write command ** to the text file ** line by line (similar to MySQL's Binlog)
  • ✅ Extremely high data security (configurationappendfsync alwaysAlmost never lost)
  • ❌ Files are getting larger and larger, and recovery is getting slower and slower
  1. Hybrid persistence (Redis 4.0+, Python backend is recommended to be enabled by default): RDB head (full amount) + AOF tail (incremental), which has the advantages of fast recovery of RDB and the advantages of AOF data security.

4. Pessimistic locking vs optimistic locking

Scenario is king: read more and write less and choose optimism, write more and read less and choose pessimism!

Python development often encounters the problem of "multiple people changing the same inventory at the same time", and it is necessary to lock it at this time.

Pessimistic lock (lock first, use later, stable but slow)

-- MySQL InnoDB 语法,Python ORM 也有对应的方法
BEGIN;
-- 先给要改的库存行加排他锁(FOR UPDATE)
SELECT stock FROM products WHERE id=1 FOR UPDATE;
-- 业务逻辑:减库存
UPDATE products SET stock=stock-1 WHERE id=1;
COMMIT;
  • ✅ There will never be oversold conditions
  • ❌ The lock takes a long time and the concurrency performance is poor (only the person who gets the lock can operate it)
  • 🎯 Applicable scenarios: ticket grabbing, flash sale (more writing, less reading, extremely fierce concurrency competition)

Optimistic locking (assuming no one grabs it, check again when submitting, fast but conflicting)

-- 常用的版本号(Version)机制
BEGIN;
-- 先查库存和当前版本号
SELECT stock, version FROM products WHERE id=1;
-- 业务逻辑:假设查出来的 version=5,减1
UPDATE products SET stock=stock-1, version=version+1 WHERE id=1 AND version=5;
-- 检查受影响的行数:如果是0,说明版本号不对,冲突了,Python 里可以重试几次
COMMIT;
  • ✅ No locking, good concurrency performance
  • ❌ When the conflict rate is high, frequent retries are required, but it will be slower.
  • 🎯 Applicable scenarios: modifying user information, likes (read more and write less, low probability of conflict)

5. MVCC (Multiple Version Concurrency Control)

InnoDB’s secret weapon: non-blocking read to resolve “read-write” conflicts!

If there are only locks, "read-write" operations will also block each other, and the concurrency performance will be very poor. MVCC is designed to solve this problem.

Principle of human version

InnoDB adds two implicit columns to each row of data (no need to manually create them):

  1. Creation Timestamp: Record the transaction ID that creates/modifies this row of data
  2. Delete Timestamp: Record the transaction ID of deleting this row of data (NULL if not deleted)

At the same time, use Undo Log (rollback log) to save the historical version linked list of the data.

Two kinds of read operations (Python development should pay attention!)

  1. Snapshot read (ordinary SELECT, used in 99% of cases): No locking, reading the historical version in the Undo Log, not blocking write operations, nor being blocked by write operations
    SELECT * FROM users WHERE id=1; -- 快照读
  2. Current reading (INSERT/UPDATE/DELETE, or locked SELECT): What is being read is the latest version and needs to be locked.
    SELECT * FROM users WHERE id=1 FOR UPDATE; -- 排他锁的当前读
    SELECT * FROM users WHERE id=1 LOCK IN SHARE MODE; -- 共享锁的当前读

6. Storage engine: MyISAM vs InnoDB

For the Python backend, choose InnoDB without any thought! unless…

After MySQL 5.5, the default is InnoDB, but MyISAM is occasionally seen in old projects. Just memorize the comparison table clearly.

FeaturesMyISAMInnoDB
Transaction support❌ Not supportedRequired support
Lock granularityTable lock (poor writing performance, OK reading performance)Row lock (high concurrency)
Foreign Key Constraints❌ Not supportedSupported
Crash Recovery❌ Poor (easy to lose data)Strong (Utilize Redo Log + Undo Log)
count(*) speed✅ Extremely fast (has a built-in counter)❌ Slow (requires full table/index scan)

7. Elasticsearch (ES) search depth analysis

When MySQL's fuzzy search is so slow that it explodes, try ES!

ES is a distributed full-text retrieval server based on Lucene. It is commonly used in Python development for e-commerce product search, log analysis, and real-time monitoring.

Core killer move: Inverted Index (Inverted Index)

MySQL fuzzy search (LIKE "%手机%") requires a full table scan. ES relies on inverted index to increase the search speed by several orders of magnitude:

# 简化版正排索引 vs 倒排索引
【正排索引】:文档ID → 内容(单词列表)
1 → ["苹果", "手机", "新款"]
2 → ["华为", "手机", "Mate60"]
3 → ["苹果", "平板", "Pro"]

【倒排索引】:单词 → 文档ID列表(有序)
苹果 → [1, 3]
手机 → [1, 2]
新款 → [1]
华为 → [2]
Mate60 → [2]
平板 → [3]
Pro → [3]

For example, if you search for "Apple mobile phone", you can directly check the document ID list of "Apple" and "mobile phone" in the inverted index and find the intersection ([1]), which takes less than 1 second.


🚀 Follow-up suggestions for avoiding pitfalls

This chapter skips Redis and ES without going into too much detail. In the Python back-end interview, Redis's cache avalanche, penetration, and breakdown are "must-take questions among the must-take questions." I can write you a separate "Guide to avoid pitfalls of the three Redis brothers" with Python code examples. Do you want it?