Index Optimization Strategy - MySQL and PostgreSQL Index Design and Optimization Guide

Introduction

Last week, we investigated an e-commerce order system and reduced the time it took to check "Guangzhou orders not shipped for 7 days" during the peak period from 4 minutes to 220ms - not by upgrading the SSD, but by deleting three duplicate indexes and adding an accurate compound coverage index**.

Indexing is the "first productivity" of database performance, but it is also a double-edged sword: it can greatly speed up queries, but it can slow down writing and occupy additional disk space. This article helps you quickly master the core index design skills of two mainstream relational-databases, MySQL and PostgreSQL, so that the index can truly serve your query mode.


1. Index Basic Literacy (Quick Review)

1.1 Core Analogy

The database index is like library's spine classification + search catalog by book title/author, you can quickly locate the book you want without going through all the bookshelves.

1.2 Four Core Principles

Keep these four points in mind first, and all subsequent strategies are extensions of them:

  1. Prioritize indexing on highly selective columns The more selective it is (the more different values ​​it has), the better the indexing will be. User email (almost unique) is better than order status (maybe only 3~4 values).
  2. Compound index must follow the "leftmost prefix" Index keys are organized sequentially, just like a directory sorted first by "province" and then by "city". If you skip the province and search the city directly, the index will not take effect.
  3. Write more and read less, build indexes carefully Every time you write, update, or delete, the data of all related indexes must be modified simultaneously. Too many indexes on a table with high write pressure will become a disaster.
  4. Regular maintenance of index health Delete indexes that are no longer used and rebuild severely fragmented indexes to maintain overall efficiency.

2. MySQL vs PostgreSQL core index comparison

FeaturesMySQLPostgreSQL
Default IndexB-Tree (applicable to most scenarios)B-Tree (same as above)
Featured indexFull-text index (InnoDB 5.7+), spatial index (5.7.5+)Hash index (pure equivalent query is extremely fast), GiST/GIN (array/JSON/full-text/geometry), BRIN (sequential big data, extremely space-saving)
Advanced featuresInvisible index (8.0+), function index (8.0.13+)Native partial index, INCLUDE covering index (11+), concurrent index creation (avoiding table locks)

Simple memory: MySQL’s index ecology is relatively traditional but solid; PostgreSQL provides richer and more accurate index types, which can handle complex semi-structured data and large table scenarios.


3. Practical index design practice (key point!)

3.1 High-frequency query scenarios

Scenario 1: Compound condition + sorting + overwriting

High-frequency query of order system:

-- 需求:根据用户ID和订单状态查询,按订单时间倒序,只取ID和金额
SELECT id, amount FROM orders
WHERE user_id = 1 AND status = 'paid'
ORDER BY order_date DESC
LIMIT 10;

MySQL Solution Because there is no native INCLUDE clause, we need to include the columns that need to be returned in the composite index to achieve covering index.

CREATE INDEX idx_usr_stat_date ON orders(user_id, status, order_date DESC, id, amount);

Index keys are arranged in the order of equivalent query columns, range/sort columns, and coverage columns. Such an index can completely cover the query without having to return the table.

PostgreSQL solution Use the native INCLUDE clause to include id and amount in the index as non-key columns, which will not participate in the index search logic, but can avoid table backing.

CREATE INDEX idx_usr_stat_date_pg ON orders(user_id, status, order_date DESC)
INCLUDE (id, amount);

Leftmost prefix verification: Both solutions can accurately utilize the index and will appear in the query planUsing indexorIndex Only Scan

Scenario 2: Query only a small range of data in a specific state

Pay close attention to queries such as "Return Review Forms Not Shipped in the Past 30 Days" that only care about a small set of data.

MySQL Solution MySQL does not support native partial indexes, but you can reduce the range through function indexes (8.0.13+), or use ordinary indexes with range queries.

-- 利用函数索引变相缩小范围
CREATE INDEX idx_ret_pending_30d ON orders((DATE(order_date)), status);
-- 查询时必须配合相应条件
SELECT * FROM orders
WHERE status = 'return_pending'
  AND DATE(order_date) > CURDATE() - INTERVAL 30 DAY;

PostgreSQL solution The native partial index only builds an index for rows that meet the WHERE condition, saving a lot of space, and the query is automatically matched.

CREATE INDEX idx_ret_pending_30d_pg ON orders(user_id, order_date DESC)
WHERE status = 'return_pending'
  AND order_date > CURRENT_DATE - INTERVAL '30 days';

It is suitable for high-frequency queries where only a small amount of data meets the conditions, and the index size is reduced by more than 90%.

Scenario 3: JSON/array query

MySQL Solution MySQL 5.7 begins to provide the JSON type, which can optimize partial path queries through virtual columns or function indexes, but its capabilities are limited.

-- 只能加速对 tags 数组第一个元素的查询
CREATE INDEX idx_tags_mysql ON products((JSON_EXTRACT(tags, '$[0]')));

PostgreSQL solution GIN index natively supports arrays and JSONB, and its query capabilities are much more powerful.

CREATE INDEX idx_tags_gin ON products USING gin(tags);
CREATE INDEX idx_profile_gin ON user_profiles USING gin(profile_data);

Query example:

-- 数组中包含指定元素
SELECT * FROM products WHERE tags @> ARRAY['electronics', 'wireless'];

-- JSONB 内包含特定键值
SELECT * FROM user_profiles
WHERE profile_data->'preferences' @> '{"theme": "dark"}';

If your business uses a lot of JSON or arrays, the advantages of PostgreSQL are very obvious.


4. Guide to avoid index pitfalls

4.1 Common index failure scenarios

-- ❌ 失效1:LIKE 前导通配符
SELECT * FROM users WHERE nickname LIKE '%张%';
-- ✅ 替代:全文索引(MySQL/PG 均有支持)

-- ❌ 失效2:对索引列使用函数或表达式
SELECT * FROM orders WHERE YEAR(order_date) = 2024;
-- ✅ 替代:用范围查询或函数索引
SELECT * FROM orders
WHERE order_date >= '2024-01-01' AND order_date < '2025-01-01';

-- ❌ 失效3:OR 连接不同索引列
SELECT * FROM users WHERE name = '张三' OR phone = '13800138000';
-- ✅ 替代:拆分为两个独立索引查询,用 UNION ALL 合并且避免重复

4.2 Three major anti-patterns

  • Excessive Indexing: Create a separate index for each column, which results in extreme expansion of write operations and skyrocketing storage space.
  • Key order misalignment: Put low-selectivity columns (such as status) at the front of the composite index, resulting in weak index filtering capabilities.
  • Never maintained: If the index fragmentation of a large table exceeds 30% and it is still not rebuilt, the scanning efficiency may not be as good as that of a full table scan.

5. Quick diagnostic tool: use EXPLAIN to see the index usage clearly

5.1 MySQL EXPLAIN

-- JSON 格式输出更直观(MySQL 5.6+)
EXPLAIN FORMAT=JSON
SELECT id, amount FROM orders
WHERE user_id = 1 AND status = 'paid'
ORDER BY order_date DESC
LIMIT 10;

Key information:

  • key:The actual index name used
  • rows: Estimated number of scanning lines, the smaller the better
  • Extra:AppearUsing indexRepresents a covering index, no need to return the table;Using filesortIndicates the need for additional sorting, usually a warning sign

5.2 PostgreSQL EXPLAIN

-- 使用 ANALYZE、BUFFERS 查看真实执行统计
EXPLAIN (ANALYZE, BUFFERS, COSTS OFF)
SELECT id, amount FROM orders
WHERE user_id = 1 AND status = 'paid'
ORDER BY order_date DESC
LIMIT 10;

Key information:

  • Scan type preference order: Index Only Scan > Index Scan > Bitmap Heap Scan > Seq Scan (full table scan)
  • Heap FetchesThe less the better, too many indicates that the number of table returns is high, and you can consider expanding the covering index.

Summarize

  1. Select index according to scenario and select characteristics according to database Ordinary queries are covered with B-Tree; JSON/array/full text is given priority by PG's GIN/GiST; time-series large tables use BRIN to compress the index space.
  2. The order of composite index keys must be particular Equivalence query column → range/sort column → cover column (or append individually via INCLUDE)
  3. Regularly inspect slow queries and clean up useless indexes Make good use of EXPLAIN and cooperate with the index statistics view that comes with the database to "slim down" in a timely manner.
The more indexes, the better! It’s not that the more indexes fit your query pattern, the better!