SQL Detailed DML - Complete Guide to Data Manipulation Language
Introduction
Imagine you are an e-commerce back-end engineer - today the product department wants you to batch import 10,000 pieces of new data, the operations team is eager to clean up last year's expired coupons, and the order system has to handle millions of insertion operations every day. Behind these requirements, SQL's data manipulation language (DML) is required without exception.
DML is a basic skill that every back-end engineer, data analyst, and operation and maintenance personnel must master. It is responsible for adding, deleting, modifying, and checking data in the database table, and is the "hands and feet" of business logic. This article will in-depth dismantle the core usage of each command of SELECT, INSERT, UPDATE, and DELETE in an easy-to-understand manner, while comparing the common differences between MySQL and PostgreSQL, and helping you avoid several major pitfalls in the production environment.
1. DML basic literacy
1.1 Core Definition
DML (Data Manipulation Language, data manipulation language) mainly operates rows of data in the table, including four types of core commands:
- SELECT: query data (although sometimes called DQL independently, the industry generally refers to it together with DML)
- INSERT: Add new data to the table
- UPDATE: Modify existing data
- DELETE: delete data
DML only targets data content and does not change the table structure (changing the structure is a matter of DDL). Also, DML operations typically support transaction rollback, and the specific behavior depends on the database's autocommit settings.
1.2 The difference between DML and DDL
Many novices confuse DML and DDL. Here is a table to quickly clarify:
Simple understanding: **DML moves content, DDL moves skeleton. **
2. SELECT——Data query
For most systems, queries account for more than 80% of DML usage. Let’s start with the most basic query writing, and then look at the high-frequency features of MySQL and PostgreSQL.
2.1 General basic query
No matter which database you use, the following writing methods are common:
-- 1. 查询指定列(推荐只查需要的字段,减少网络传输和数据库压力)
SELECT order_id, user_id, total_amount, create_time
FROM orders;
-- 2. 查询所有列(仅限开发调试,生产环境禁止 SELECT * )
SELECT *
FROM employees;
-- 3. 使用别名让结果更易读
SELECT
u.user_name AS buyer,
o.order_id,
o.total_amount * 0.95 AS final_pay -- 别名也可以给计算字段
FROM orders o
JOIN users u ON o.user_id = u.user_id;
-- 4. 去重查询:找出有过订单的用户 ID
SELECT DISTINCT user_id
FROM orders;
-- 5. 多条件查询:2024年6月已完成且金额≥100的订单
SELECT *
FROM orders
WHERE create_time BETWEEN '2024-06-01 00:00:00' AND '2024-06-30 23:59:59'
AND pay_status = 'completed'
AND total_amount >= 100;
2.2 Differences between MySQL and PostgreSQL
In addition to basic syntax, the two databases each have special features that make people call them "easy to use".
MySQL Exclusive Tips
-- 1. 分页查询(直接用 LIMIT + OFFSET)
SELECT order_id, user_id, total_amount
FROM orders
ORDER BY create_time DESC
LIMIT 10 OFFSET 20; -- 跳过前 20 条,取 10 条
-- 2. 简易全文搜索(适合小型站内搜索)
SELECT article_id, title, publish_time
FROM articles
WHERE MATCH(title, content) AGAINST('SQL 性能优化' IN NATURAL LANGUAGE MODE);
PostgreSQL exclusive tips
-- 1. 窗口函数(不需要 GROUP BY 也能做排名、累计统计)
SELECT
order_id,
user_id,
total_amount,
ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY create_time DESC) AS user_latest_order_rank
FROM orders;
-- 2. 原生数组操作(非常适合存储标签、技能等数据)
SELECT employee_id, first_name, skills[1] AS primary_skill
FROM employee_profiles
WHERE 'Python' = ANY(skills);
-- 3. RETURNING 子句(在增删改后直接返回操作的行,不用再查一次)
WITH deleted_inactive_users AS (
DELETE FROM users
WHERE last_login < '2023-01-01'
RETURNING user_id
)
DELETE FROM user_profiles
WHERE user_id IN (SELECT user_id FROM deleted_inactive_users);
PostgreSQL's window functions and RETURNING functions make it particularly powerful when processing complex data.
3. INSERT - data insertion
3.1 Basic operations
-- 单行插入
INSERT INTO products (product_name, category, price, stock)
VALUES ('无线蓝牙耳机', '电子数码', 199.00, 200);
-- 批量插入(生产效率至少提升几十倍)
INSERT INTO products (product_name, category, price, stock)
VALUES
('机械键盘青轴', '电子数码', 399.00, 150),
('棉麻短袖T恤', '服饰', 79.00, 500),
('不锈钢保温杯', '家居', 49.00, 300);
3.2 Conflict handling: insert or update
When the inserted data may conflict with the primary key or unique index, we often need to "update if there is any, insert if not", which is the UPSERT operation. The writing methods of the two databases are very different.
MySQL:ON DUPLICATE KEY UPDATE
INSERT INTO products (product_id, product_name, price, stock)
VALUES (101, '无线蓝牙耳机升级款', 229.00, 250)
ON DUPLICATE KEY UPDATE
product_name = VALUES(product_name), -- 使用新值覆盖
price = VALUES(price),
stock = stock + VALUES(stock); -- 结合旧值做加法
PostgreSQL:ON CONFLICT
-- 方式1:冲突时直接跳过
INSERT INTO products (product_id, product_name, price, stock)
VALUES (101, '无线蓝牙耳机升级款', 229.00, 250)
ON CONFLICT (product_id)
DO NOTHING;
-- 方式2:冲突时更新
INSERT INTO products (product_id, product_name, price, stock)
VALUES (101, '无线蓝牙耳机升级款', 229.00, 250)
ON CONFLICT (product_id)
DO UPDATE SET
product_name = EXCLUDED.product_name, -- EXCLUDED 代表试图插入的那一行
price = EXCLUDED.price,
stock = products.stock + EXCLUDED.stock;
PostgreSQLON CONFLICT ... DO NOTHINGVery intuitive, while MySQL's approach is more compact, each in its own way.
4. UPDATE - data modification
4.1 General basic modifications
-- 1. 简单修改:给“服饰”类商品打 8 折
UPDATE products
SET price = price * 0.8
WHERE category = '服饰';
-- 2. 条件分支修改:按绩效发放年终奖
UPDATE employees
SET year_end_bonus = CASE
WHEN performance_level = 'S' THEN 100000
WHEN performance_level = 'A' THEN 50000
WHEN performance_level = 'B' THEN 20000
ELSE 10000
END
WHERE hire_date >= '2023-01-01';
4.2 Cross-table modification
When you need to update data based on the contents of another table, the writing method is different.
MySQL
-- 给“电子数码”部门所有员工涨薪 5%
UPDATE employees e
INNER JOIN departments d ON e.department_id = d.department_id
SET e.salary = e.salary * 1.05
WHERE d.department_name = '电子数码';
PostgreSQL
-- 同样需求,使用 FROM 子句,并可以用 RETURNING 返回结果
UPDATE employees
SET salary = employees.salary * 1.05
FROM departments
WHERE employees.department_id = departments.department_id
AND departments.department_name = '电子数码'
RETURNING employee_id, first_name, salary;
5. DELETE —— Data deletion
5.1 Common basic deletion
-- 1. 删除所有已取消的订单
DELETE FROM orders
WHERE order_status = 'cancelled';
-- 2. 清空整张表(保留表结构,速度极快)
TRUNCATE TABLE temp_cart_data;
5.2 Cross-table deletion
Likewise, MySQL uses JOIN and PostgreSQL uses USING.
MySQL
-- 删除“已关闭部门”下的所有员工
DELETE e FROM employees e
INNER JOIN departments d ON e.department_id = d.department_id
WHERE d.department_status = 'closed';
PostgreSQL
-- 删除“已关闭部门”下的所有员工,并返回被删记录
DELETE FROM employees
USING departments
WHERE employees.department_id = departments.department_id
AND departments.department_status = 'closed'
RETURNING employee_id, first_name;
5.3 Deleting large tables in batches (preventing table locking)
When you want to delete millions of rows of data, you must not delete oneDELETEDirectly execute ** in full, otherwise the entire table may be locked.
-- MySQL:循环删除 2022 年之前的日志,每次只删 1 万条
DELETE FROM logs
WHERE log_time < '2022-01-01 00:00:00'
LIMIT 10000;
-- 写一个简单脚本,循环执行直到受影响行数为 0
PostgreSQL NoLIMITDelete syntax, but can be batched by subquery with primary key:
DELETE FROM logs
WHERE log_id IN (
SELECT log_id FROM logs
WHERE log_time < '2022-01-01 00:00:00'
LIMIT 10000
);
6. MySQL vs PostgreSQL DML Differences Quick Look
7.1 Query optimization
-- 1. 使用 EXPLAIN 分析执行计划
-- MySQL
EXPLAIN SELECT * FROM orders WHERE user_id = 123 AND pay_status = 'completed';
-- PostgreSQL(ANALYZE 会真实执行并给出实际耗时)
EXPLAIN ANALYZE SELECT * FROM orders WHERE user_id = 123 AND pay_status = 'completed';
-- 2. 永远不要用 SELECT * 获取所有字段,只拿你需要的
-- 3. 为 WHERE、JOIN、ORDER BY 的高频字段建立索引
CREATE INDEX idx_orders_user_pay ON orders(user_id, pay_status);
7.2 Batch operation optimization
-- 1. 批量插入 > 逐行循环插入
-- 2. MySQL 大批量插入时,可以暂时关闭自动提交,集中一次性提交
SET autocommit = 0;
START TRANSACTION;
-- ... 批量插入
COMMIT;
SET autocommit = 1;
-- 3. PostgreSQL 用 COPY 命令从 CSV 文件导入,速度最快
-- COPY products FROM '/path/to/products.csv' WITH (FORMAT csv, HEADER true);
1. Before executing DELETE, TRUNCATE or large batch UPDATE, be sure to make a backup first! **
2. For time-consuming DML operations, try to arrange them to be executed during off-peak periods.
Summarize
DML is the bridge connecting business and database, and is the "basic skill" of every back-end engineer. MySQL and PostgreSQL have their own characteristics in DML:
- MySQL: Simple syntax, outstanding batch insertion performance, suitable for rapid development of small and medium-sized applications.
- PostgreSQL: Strong standards compatibility, built-in advanced features such as window functions, arrays, RETURNING, etc., suitable for data-intensive and analytical scenarios.
By mastering two sets of common writing methods, combined with index planning and batch execution strategies, you can write efficient and safe DML statements to easily cope with various business needs.