SQL详解之DQL - 数据查询语言完整指南

引言

数据查询语言(Data Query Language,DQL)是SQL语言中最重要和最复杂的部分,主要用于从数据库中检索数据。DQL以SELECT语句为核心,配合各种子句和操作符,能够实现从简单数据检索到复杂数据分析的各种查询需求。本文将详细介绍DQL的各种功能,并对比MySQL和PostgreSQL在查询实现上的差异。

1. DQL基础概念

1.1 什么是DQL?

DQL(Data Query Language)是SQL语言中用于查询数据库中数据的部分,主要由SELECT语句构成。DQL的核心功能包括:

  • 数据检索:从一个或多个表中获取数据
  • 数据过滤:使用WHERE子句筛选数据
  • 数据排序:使用ORDER BY子句排序结果
  • 数据聚合:使用聚合函数计算统计数据
  • 数据分组:使用GROUP BY子句分组数据

1.2 SELECT语句基本结构

SELECT [DISTINCT] select_list
FROM table_source
[WHERE search_condition]
[GROUP BY group_by_expression]
[HAVING search_condition]
[ORDER BY order_expression [ASC | DESC]]
[LIMIT number_rows];

2. 基础查询操作

2.1 简单查询

MySQL和PostgreSQL通用语法:

-- 基本查询
SELECT employee_id, first_name, last_name, salary
FROM employees;

-- 查询所有列
SELECT * FROM employees;

-- 使用列别名
SELECT 
    employee_id AS emp_id,
    CONCAT(first_name, ' ', last_name) AS full_name,
    salary AS monthly_salary
FROM employees;

-- 去重查询
SELECT DISTINCT department_id
FROM employees;

-- 条件查询
SELECT employee_id, first_name, last_name
FROM employees
WHERE salary > 50000;

-- 多条件查询
SELECT employee_id, first_name, last_name, salary
FROM employees
WHERE salary BETWEEN 40000 AND 80000
AND hire_date >= '2020-01-01';

2.2 模糊查询

MySQL模糊查询:

-- LIKE操作符
SELECT employee_id, first_name, last_name
FROM employees
WHERE first_name LIKE 'J%';  -- 以J开头的名字

-- 通配符使用
SELECT employee_id, first_name, last_name
FROM employees
WHERE last_name LIKE '_o%';  -- 第二个字母是o的名字

-- REGEXP正则表达式(MySQL特有)
SELECT employee_id, first_name, last_name
FROM employees
WHERE first_name REGEXP '^[A-M]';  -- 以A-M开头的名字

-- MySQL全文搜索
SELECT article_id, title, content
FROM articles
WHERE MATCH(title, content) AGAINST('database' IN BOOLEAN MODE);

PostgreSQL模糊查询:

-- LIKE操作符
SELECT employee_id, first_name, last_name
FROM employees
WHERE first_name LIKE 'J%';

-- ILIKE(不区分大小写)
SELECT employee_id, first_name, last_name
FROM employees
WHERE first_name ILIKE 'j%';

-- 正则表达式(PostgreSQL特有)
SELECT employee_id, first_name, last_name
FROM employees
WHERE first_name ~ '^[A-M]';  -- 区分大小写
WHERE first_name ~* '^[a-m]'; -- 不区分大小写

-- PostgreSQL全文搜索
SELECT article_id, title, content
FROM articles
WHERE to_tsvector('english', title || ' ' || content) @@ to_tsquery('english', 'database');

2.3 排序和限制

MySQL排序和限制:

-- 基本排序
SELECT employee_id, first_name, last_name, salary
FROM employees
ORDER BY salary DESC;

-- 多列排序
SELECT employee_id, first_name, last_name, salary
FROM employees
ORDER BY department_id ASC, salary DESC;

-- 限制结果数量
SELECT employee_id, first_name, last_name, salary
FROM employees
ORDER BY salary DESC
LIMIT 10;

-- 分页查询
SELECT employee_id, first_name, last_name, salary
FROM employees
ORDER BY employee_id
LIMIT 10 OFFSET 20;  -- 跳过前20条,取10条

PostgreSQL排序和限制:

-- 基本排序
SELECT employee_id, first_name, last_name, salary
FROM employees
ORDER BY salary DESC;

-- 多列排序
SELECT employee_id, first_name, last_name, salary
FROM employees
ORDER BY department_id ASC, salary DESC;

-- 限制结果数量
SELECT employee_id, first_name, last_name, salary
FROM employees
ORDER BY salary DESC
LIMIT 10;

-- 分页查询
SELECT employee_id, first_name, last_name, salary
FROM employees
ORDER BY employee_id
LIMIT 10 OFFSET 20;

-- PostgreSQL特有:FETCH FIRST语法
SELECT employee_id, first_name, last_name, salary
FROM employees
ORDER BY salary DESC
FETCH FIRST 10 ROWS ONLY;

3. 高级查询技术

3.1 聚合函数

MySQL聚合函数:

-- 基本聚合
SELECT 
    COUNT(*) AS total_employees,
    AVG(salary) AS avg_salary,
    MIN(salary) AS min_salary,
    MAX(salary) AS max_salary,
    SUM(salary) AS total_payroll
FROM employees;

-- 条件聚合
SELECT 
    department_id,
    COUNT(*) AS emp_count,
    AVG(salary) AS avg_salary,
    MAX(CASE WHEN status = 'active' THEN salary END) AS max_active_salary
FROM employees
GROUP BY department_id;

-- MySQL特有:GROUP_CONCAT
SELECT 
    department_id,
    COUNT(*) AS emp_count,
    GROUP_CONCAT(CONCAT(first_name, ' ', last_name) SEPARATOR '; ') AS employee_names
FROM employees
GROUP BY department_id;

PostgreSQL聚合函数:

-- 基本聚合
SELECT 
    COUNT(*) AS total_employees,
    AVG(salary) AS avg_salary,
    MIN(salary) AS min_salary,
    MAX(salary) AS max_salary,
    SUM(salary) AS total_payroll
FROM employees;

-- 条件聚合
SELECT 
    department_id,
    COUNT(*) AS emp_count,
    AVG(salary) AS avg_salary,
    MAX(CASE WHEN status = 'active' THEN salary END) AS max_active_salary
FROM employees
GROUP BY department_id;

-- PostgreSQL特有:STRING_AGG
SELECT 
    department_id,
    COUNT(*) AS emp_count,
    STRING_AGG(first_name || ' ' || last_name, '; ') AS employee_names
FROM employees
GROUP BY department_id;

-- PostgreSQL高级聚合
SELECT 
    department_id,
    PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY salary) AS median_salary,
    MODE() WITHIN GROUP (ORDER BY status) AS common_status
FROM employees
GROUP BY department_id;

3.2 分组查询

MySQL分组查询:

-- 基本分组
SELECT 
    department_id,
    COUNT(*) AS emp_count,
    AVG(salary) AS avg_salary
FROM employees
GROUP BY department_id;

-- 多列分组
SELECT 
    department_id,
    status,
    COUNT(*) AS emp_count,
    AVG(salary) AS avg_salary
FROM employees
GROUP BY department_id, status;

-- HAVING子句
SELECT 
    department_id,
    COUNT(*) AS emp_count,
    AVG(salary) AS avg_salary
FROM employees
GROUP BY department_id
HAVING COUNT(*) > 5 AND AVG(salary) > 60000;

-- MySQL分组扩展
SELECT 
    department_id,
    COUNT(*) AS emp_count,
    GROUP_CONCAT(first_name ORDER BY salary DESC SEPARATOR ', ') AS top_earners
FROM employees
GROUP BY department_id
WITH ROLLUP;  -- 显示汇总行

PostgreSQL分组查询:

-- 基本分组
SELECT 
    department_id,
    COUNT(*) AS emp_count,
    AVG(salary) AS avg_salary
FROM employees
GROUP BY department_id;

-- 多列分组
SELECT 
    department_id,
    status,
    COUNT(*) AS emp_count,
    AVG(salary) AS avg_salary
FROM employees
GROUP BY department_id, status;

-- HAVING子句
SELECT 
    department_id,
    COUNT(*) AS emp_count,
    AVG(salary) AS avg_salary
FROM employees
GROUP BY department_id
HAVING COUNT(*) > 5 AND AVG(salary) > 60000;

-- PostgreSQL分组扩展
-- CUBE(所有可能的组合)
SELECT 
    department_id,
    status,
    COUNT(*) AS emp_count
FROM employees
GROUP BY CUBE (department_id, status);

-- ROLLUP(层次化汇总)
SELECT 
    department_id,
    status,
    COUNT(*) AS emp_count
FROM employees
GROUP BY ROLLUP (department_id, status);

-- GROUPING SETS(自定义分组组合)
SELECT 
    department_id,
    status,
    COUNT(*) AS emp_count
FROM employees
GROUP BY GROUPING SETS ((department_id), (status), ());

4. 连接查询

4.1 内连接

-- 内连接(MySQL和PostgreSQL通用)
SELECT 
    e.employee_id,
    e.first_name,
    e.last_name,
    d.department_name
FROM employees e
INNER JOIN departments d ON e.department_id = d.department_id;

-- 多表连接
SELECT 
    e.first_name,
    e.last_name,
    d.department_name,
    p.project_name,
    a.assignment_date
FROM employees e
INNER JOIN departments d ON e.department_id = d.department_id
INNER JOIN assignments a ON e.employee_id = a.employee_id
INNER JOIN projects p ON a.project_id = p.project_id;

4.2 外连接

MySQL外连接:

-- 左外连接
SELECT 
    e.first_name,
    e.last_name,
    COALESCE(d.department_name, 'No Department') AS dept_name
FROM employees e
LEFT JOIN departments d ON e.department_id = d.department_id;

-- 右外连接
SELECT 
    e.first_name,
    e.last_name,
    d.department_name
FROM employees e
RIGHT JOIN departments d ON e.department_id = d.department_id;

-- 全外连接(MySQL不直接支持,需使用UNION)
SELECT e.first_name, e.last_name, d.department_name
FROM employees e
LEFT JOIN departments d ON e.department_id = d.department_id
UNION
SELECT e.first_name, e.last_name, d.department_name
FROM employees e
RIGHT JOIN departments d ON e.department_id = d.department_id;

PostgreSQL外连接:

-- 左外连接
SELECT 
    e.first_name,
    e.last_name,
    COALESCE(d.department_name, 'No Department') AS dept_name
FROM employees e
LEFT JOIN departments d ON e.department_id = d.department_id;

-- 右外连接
SELECT 
    e.first_name,
    e.last_name,
    d.department_name
FROM employees e
RIGHT JOIN departments d ON e.department_id = d.department_id;

-- 全外连接(PostgreSQL支持)
SELECT 
    e.first_name,
    e.last_name,
    d.department_name
FROM employees e
FULL OUTER JOIN departments d ON e.department_id = d.department_id;

4.3 交叉连接和自连接

-- 交叉连接(笛卡尔积)
SELECT e1.first_name, e1.last_name, e2.first_name AS colleague_first_name
FROM employees e1
CROSS JOIN employees e2
WHERE e1.employee_id != e2.employee_id
AND e1.department_id = e2.department_id;

-- 自连接(查找同事)
SELECT 
    e1.first_name AS employee_name,
    e2.first_name AS manager_name
FROM employees e1
LEFT JOIN employees e2 ON e1.manager_id = e2.employee_id;

5. 子查询和CTE

5.1 标量子查询

-- 标量子查询(MySQL和PostgreSQL通用)
SELECT 
    employee_id,
    first_name,
    last_name,
    salary,
    (SELECT AVG(salary) FROM employees) AS company_avg_salary,
    salary - (SELECT AVG(salary) FROM employees) AS salary_diff
FROM employees;

-- 相关子查询
SELECT 
    employee_id,
    first_name,
    last_name,
    salary,
    (SELECT AVG(salary) 
     FROM employees e2 
     WHERE e2.department_id = e1.department_id) AS dept_avg_salary
FROM employees e1;

5.2 表子查询

-- 表子查询
SELECT 
    dept_info.department_name,
    dept_info.avg_salary,
    dept_info.emp_count
FROM (
    SELECT 
        d.department_name,
        AVG(e.salary) AS avg_salary,
        COUNT(e.employee_id) AS emp_count
    FROM departments d
    LEFT JOIN employees e ON d.department_id = e.department_id
    GROUP BY d.department_id, d.department_name
) AS dept_info
WHERE dept_info.emp_count > 0;

5.3 公用表表达式(CTE)

MySQL CTE(8.0+支持):

-- 简单CTE
WITH high_earners AS (
    SELECT employee_id, first_name, last_name, salary
    FROM employees
    WHERE salary > 75000
)
SELECT h.*, d.department_name
FROM high_earners h
JOIN departments d ON h.employee_id = d.manager_id;

-- 递归CTE(MySQL 8.0+)
WITH RECURSIVE org_chart AS (
    -- 基础查询:CEO
    SELECT employee_id, first_name, last_name, manager_id, 1 as level
    FROM employees
    WHERE manager_id IS NULL
    
    UNION ALL
    
    -- 递归查询:下属
    SELECT e.employee_id, e.first_name, e.last_name, e.manager_id, oc.level + 1
    FROM employees e
    INNER JOIN org_chart oc ON e.manager_id = oc.employee_id
)
SELECT * FROM org_chart ORDER BY level, last_name;

PostgreSQL CTE:

-- 简单CTE
WITH high_earners AS (
    SELECT employee_id, first_name, last_name, salary
    FROM employees
    WHERE salary > 75000
)
SELECT h.*, d.department_name
FROM high_earners h
JOIN departments d ON h.employee_id = d.manager_id;

-- 递归CTE(PostgreSQL)
WITH RECURSIVE org_chart AS (
    -- 基础查询:CEO
    SELECT employee_id, first_name, last_name, manager_id, 1 as level
    FROM employees
    WHERE manager_id IS NULL
    
    UNION ALL
    
    -- 递归查询:下属
    SELECT e.employee_id, e.first_name, e.last_name, e.manager_id, oc.level + 1
    FROM employees e
    INNER JOIN org_chart oc ON e.manager_id = oc.employee_id
)
SELECT * FROM org_chart ORDER BY level, last_name;

-- PostgreSQL特有:MATERIALIZED和NOT MATERIALIZED
WITH large_dataset AS MATERIALIZED (
    -- 这个CTE会被物化(计算一次并存储)
    SELECT * FROM very_large_table WHERE condition = 'value'
)
SELECT * FROM large_dataset WHERE other_condition = 'other_value';

6. 窗口函数

6.1 MySQL窗口函数(8.0+支持)

-- 排名函数
SELECT 
    employee_id,
    first_name,
    last_name,
    salary,
    ROW_NUMBER() OVER (ORDER BY salary DESC) AS row_num,
    RANK() OVER (ORDER BY salary DESC) AS rank_num,
    DENSE_RANK() OVER (ORDER BY salary DESC) AS dense_rank_num,
    NTILE(4) OVER (ORDER BY salary DESC) AS quartile
FROM employees;

-- 分区窗口函数
SELECT 
    employee_id,
    first_name,
    last_name,
    department_id,
    salary,
    ROW_NUMBER() OVER (PARTITION BY department_id ORDER BY salary DESC) AS dept_rank,
    AVG(salary) OVER (PARTITION BY department_id) AS dept_avg_salary
FROM employees;

-- 前后行函数
SELECT 
    employee_id,
    first_name,
    last_name,
    hire_date,
    LAG(hire_date) OVER (ORDER BY hire_date) AS prev_hire_date,
    LEAD(hire_date) OVER (ORDER BY hire_date) AS next_hire_date,
    hire_date - LAG(hire_date) OVER (ORDER BY hire_date) AS days_since_prev
FROM employees;

6.2 PostgreSQL窗口函数

-- 排名函数(与MySQL类似)
SELECT 
    employee_id,
    first_name,
    last_name,
    salary,
    ROW_NUMBER() OVER (ORDER BY salary DESC) AS row_num,
    RANK() OVER (ORDER BY salary DESC) AS rank_num,
    DENSE_RANK() OVER (ORDER BY salary DESC) AS dense_rank_num,
    PERCENT_RANK() OVER (ORDER BY salary) AS percent_rank
FROM employees;

-- 分布函数
SELECT 
    employee_id,
    first_name,
    last_name,
    salary,
    CUME_DIST() OVER (ORDER BY salary) AS cumulative_dist,
    PERCENT_RANK() OVER (ORDER BY salary) AS percent_rank,
    NTILE(100) OVER (ORDER BY salary) AS percentile
FROM employees;

-- 聚合窗口函数
SELECT 
    employee_id,
    first_name,
    last_name,
    department_id,
    salary,
    SUM(salary) OVER (PARTITION BY department_id) AS dept_total_salary,
    AVG(salary) OVER (PARTITION BY department_id) AS dept_avg_salary,
    MIN(salary) OVER (PARTITION BY department_id) AS dept_min_salary,
    MAX(salary) OVER (PARTITION BY department_id) AS dept_max_salary
FROM employees;

-- PostgreSQL特有:FIRST_VALUE和LAST_VALUE
SELECT 
    employee_id,
    first_name,
    last_name,
    salary,
    FIRST_VALUE(salary) OVER (ORDER BY hire_date) AS first_hired_salary,
    LAST_VALUE(salary) OVER (ORDER BY hire_date ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS last_hired_salary
FROM employees;

7. 高级查询技术

7.1 联合查询

-- UNION(去重)
SELECT employee_id, first_name, last_name FROM employees
WHERE department_id = 1
UNION
SELECT contractor_id, first_name, last_name FROM contractors
WHERE department_id = 1;

-- UNION ALL(不去重)
SELECT employee_id, first_name, last_name FROM employees
WHERE department_id = 1
UNION ALL
SELECT contractor_id, first_name, last_name FROM contractors
WHERE department_id = 1;

-- INTERSECT(交集,PostgreSQL支持)
SELECT employee_id FROM employees
INTERSECT
SELECT employee_id FROM high_performers;

-- EXCEPT(差集,PostgreSQL支持)
SELECT employee_id FROM employees
EXCEPT
SELECT employee_id FROM terminated_employees;

7.2 EXISTS和IN的使用

-- EXISTS(通常性能更好)
SELECT employee_id, first_name, last_name
FROM employees e
WHERE EXISTS (
    SELECT 1 FROM assignments a
    WHERE a.employee_id = e.employee_id
);

-- NOT EXISTS
SELECT employee_id, first_name, last_name
FROM employees e
WHERE NOT EXISTS (
    SELECT 1 FROM assignments a
    WHERE a.employee_id = e.employee_id
);

-- IN操作符
SELECT employee_id, first_name, last_name
FROM employees
WHERE department_id IN (
    SELECT department_id FROM high_performance_depts
);

-- NOT IN(注意NULL值的影响)
SELECT employee_id, first_name, last_name
FROM employees
WHERE department_id NOT IN (
    SELECT department_id FROM closed_depts
    WHERE department_id IS NOT NULL  -- 避免NULL值问题
);

8. 查询优化技巧

8.1 索引优化

-- 创建复合索引优化查询
CREATE INDEX idx_emp_dept_salary ON employees(department_id, salary);
CREATE INDEX idx_emp_hire_date ON employees(hire_date);

-- 使用索引提示(MySQL)
SELECT /*+ USE_INDEX(employees, idx_emp_dept_salary) */ 
    employee_id, first_name, last_name
FROM employees
WHERE department_id = 1 AND salary > 50000;

-- PostgreSQL索引优化
-- 查看查询计划
EXPLAIN (ANALYZE, BUFFERS) 
SELECT * FROM employees WHERE department_id = 1 AND salary > 50000;

8.2 查询重写技巧

-- 将IN转换为EXISTS(性能优化)
-- 不推荐
SELECT * FROM employees 
WHERE department_id IN (SELECT department_id FROM departments WHERE location = 'NYC');

-- 推荐
SELECT e.* FROM employees e
WHERE EXISTS (
    SELECT 1 FROM departments d
    WHERE d.department_id = e.department_id 
    AND d.location = 'NYC'
);

-- 将子查询转换为JOIN(性能优化)
-- 不推荐
SELECT * FROM employees 
WHERE salary > (SELECT AVG(salary) FROM employees);

-- 推荐
SELECT e.* FROM employees e
CROSS JOIN (SELECT AVG(salary) as avg_salary FROM employees) a
WHERE e.salary > a.avg_salary;

8.3 分页优化

MySQL分页优化:

-- 传统分页(大数据集性能差)
SELECT * FROM employees ORDER BY employee_id LIMIT 10000, 20;

-- 优化分页(使用游标)
SELECT * FROM employees 
WHERE employee_id > 10000 
ORDER BY employee_id 
LIMIT 20;

PostgreSQL分页优化:

-- 使用游标分页
SELECT * FROM employees 
WHERE employee_id > 10000 
ORDER BY employee_id 
LIMIT 20;

-- PostgreSQL特有:LATERAL JOIN优化
SELECT e.*, d.*
FROM (SELECT * FROM departments LIMIT 10) d
LATERAL (
    SELECT * FROM employees 
    WHERE department_id = d.department_id 
    LIMIT 5
) e;

9. MySQL与PostgreSQL DQL差异总结

特性MySQLPostgreSQL
全文搜索MATCH...AGAINSTto_tsvector/to_tsquery
正则表达式REGEXP~ / ~* 操作符
字符串聚合GROUP_CONCATSTRING_AGG
窗口函数8.0+支持原生支持
CTE8.0+支持原生支持
递归CTE8.0+支持原生支持
FULL OUTER JOIN不支持支持
INTERSECT/EXCEPT不支持支持
数组查询不支持原生支持

10. 性能优化最佳实践

10.1 查询设计优化

-- 1. 避免在WHERE子句中使用函数
-- 不好
SELECT * FROM employees WHERE YEAR(hire_date) = 2024;

-- 好
SELECT * FROM employees WHERE hire_date >= '2024-01-01' AND hire_date < '2025-01-01';

-- 2. 使用LIMIT限制结果集
SELECT * FROM employees ORDER BY hire_date DESC LIMIT 100;

-- 3. 合理使用索引
CREATE INDEX idx_dept_salary ON employees(department_id, salary);

-- 4. 优化JOIN顺序(将小表放在前面)
SELECT s.sale_id, s.amount, c.customer_name
FROM sales s
JOIN customers c ON s.customer_id = c.customer_id;  -- 假设customers表较小

10.2 统计信息更新

-- MySQL更新统计信息
ANALYZE TABLE employees;

-- PostgreSQL更新统计信息
ANALYZE employees;

相关教程

使用EXPLAIN分析查询计划,定期更新表统计信息,合理设计索引,避免在大数据集上使用SELECT *。

总结

DQL是SQL语言的核心,掌握各种查询技术和优化方法对于数据库开发至关重要。MySQL和PostgreSQL在DQL实现上各有特色:

  • MySQL:语法简洁,全文搜索功能强,8.0+版本增加了现代SQL特性
  • PostgreSQL:功能丰富,标准兼容性好,高级查询功能强大

在实际应用中,应该根据具体需求和数据库特性选择最适合的查询方法,并遵循性能优化最佳实践。理解两种数据库系统的差异有助于编写更高效的查询语句。