relational-database tutorial - A complete guide to MySQL and PostgreSQL

Whether you are a beginner who has just started to get into back-end development, or a developer who has been writing CRUD for several years and wants to make up for your shortcomings, relational-database is the technical cornerstone that must be firmly mastered. This tutorial focuses on two major open source mainstream databases - MySQL 8.0+ and PostgreSQL 14+. From core concepts, SQL language to Python practice, it will take you to establish complete system capabilities from table creation to optimization, from native query to ORM programming.

The tutorial adopts a three-layer structure of "Basic + Advanced + Python Implementation". You can choose to skip reading or study in order according to your own level. All the codes in this article have been actually verified, and the differences between the two models are marked to facilitate your quick selection or migration.


1. First understand the core logic of relational-database

relational-database (RDBMS) stores data in Table, which is composed of Row and Column. Each table can be connected with other tables through Foreign Key. The core guarantee of this model is ACID:

  • Atomicity: All operations in the transaction either succeed or are all rolled back.
  • Consistency: The data before and after the transaction must satisfy all predefined rules (constraints, triggers).
  • Isolation: Concurrent transactions do not interfere with each other.
  • Durability: Committed transaction changes will not be lost even if the system crashes.

Both MySQL and PostgreSQL fully support ACID, but each has different emphasis on implementation details and advanced features: MySQL is known for its simplicity and ease of use, and mature separation of reading and writing; PostgreSQL is closer to the SQL standard and has more powerful support for complex queries, geographical space, and JSON structured data.


2. Quick installation and tool preparation

MySQL 8.0

  • macOS: brew install mysql
  • Windows: Download the community version MSI installation package and set the root password during installation.
  • Linux: sudo apt install mysql-server

PostgreSQL 14

  • macOS: brew install postgresql
  • Windows: Download the installer from the official website
  • Linux: sudo apt install postgresql

After installation, it is recommended to use visualization tools: MySQL Workbench, pgAdmin 4 or VS Code database plug-in to facilitate visual viewing of the table structure and execution results in the early stage.


3. SQL language five-piece set: DDL / DML / DQL / DCL / TCL

SQL can be divided into five categories according to functions. This is the skeleton for learning relational-database.

1. DDL (data definition language) - create databases, tables, and define constraints

Create database

-- MySQL 和 PostgreSQL 通用
CREATE DATABASE school;

Create table

Take the two tables of classes and students as an example:

-- 创建班级表
CREATE TABLE class (
    id INT PRIMARY KEY AUTO_INCREMENT,    -- MySQL 自增主键
    name VARCHAR(50) NOT NULL
);

Use of auto-increment columns in PostgreSQLSERIALorGENERATED AS IDENTITY

CREATE TABLE class (
    id SERIAL PRIMARY KEY,
    name VARCHAR(50) NOT NULL
);

Create a student table and associate classes with foreign keys

-- MySQL
CREATE TABLE student (
    id INT PRIMARY KEY AUTO_INCREMENT,
    name VARCHAR(50) NOT NULL,
    class_id INT,
    CONSTRAINT fk_class FOREIGN KEY (class_id) REFERENCES class(id)
);

PostgreSQL writing is almost the same, just note that foreign key constraints are defined in the same way.

If you want to delete a table or database, useDROP TABLE student;orDROP DATABASE school;, operate with caution.

Common constraints

  • PRIMARY KEY: Uniquely identifies a row and automatically creates a unique index
  • FOREIGN KEY: Ensure referential integrity
  • UNIQUE: Value cannot be repeated
  • NOT NULL: Empty is not allowed
  • DEFAULT: Set default value

2. DML (data manipulation language) - adding, deleting and modifying data

Insert data

-- 插入班级
INSERT INTO class (name) VALUES ('三年级一班');
INSERT INTO class (name) VALUES ('三年级二班');

-- 批量插入学生
INSERT INTO student (name, class_id) VALUES
    ('张三', 1),
    ('李四', 1),
    ('王五', 2);

UPDATE DATA

-- 将张三的名字改为张三丰
UPDATE student SET name = '张三丰' WHERE name = '张三';

Be careful: do not addWHEREThe condition will update the entire table.

Delete data

DELETE FROM student WHERE class_id = 2;

Again, noWHEREThe entire table will be cleared. Please think twice about this in production environments.


3. DQL (Data Query Language) - Flexible access to data

Basic Query

-- 查询所有列
SELECT * FROM student;

-- 查询特定列
SELECT name, class_id FROM student;

-- 条件过滤
SELECT * FROM student WHERE class_id = 1;

Fuzzy query, range and sorting

-- 名字中包含'张'的学生
SELECT * FROM student WHERE name LIKE '%张%';

-- 某范围内的ID
SELECT * FROM student WHERE id BETWEEN 1 AND 3;

-- 按ID降序,分页取前2条(MySQL/PostgreSQL通用)
SELECT * FROM student ORDER BY id DESC LIMIT 2;

Multi-table association

-- 内连接:只取匹配到的行
SELECT s.name, c.name AS class_name
FROM student s
INNER JOIN class c ON s.class_id = c.id;

-- 左连接:保留左表所有行,右表无匹配则为NULL
SELECT s.name, c.name AS class_name
FROM student s
LEFT JOIN class c ON s.class_id = c.id;

Aggregation and Grouping

-- 每个班级的学生人数
SELECT class_id, COUNT(*) AS student_count
FROM student
GROUP BY class_id;

-- 筛选人数大于2的班级
SELECT class_id, COUNT(*) AS cnt
FROM student
GROUP BY class_id
HAVING cnt > 2;

Subquery

-- 查询和张三同班的学生
SELECT * FROM student
WHERE class_id = (SELECT class_id FROM student WHERE name = '张三丰');

4. DCL (Data Control Language) - Users and Permissions

-- MySQL 创建用户并授权
CREATE USER 'app_user'@'%' IDENTIFIED BY 'strong_password';
GRANT SELECT, INSERT, UPDATE ON school.* TO 'app_user'@'%';
FLUSH PRIVILEGES;

The commands for PostgreSQL are slightly different, but the idea is the same: useCREATE ROLEGRANTGrant permissions and useREVOKERecycle.

Principle of minimizing permissions: Only give the application the permissions it actually needs. Do not give them directly for convenience.ALL


5. TCL (Transaction Control Language) - Ensure atomicity of operations

-- 开始事务
START TRANSACTION;

-- 执行一系列操作
UPDATE account SET balance = balance - 100 WHERE id = 1;
UPDATE account SET balance = balance + 100 WHERE id = 2;

-- 如果一切正常,提交事务
COMMIT;

-- 如果中间有错误,回滚
ROLLBACK;

MySQL automatically commits by default, and production environments are often set to manual commit or explicit wrapping of transactions. PostgreSQL also recommends using explicitBEGINCOMMIT/ROLLBACK


4. Advanced tools: index optimization and execution plan

When the amount of data exceeds 100,000, index design directly affects performance. B+ tree is the most commonly used index structure. Understanding the "leftmost prefix principle" and "covering index" can avoid most slow queries.

Create index

-- 给 student 表的 name 字段创建索引
CREATE INDEX idx_student_name ON student(name);

-- 联合索引
CREATE INDEX idx_student_class_name ON student(class_id, name);

Leftmost prefix principle: Union index(class_id, name)InquiryWHERE class_id = 1orWHERE class_id = 1 AND name = '张三'effective at the time, butWHERE name = '张三'The index cannot be exploited.

Use EXPLAIN to analyze the query

-- MySQL
EXPLAIN SELECT * FROM student WHERE class_id = 1 AND name LIKE '%张%';

-- PostgreSQL(更加详细)
EXPLAIN ANALYZE SELECT * FROM student WHERE class_id = 1 AND name LIKE '%张%';

focus ontypefields (such asALLIndicates full table scan and needs to be optimized),keyThe index used and the number of scanned rows.

Common index failure scenarios:

  • Perform functional operations on index columns, such asWHERE YEAR(create_time) = 2021
  • Leading fuzzy queryLIKE '%abc'
  • Implicit type conversion, such as comparing string fields to numbers
  • OROne side of the condition is not using an index

5. Views, functions and stored procedures (moderate use)

View can encapsulate complex queries and simplify application layer code:

CREATE VIEW student_class_view AS
SELECT s.id, s.name, c.name AS class_name
FROM student s
JOIN class c ON s.class_id = c.id;

-- 之后可直接像查表一样查询
SELECT * FROM student_class_view;

Stored functions and procedures are suitable for sinking business logic into the database, but do not overuse it, otherwise it will reduce code maintainability and migration flexibility. It is generally recommended to only be used in scenarios such as data cleaning and batch statistics, and the business logic should be placed in the application layer as much as possible.


6. Python Practical Combat: From Native Driver to ORM

1. Native connection to MySQL

import pymysql

conn = pymysql.connect(
    host='localhost',
    user='root',
    password='your_password',
    database='school',
    charset='utf8mb4'
)
cursor = conn.cursor()
cursor.execute("SELECT * FROM student WHERE class_id = %s", (1,))
rows = cursor.fetchall()
for row in rows:
    print(row)
conn.close()

Connecting to PostgreSQL can be done usingpsycopg2, the interface is similar.

2. Parameterized query (prevent SQL injection)

Never use string concatenation to construct SQL, always use%s(MySQL) or%s(Postgres) placeholder and pass the parameters in as a tuple. This is the most effective way to prevent SQL injection.

3. Getting started with SQLAlchemy ORM

from sqlalchemy import create_engine, Column, Integer, String, ForeignKey
from sqlalchemy.orm import declarative_base, sessionmaker, relationship

engine = create_engine('mysql+pymysql://user:pass@localhost/school')
Base = declarative_base()

class Student(Base):
    __tablename__ = 'student'
    id = Column(Integer, primary_key=True)
    name = Column(String(50))
    class_id = Column(Integer, ForeignKey('class.id'))

# 创建会话
Session = sessionmaker(bind=engine)
session = Session()

# 查询学生
students = session.query(Student).filter(Student.class_id == 1).all()
for s in students:
    print(s.name)

ORM can automatically handle the mapping between objects and tables, and supports advanced functions such as transactions and connection pools. SQLAlchemy 2.0 fully embraces asynchrony and is the first choice for modern Python projects.


7. Practical project suggestions

If you don’t just want to practice fake moves, I suggest you do it in the following order:

  1. Class Attendance Prototype: Implement three tables of classes, students, and attendance records, and practice multi-table related queries and aggregation statistics.
  2. E-commerce order table optimization: Import 100,000 pieces of simulated data, use EXPLAIN to analyze slow queries, try to build single-column indexes and joint indexes, and compare performance changes.
  3. Blog backend data layer: Use FastAPI + SQLAlchemy to build CRUD interfaces for users, articles, and comments, and add paging, transactions, and connection pools.

8. Quick Check on Frequently Asked Questions

**Q: Which one should I choose to get started with MySQL or PostgreSQL? ** A: MySQL is the first choice for pure beginners. It has rich community resources and it is easier to search for answers when reporting errors. If your business relies on JSON, geographical information, or has extremely high data integrity requirements, you will not regret it if you directly use PostgreSQL.

**Q: Is the more indexes the better? ** A: Absolutely not. Indexes can slow down write operations and occupy disk space. Only for high-frequency queriesWHEREJOIN ONORDER BYColumns are indexed.

**Q: Should I use native driver or ORM for Python database programming? ** A: Use native drivers for small tools and simple scripts; for medium and large-scale team collaboration projects, SQLAlchemy ORM is recommended, which can significantly reduce duplicate code and improve maintainability.


There is no shortcut to learning relational-database. The best way is to code while learning. Every time you master a new knowledge point, verify it with your own data. Turn on EXPLAIN when encountering slow queries, and understand the transaction isolation level when encountering deadlocks. Keep this habit, and you will soon advance from "being able to use it" to "understanding the principles and being able to optimize".

What to learn next?

After mastering the content of this tutorial, you can continue to delve deeper:

  • Cache Layer: Redis implements hotspot data caching
  • Distributed extension: MySQL master-slave replication, PostgreSQL streaming replication
  • Cloud practice: Alibaba Cloud RDS, AWS Aurora and other cloud database services
  • NoSQL: MongoDB, ClickHouse cope with unstructured or analytical scenarios

I wish you can build a robust and efficient data layer as soon as possible and lay a solid foundation for more complex systems.