Database relationship design: one-to-many, many-to-many practice

📂 Stage: Stage 2 - Interaction and Data (Core) 🔗 Related chapters: SQLAlchemy ORM · 评论系统与交互


1. Sorting out relationship types

When designing a business data model, clarifying "who" and "who" are related is the basis for making the system scalable and easy to maintain. Entity relationships in the database are mainly divided into three categories:

graph LR
    A[用户 User] -->|一对多| B[文章 Post]
    C[文章 Post] -->|多对多| D[标签 Tag]
    A[用户 User] -->|一对一| E[用户资料 Profile]

You can think of them as real-world relationships: a user can write multiple articles (one-to-many), an article can be tagged with multiple tags, and a tag can be used by multiple articles (many-to-many), and a user will only have one private profile (one-to-one).

Core business correspondence shorthand

Relationship typesCommon business scenarios
One-to-Many One-to-ManyUser → Article, Article → Comment, Category → Article
Many-to-Many Many-to-ManyArticles → Tags, Courses → Students, Orders → Products
One-to-one One-to-OneUser→Private information, User→Avatar configuration

This article focuses on dismantling the most frequently used one-to-many and many-to-many, and easily solves the N+1 query performance pit that is most easily avoided.


2. One-to-many relationship practice: Classification → Article

One-to-many is the most common relationship: the "one" side holds a collection attribute (reverse relationship), and the "many" side holds foreign keys and object attributes that point directly to the "one" side (forward relationship).

Take Category and Post as an example: there can be multiple articles under a category, and each article only belongs to one category.

2.1 Complete model definition

# app/models.py
from datetime import datetime
from app.extensions import db

# ------------------- 一:分类 Category -------------------
class Category(db.Model):
    __tablename__ = "categories"
    id = db.Column(db.Integer, primary_key=True)
    name = db.Column(db.String(50), unique=True, nullable=False, index=True)  # 分类名,加索引加速查询
    slug = db.Column(db.String(50), unique=True, index=True)
    description = db.Column(db.String(200))

    # 反向引用:category.posts 可以拿到该分类下的所有文章
    # lazy="dynamic" 让它变成一个可继续过滤的查询对象,而不是一次性加载所有文章
    posts = db.relationship(
        "Post",
        back_populates="category",
        lazy="dynamic",
        cascade="save-update, delete-orphan"  # 保存更新级联,删除孤儿文章
    )

    def __repr__(self):
        return f"<Category {self.name}>"


# ------------------- 多:文章 Post -------------------
class Post(db.Model):
    __tablename__ = "posts"
    id = db.Column(db.Integer, primary_key=True)
    title = db.Column(db.String(200), nullable=False, index=True)
    content = db.Column(db.Text)
    is_published = db.Column(db.Boolean, default=False)
    created_at = db.Column(db.DateTime, default=datetime.utcnow, index=True)

    # 外键:指向 classifications 表的主键
    category_id = db.Column(db.Integer, db.ForeignKey("categories.id"), index=True)

    # 正向引用:post.category 直接拿到所属分类对象
    category = db.relationship("Category", back_populates="posts")

    def __repr__(self):
        return f"<Post {self.title[:20]}...>"

Design Points:

  • foreign keyscategory_idPut it in the "many" side table, it is recommended to add it explicitlyindex=True
  • back_populatesSynchronize the relationship between the two parties: set the articlecategorytime, classifiedpostsIt will also be updated automatically.
  • lazy="dynamic"Especially useful on reverse collections: to avoid loading too many articles at once, chain filtering, sorting, and paging are possible.

3. One-to-many core problem: query and N+1 optimization

# 正向查询:拿到文章所属的分类
post = Post.query.filter_by(title="Python入门").first()
print(post.category.name)  # 默认通过懒加载或 JOIN 获取分类

# 反向查询:拿到分类下所有已发布的文章,并按时间倒序
tech_cat = Category.query.filter_by(name="技术").first()
published_posts = (
    tech_cat.posts
    .filter_by(is_published=True)
    .order_by(Post.created_at.desc())
    .all()
)
for p in published_posts:
    print(p.title)

# 只想知道数量,不加载文章内容
comment_count = tech_cat.posts.count()

3.2 lazyHow to choose parameters?

lazyControlling the loading timing and return type of associated data directly affects performance:

lazyValueReturn typeWhen to loadApplicable scenarios
select(Default)Object/ListA separate SQL query is issued only when the attribute is accessed for the first timeThe amount of data is extremely small and the association is rarely accessed
dynamicQuery objectNot automatically loaded, needs to be manually loaded.all() .filter()The reverse set may be large and needs to be filtered and sorted
joinedObject/listAutomatic INNER JOIN preloading when querying the main tableOne-to-one/many-to-one forward small data volume association
selectinObject/listUse IN query to batch load related data after checking the main tableAny relationship, especially suitable for large data volumes and avoid complex JOIN

Tips: many-to-one positive correlation (such aspost.category) Usually the amount of data is small, you can use the defaultselect, or useselectinBatch loading in one IN query; reverse one-to-many collection (such ascategory.posts) It is recommended to use it directlydynamic

3.3 Fatal performance pit: N+1 query

❌ Wrong way of writing

# 获取 10 篇文章(1 次 SQL)
posts = Post.query.limit(10).all()
for post in posts:
    # 每访问一次 post.category 就会额外执行一次查询!总共 11 次
    print(f"《{post.title}》- {post.category.name}")

This is the classic N+1 problem - a large number of additional database queries are triggered in the loop, and the interface will become slower and slower.

from sqlalchemy.orm import selectinload, joinedload

# 方案1:joinedload,用 LEFT JOIN 一次性把文章和分类查出来
posts = Post.query.options(
    joinedload(Post.category)
).limit(10).all()

# 方案2:selectinload,先查文章,再用 IN 批量查分类(推荐)
posts = Post.query.options(
    selectinload(Post.category)
).limit(10).all()

# 此时循环内不会额外查询
for post in posts:
    print(f"《{post.title}》- {post.category.name}")

Recommendedselectinload: It is better thanjoinedloadMore stable, no result set expansion caused by complex JOIN.


4. Many-to-many relationship practice: article → tag

An article can have multiple tags, and a tag can also be used by multiple articles, which requires an intermediate table to record the relationship. If the intermediate table only stores two foreign keys (no additional data), you can directly usedb.Tabledefinition without creating a model class.

4.1 Model definition of pure relational table

# ------------------- 中间表:post_tags -------------------
post_tags = db.Table(
    "post_tags",
    db.Column("post_id", db.Integer, db.ForeignKey("posts.id", ondelete="CASCADE"),
              primary_key=True),
    db.Column("tag_id", db.Integer, db.ForeignKey("tags.id", ondelete="CASCADE"),
              primary_key=True),
    db.Column("created_at", db.DateTime, default=datetime.utcnow,
              index=True)  # 可选:记录关联时间
)

# ------------------- 标签 Tag -------------------
class Tag(db.Model):
    __tablename__ = "tags"
    id = db.Column(db.Integer, primary_key=True)
    name = db.Column(db.String(30), unique=True, nullable=False, index=True)
    slug = db.Column(db.String(30), unique=True, index=True)

    # 多对多反向引用,通过 secondary 指定中间表
    posts = db.relationship(
        "Post",
        secondary=post_tags,
        back_populates="tags",
        lazy="dynamic"
    )

    def __repr__(self):
        return f"<Tag {self.name}>"

Don't forget toPostAdd it to the modeltagsrelation:

# 在 Post 类中添加
tags = db.relationship(
    "Tag",
    secondary=post_tags,
    back_populates="posts",
    lazy="dynamic"
)

4.2 Common operations

# 给文章添加标签
post = Post.query.get(1)
python_tag = Tag.query.filter_by(name="Python").first()
flask_tag = Tag.query.filter_by(name="Flask").first()

post.tags.append(python_tag)       # 追加单个
post.tags.extend([flask_tag])      # 批量追加
db.session.commit()

# 移除某个标签
post.tags.remove(python_tag)

# 清空所有标签
post.tags = []

# 查询同时拥有「Python」和「Flask」标签的文章
from sqlalchemy import func

posts = (
    Post.query
    .join(Post.tags)
    .filter(Tag.name.in_(["Python", "Flask"]))
    .group_by(Post.id)
    .having(func.count(Tag.id) == 2)   # 确保两个标签都存在
    .all()
)

The essence of this kind of "multiple tags at the same time" query is to firstJOINIntermediate table, then group by articles, and finally useHAVINGFilter out records with the correct number of matches.


5. Self-referencing one-to-many: comment tree

The "building within a building" of comments is a classic self-referencing one-to-many: a comment can have multiple replies, and each reply is itself a comment. The way to achieve this is to add a foreign key pointing to your own primary key in the same table.

class Comment(db.Model):
    __tablename__ = "comments"
    id = db.Column(db.Integer, primary_key=True)
    content = db.Column(db.Text, nullable=False)
    post_id = db.Column(db.Integer, db.ForeignKey("posts.id"), index=True)
    author_id = db.Column(db.Integer, db.ForeignKey("users.id"), index=True)
    created_at = db.Column(db.DateTime, default=datetime.utcnow, index=True)

    # 自引用外键:指向父评论,顶级评论为 None
    parent_id = db.Column(db.Integer, db.ForeignKey("comments.id"),
                          nullable=True, index=True)

    # 反向引用:父评论的所有回复
    replies = db.relationship(
        "Comment",
        backref=db.backref("parent", remote_side=[id]),  # 指定本端 id 是“一”方
        lazy="dynamic",
        cascade="all, delete-orphan"
    )
  • remote_side=[id]Tells SQLAlchemy about the current modelidcolumn is the primary key of the "one" side, soparent_idIt is the foreign key of the "many" end.
  • cascade="all, delete-orphan"All child replies will be automatically deleted when the parent comment is deleted to avoid leaving orphan data.

passcomment.repliesAll sub-replies can be obtained,comment.parentYou can get the parent comment and easily build a tree comment structure.


6. Summary of best practices

  1. Explicit indexing: Foreign keys and common query fields (category name, title, time) must be addedindex=True
  2. Prevention N+1: Use decisively when querying related data in batchesselectinloadorjoinedload
  3. For large gatheringsdynamic: For reverse collections like "all articles under a category" that may become very large, be sure tolazyset to"dynamic"
  4. Cascade deletion should be enabled with caution:cascade="delete-orphan"Only use it when you need to "delete the parent as well as the child" to avoid accidentally deleting data.
  5. Build a pure intermediate table model: If the intermediate table only needs to store two foreign keys, usedb.TableMore lightweight; only upgrade to model when additional fields are needed (e.g. correlation time, weight).

🔗 Extended reading