YOLO Family in Practice: A Complete Guide from YOLOv1 to YOLOv8

Introduction

YOLO (You Only Look Once) is one of the most influential families of object detection algorithms, known for its balance of speed and accuracy. From YOLOv1 in 2015 to YOLOv8 in 2023, the family has continued to evolve and has become one of the most widely used real-time object detection solutions in industry. This article covers the development history, core principles, and practical use of the YOLO family.


1. YOLO family development history

1.1 The birth and development of YOLO

The introduction of YOLO marked a turning point in object detection: a shift from two-stage detectors, which first propose regions and then classify them, to one-stage detectors that predict boxes and classes in a single pass.

Version	Year	Core improvements
YOLOv1	2015	First to propose single-stage detection, treating detection as a regression problem
YOLOv2	2016	Introduced Batch Normalization, anchor boxes, and multi-scale training
YOLOv3	2018	Multi-scale prediction, a better feature extraction network, improved small-object detection
YOLOv4	2020	CSPDarknet53 backbone, PANet feature fusion, Mosaic data augmentation
YOLOv5	2020	PyTorch implementation, easier-to-use interface, rich set of pretrained models
YOLOv6	2022	RepVGG-style re-parameterized structure, more efficient architecture
YOLOv7	2022	Gradient path design, model scaling strategy
YOLOv8	2023	Anchor-free design, improved backbone, instance segmentation support

In order to understand the positioning of each version more intuitively, we can use a piece of code to summarize their characteristics:

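def yolo_evolution():
    """
    Print the defining characteristic of each YOLO version.
    """
    evolution = {
        "YOLOv1": "pioneer of single-stage detection",
        "YOLOv2": "introduced anchors and batch normalization",
        "YOLOv3": "multi-scale prediction",
        "YOLOv4": "strong speed-accuracy balance",
        "YOLOv5": "PyTorch ease of use",
        "YOLOv6": "efficient architecture design",
        "YOLOv7": "training optimization innovations",
        "YOLOv8": "anchor-free design"
    }

    print("Characteristics of the YOLO family:")
    for version, feature in evolution.items():
        print(f"• {version}: {feature}")

yolo_evolution()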

1.2 YOLO’s core philosophy

YOLO's success stems from its unique design concept, which is mainly reflected in the following aspects:

Unified framework: classification and localization are handled by a single neural network, enabling end-to-end training and inference
Global view: the network looks at the entire image at once, avoiding the region proposal stage of the R-CNN series
Speed advantage: an efficient network architecture makes real-time detection possible

We can understand these core concepts through the following code:

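def yolo_core_concepts():
    """
    Print a short explanation of each core YOLO concept.
    """
    concepts = {
        "Unified Detection": "a single network predicts class and location together",
        "Grid-based Prediction": "the image is divided into a grid and each cell makes predictions",
        "Real-time Performance": "fast enough for real-time applications",
        "End-to-End Training": "no complex multi-stage training pipeline"
    }

    print("YOLO core concepts:")
    for concept, desc in concepts.items():
        print(f"• {concept}: {desc}")

yolo_core_concepts()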
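
To make the grid-based prediction idea concrete, here is a minimal sketch. It assumes the configuration described in the YOLOv1 paper (a 7x7 grid, 2 boxes per cell, 20 classes); the function names are illustrative and do not come from any library.

```python
def output_shape(S=7, B=2, C=20):
    """Shape of the YOLOv1 prediction tensor: S x S x (B * 5 + C).

    Each of the B boxes per cell carries 5 numbers (x, y, w, h, confidence),
    and each cell additionally carries C class probabilities.
    """
    return (S, S, B * 5 + C)


def responsible_cell(cx, cy, S=7):
    """Return (row, col) of the grid cell that contains a normalized box center."""
    col = min(int(cx * S), S - 1)  # clamp so that cx == 1.0 stays inside the grid
    row = min(int(cy * S), S - 1)
    return row, col


print(output_shape())              # (7, 7, 30)
print(responsible_cell(0.5, 0.5))  # (3, 3)
```

In YOLOv1 the cell that contains an object's center is the one responsible for predicting that object, which is what responsible_cell computes.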

2. In-depth analysis of YOLOv5

2.1 YOLOv5 architecture features

YOLOv5 is a PyTorch implementation developed by Ultralytics, designed for ease of use and strong performance. Its architecture has three parts:

Backbone: CSPDarknet (a Darknet variant built from CSP blocks)
Neck: PANet (Path Aggregation Network)
Head: anchor-based detection head

YOLOv5 provides multiple model variants to adapt to different application scenarios:

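Model	Parameters	GFLOPs	Speed, V100 b1	mAP@0.5:0.95
YOLOv5n	1.9M	4.5	6.3 ms	28.0%
YOLOv5s	7.2M	16.5	6.4 ms	37.4%
YOLOv5m	21.2M	49.0	8.2 ms	45.4%
YOLOv5l	46.5M	109.1	10.1 ms	49.0%
YOLOv5x	86.7M	205.7	12.1 ms	50.7%

These figures are the ones published in the Ultralytics YOLOv5 README for 640-pixel input on COCO val2017. Speed is single-image (batch size 1) latency on a V100 GPU; CPU inference is considerably slower.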

2.2 YOLOv5 installation and configuration

Installing YOLOv5 takes three steps:

# 1. Clone the repository
git clone https://github.com/ultralytics/yolov5
cd yolov5

# 2. Install the dependencies
pip install -r requirements.txt

# 3. Verify the installation
python detect.py --weights yolov5s.pt --source 0  # webcam
# or
python detect.py --weights yolov5s.pt --source data/images

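2.3 YOLOv5 inference implementation

YOLOv5 supports several inference methods. The following are two commonly used ones:

Method 1: Use the yolov5 pip package

This method relies on the yolov5 package from PyPI (pip install yolov5), which is a packaged version of the Ultralytics repository rather than the repository itself.

import yolov5

# Load the model (the weights are downloaded automatically if missing)
model = yolov5.load('yolov5s.pt')

# Run inference on a single image
results = model('image.jpg')

# Display the results
results.show()

# Save the annotated results
results.save(save_dir='runs/detect/exp')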

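Method 2: Use PyTorch Hub

import torch

# Load the pretrained model from PyTorch Hub
model = torch.hub.load('ultralytics/yolov5', 'yolov5s', pretrained=True)

# Run inference on an image (file path, URL, PIL image, or NumPy array)
results = model('image.jpg')
results.print()

# The Hub model accepts images only. For a video file or a webcam, either
# run detect.py with --source, or read frames with OpenCV and pass each
# frame (converted from BGR to RGB) to the model.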

3. YOLOv8 in-depth analysis

3.1 YOLOv8 new features

YOLOv8, released by Ultralytics in 2023, brings several changes compared with YOLOv5:

Feature	YOLOv5	YOLOv8
Architecture	CSPDarknet + PANet	Improved backbone + improved neck
Anchors	Anchor boxes	Anchor-free
Tasks	Mainly object detection	Detection + segmentation + pose estimation
API	Relatively complex	More concise and unified
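
The difference between anchor boxes and an anchor-free design can be illustrated with a simplified decoding sketch. The anchor-based function follows the formulation published for YOLOv2/v3; the anchor-free function shows only the general idea of predicting distances from a cell center. The real YOLOv8 head is more involved (it predicts each distance as a discrete distribution), so treat this as an illustration under those assumptions, with hypothetical function names.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def decode_anchor_based(t, cell, anchor, stride):
    """YOLOv2/v3-style decoding relative to a grid cell and an anchor prior.

    t      = (tx, ty, tw, th) raw network outputs
    cell   = (cx, cy) grid cell indices
    anchor = (pw, ph) anchor width and height in pixels
    Returns (center_x, center_y, width, height) in pixels.
    """
    tx, ty, tw, th = t
    bx = (sigmoid(tx) + cell[0]) * stride
    by = (sigmoid(ty) + cell[1]) * stride
    bw = anchor[0] * math.exp(tw)  # the prior is scaled, not replaced
    bh = anchor[1] * math.exp(th)
    return bx, by, bw, bh


def decode_anchor_free(dist, cell, stride):
    """Anchor-free decoding: distances to the four box sides, in grid units,
    measured from the cell center. No anchor prior is involved.

    Returns (x1, y1, x2, y2) in pixels.
    """
    left, top, right, bottom = dist
    px, py = cell[0] + 0.5, cell[1] + 0.5
    return ((px - left) * stride, (py - top) * stride,
            (px + right) * stride, (py + bottom) * stride)


print(decode_anchor_based((0, 0, 0, 0), cell=(3, 3), anchor=(32, 64), stride=32))
print(decode_anchor_free((1, 1, 1, 1), cell=(3, 3), stride=32))
```

The practical consequence is that an anchor-free head needs no dataset-specific anchor sizes to be chosen before training.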

3.2 YOLOv8 installation and use

Installing YOLOv8 takes a single command:

pip install ultralytics

Basic usage example:

from ultralytics import YOLO

# Load the model
model = YOLO('yolov8n.pt')  # nano variant

# Run inference
results = model('image.jpg')

# Inspect the results
for r in results:
    print(r.boxes)      # bounding boxes
    print(r.masks)      # segmentation masks (None for detection-only models)
    print(r.keypoints)  # keypoints (None unless a pose model is used)

4. Data preparation and format

4.1 YOLO data format

YOLO uses a specific data format for training, and understanding the format is important for customized training.

Directory structure:

dataset/
├── images/
│   ├── train/
│   ├── val/
│   └── test/
└── labels/
    ├── train/
    ├── val/
    └── test/

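Annotation file format: one text file per image, one object per line, in the format class_id center_x center_y width height. All coordinates are normalized to [0, 1] relative to the image width and height.

0 0.5 0.5 0.3 0.4
1 0.2 0.3 0.1 0.1

The first line describes an object of class 0 centered in the image, covering 30% of the image width and 40% of its height. The second describes an object of class 1 near the top-left corner. Label files contain only numbers, so do not add comments to them.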
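
The conversion between pixel corner coordinates and this normalized format can be sketched as follows. The helper names are hypothetical, and the 640x480 image size is an assumption chosen so that the result matches the first sample line.

```python
def xyxy_to_yolo(x1, y1, x2, y2, img_w, img_h):
    """Convert pixel corners (x1, y1, x2, y2) to normalized (cx, cy, w, h)."""
    cx = (x1 + x2) / 2 / img_w
    cy = (y1 + y2) / 2 / img_h
    w = (x2 - x1) / img_w
    h = (y2 - y1) / img_h
    return cx, cy, w, h


def yolo_to_xyxy(cx, cy, w, h, img_w, img_h):
    """Convert normalized (cx, cy, w, h) back to pixel corners."""
    x1 = (cx - w / 2) * img_w
    y1 = (cy - h / 2) * img_h
    x2 = (cx + w / 2) * img_w
    y2 = (cy + h / 2) * img_h
    return x1, y1, x2, y2


def format_label(class_id, box):
    """Format one label line: class id followed by four normalized values."""
    return f"{class_id} " + " ".join(f"{v:.6f}" for v in box)


box = xyxy_to_yolo(224, 144, 416, 336, img_w=640, img_h=480)
print(format_label(0, box))  # 0 0.500000 0.500000 0.300000 0.400000
```

Normalizing by the image size is what lets the same label file remain valid when the image is resized for training.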

Data configuration file (data.yaml):

path: ../datasets/coco8  # dataset root directory
train: images/train  # training images (relative to path)
val: images/val  # validation images (relative to path)
test:  # test images (optional)

# Classes
nc: 80  # number of classes
names: ['person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', 'train', 'truck', 'boat', 'traffic light', ...]  # class names

4.2 Data preprocessing

Data preprocessing best practices:

Standardize the image size (e.g. 640x640)
Data augmentation (Mosaic, MixUp, etc.)
Annotation verification (check bounding box validity)
Class balancing (handle class imbalance)
Data splitting (training/validation/test)
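
Annotation verification is easy to automate. The following is a minimal sketch of a per-line check for YOLO label files; the function name and the exact set of checks are assumptions for illustration, not part of any YOLO toolkit.

```python
def validate_label_line(line, num_classes, eps=1e-6):
    """Return a list of problems found in one YOLO label line (empty if valid)."""
    parts = line.split()
    if len(parts) != 5:
        return [f"expected 5 fields, got {len(parts)}"]
    try:
        class_id = int(parts[0])
        cx, cy, w, h = (float(p) for p in parts[1:])
    except ValueError:
        return ["non-numeric field"]

    problems = []
    if not 0 <= class_id < num_classes:
        problems.append(f"class id {class_id} outside [0, {num_classes - 1}]")
    if not all(0.0 <= v <= 1.0 for v in (cx, cy, w, h)):
        problems.append("value outside [0, 1]")
    if w <= 0 or h <= 0:
        problems.append("non-positive width or height")
    # The box edges must also stay inside the image.
    if (cx - w / 2 < -eps or cx + w / 2 > 1 + eps
            or cy - h / 2 < -eps or cy + h / 2 > 1 + eps):
        problems.append("box extends past the image border")
    return problems


print(validate_label_line("0 0.5 0.5 0.3 0.4", num_classes=80))   # []
print(validate_label_line("0 0.95 0.5 0.3 0.4", num_classes=80))  # border problem
```

Running such a check over every label file before training catches most format mistakes early, when they are cheap to fix.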

5. Model training

5.1 YOLOv5 training

Command line training:

# Basic training
python train.py --img 640 --batch 16 --epochs 100 --data coco128.yaml --weights yolov5s.pt --device 0

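Python API training (using the yolov5 pip package, whose train module exposes a run function):

from yolov5 import train

# Train starting from pretrained weights
train.run(
    weights='yolov5s.pt',  # pretrained model
    imgsz=640,             # image size
    batch_size=16,         # batch size
    epochs=100,            # number of epochs
    data='data.yaml',      # data configuration
    device='0',            # training device
    workers=8,             # number of data loading workers
    project='runs/train',  # output directory
    name='exp'             # experiment name
)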

5.2 YOLOv8 training

from ultralytics import YOLO

# Load a pretrained model
model = YOLO('yolov8n.pt')

# Train the model
results = model.train(
    data='data.yaml',      # data configuration file
    epochs=100,            # number of epochs
    imgsz=640,             # input image size
    batch=16,              # batch size
    device='0',            # training device
    project='runs/train',  # output directory
    name='my_experiment'   # experiment name
)

5.3 Training optimization techniques

Use pre-trained weights to accelerate convergence
Set the learning rate scheduling strategy appropriately
Enable data augmentation to improve generalization capabilities
Use mixed precision training to save GPU memory
Adjust batch size to balance speed and performance
Monitor the training process to avoid overfitting
Save checkpoints regularly for easy recovery
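
As an illustration of a learning rate scheduling strategy, here is a minimal sketch of linear warmup followed by cosine decay. This is a generic schedule, not necessarily the default of YOLOv5 or YOLOv8, and the parameter names and default values are assumptions.

```python
import math


def lr_at_epoch(epoch, total_epochs, base_lr=0.01, warmup_epochs=3, final_lr_ratio=0.01):
    """Learning rate for a given epoch: linear warmup, then cosine decay.

    The rate ramps up to base_lr over warmup_epochs, then decays along a
    half cosine to base_lr * final_lr_ratio at the last epoch.
    """
    if epoch < warmup_epochs:
        return base_lr * (epoch + 1) / warmup_epochs
    decay_epochs = max(1, total_epochs - warmup_epochs - 1)
    progress = (epoch - warmup_epochs) / decay_epochs
    cosine = 0.5 * (1 + math.cos(math.pi * progress))
    min_lr = base_lr * final_lr_ratio
    return min_lr + (base_lr - min_lr) * cosine


for epoch in (0, 2, 3, 50, 99):
    print(epoch, round(lr_at_epoch(epoch, total_epochs=100), 6))
```

Warmup avoids large, unstable updates while the freshly initialized layers settle, and the slow decay at the end helps the model converge to a better minimum.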

6. Model inference and deployment

6.1 Processing of inference results

# Processing YOLOv8 results
from ultralytics import YOLO

model = YOLO('yolov8n.pt')
results = model('image.jpg')

for r in results:
    # Result containers
    boxes = r.boxes  # Boxes object for bbox outputs
    masks = r.masks  # Masks object for segmentation masks
    probs = r.probs  # Class probabilities for classification outputs

    # Process the bounding boxes
    if boxes is not None:
        xyxy = boxes.xyxy.cpu().numpy()  # box coordinates (x1, y1, x2, y2)
        conf = boxes.conf.cpu().numpy()  # confidence scores
        cls = boxes.cls.cpu().numpy()    # class indices

        for i in range(len(xyxy)):
            x1, y1, x2, y2 = xyxy[i]
            confidence = conf[i]
            class_id = int(cls[i])

            print(f'Detected class {class_id}, confidence {confidence:.2f}, box ({x1}, {y1}, {x2}, {y2})')
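
The boxes returned by the library have already been filtered with non-maximum suppression (NMS). To show what that step does, for example when you need to merge detections from several image tiles yourself, here is a minimal pure-Python sketch of IoU and greedy NMS. It is an illustration with hypothetical function names, not the library's implementation.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the highest-scoring box, drop boxes that overlap it
    by more than iou_threshold, and repeat. Returns the kept indices."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2]
```

The second box overlaps the first with an IoU of about 0.68, so it is suppressed; the third box does not overlap anything and is kept.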

6.2 Model deployment options

Deployment method	Description
ONNX	Convert to ONNX format, cross-platform deployment
TensorRT	NVIDIA TensorRT optimization, GPU acceleration
OpenVINO	Intel OpenVINO toolkit, CPU optimization
Core ML	Apple Core ML framework, iOS/macOS deployment
TFLite	TensorFlow Lite, mobile deployment
Edge TPU	Google Edge TPU, edge device acceleration
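
With the ultralytics package, each deployment target in the table corresponds to a format string passed to model.export(). The sketch below maps the table rows to those strings; the helper function is hypothetical, and calling it requires the ultralytics package plus whatever toolchain the chosen target needs.

```python
# Deployment target -> `format` argument of ultralytics' model.export()
EXPORT_FORMATS = {
    "ONNX": "onnx",
    "TensorRT": "engine",
    "OpenVINO": "openvino",
    "Core ML": "coreml",
    "TFLite": "tflite",
    "Edge TPU": "edgetpu",
}


def export_model(weights, target):
    """Export a YOLOv8 model for the given deployment target.

    Hypothetical convenience wrapper. The import is done lazily so that the
    mapping above can be used without ultralytics installed.
    """
    fmt = EXPORT_FORMATS[target]  # raises KeyError for an unknown target
    from ultralytics import YOLO
    model = YOLO(weights)
    return model.export(format=fmt)


# Example (not run here, since it downloads weights and writes files):
# export_model('yolov8n.pt', 'ONNX')
```

Some targets have platform requirements of their own; TensorRT export, for instance, needs an NVIDIA GPU environment.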

6.3 Performance optimization

Choose the appropriate model size (nano/small/medium/large/xlarge)
Use model quantization to reduce model size and inference time
Enable inference optimization libraries such as TensorRT or OpenVINO
Adjust input image size to balance accuracy and speed
Use batch processing to improve throughput
Optimize the data loading pipeline to reduce I/O bottlenecks
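
The effect of input image size can be estimated with simple arithmetic. For a fully convolutional detector the amount of computation grows roughly with the number of input pixels, so halving the side length cuts the cost to about a quarter. The sketch below assumes a maximum network stride of 32 and uses hypothetical helper names.

```python
def round_to_stride(size, stride=32):
    """Round an input size up to the nearest multiple of the network stride."""
    return ((size + stride - 1) // stride) * stride


def relative_compute(size, base=640):
    """Approximate compute cost relative to a base input size.

    Assumes cost scales with pixel count, i.e. with the square of the side.
    """
    return (size / base) ** 2


for size in (320, 480, 640, 1280):
    print(size, round_to_stride(size), relative_compute(size))

print(round_to_stride(600))  # 608
```

Accuracy usually moves in the opposite direction, especially for small objects, so the right size has to be found by measuring both on your own validation data.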

7. Practical application cases

7.1 Custom data set training

Custom data set training steps:

Prepare image data and annotations
Convert the annotation format to YOLO format
Create data configuration file
Verify the correctness of data format
Choose an appropriate pre-trained model
Configure training parameters
Start the training process
Monitor training metrics
Evaluate model performance
Tune and retrain as needed
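
As part of preparing the data, the images have to be divided into training, validation, and test sets. A minimal sketch of a reproducible split follows; the function name, the ratios, and the file names are assumptions for illustration.

```python
import random


def split_dataset(items, val_ratio=0.2, test_ratio=0.1, seed=0):
    """Split a list of image names into train/val/test subsets.

    Sorting first and shuffling with a fixed seed makes the split
    reproducible regardless of the order in which files were listed.
    """
    items = sorted(items)
    random.Random(seed).shuffle(items)
    n_val = int(len(items) * val_ratio)
    n_test = int(len(items) * test_ratio)
    return {
        "val": items[:n_val],
        "test": items[n_val:n_val + n_test],
        "train": items[n_val + n_test:],
    }


names = [f"img_{i:03d}.jpg" for i in range(10)]
splits = split_dataset(names)
print({k: len(v) for k, v in splits.items()})  # {'val': 2, 'test': 1, 'train': 7}
```

Each image and its label file must end up in the same subset, so in practice the split is made on the shared file stem and both files are moved together.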

7.2 Real-time detection application

import cv2
from ultralytics import YOLO

# Load the model
model = YOLO('yolov8n.pt')

# Open the default webcam
cap = cv2.VideoCapture(0)

while True:
    ret, frame = cap.read()
    if not ret:
        break

    # Run inference on the current frame
    results = model(frame)

    # Draw the detections on the frame
    annotated_frame = results[0].plot()

    # Show the annotated frame
    cv2.imshow('YOLOv8 Detection', annotated_frame)

    # Press 'q' to quit
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()
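
For a real-time application it is useful to know the frame rate actually achieved. The following is a minimal sketch of a smoothed FPS counter; the class is hypothetical, and you would call tick() once per loop iteration in the code above.

```python
import time


class FPSMeter:
    """Exponentially smoothed frames-per-second counter."""

    def __init__(self, smoothing=0.9):
        self.smoothing = smoothing  # closer to 1.0 means a steadier reading
        self.fps = None
        self._last = None

    def tick(self, now=None):
        """Register one frame and return the current FPS estimate.

        Returns None until two frames have been seen. `now` can be passed
        explicitly (in seconds) to make the meter testable.
        """
        now = time.perf_counter() if now is None else now
        if self._last is not None:
            dt = now - self._last
            if dt > 0:
                instant = 1.0 / dt
                if self.fps is None:
                    self.fps = instant
                else:
                    self.fps = self.smoothing * self.fps + (1 - self.smoothing) * instant
        self._last = now
        return self.fps


meter = FPSMeter()
for t in (0.0, 0.1, 0.2, 0.3):  # simulated timestamps, one frame every 100 ms
    fps = meter.tick(t)
print(round(fps, 1))  # 10.0
```

Smoothing matters because the time per frame fluctuates; an unsmoothed reading flickers too much to be readable on screen.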

YOLO is currently one of the most popular object detection frameworks. YOLOv8 is a good starting point because it has a more modern architecture and a simpler API. The key skills are the data format, the training process, and the inference methods.

8. Summary

The YOLO family represents important progress in the field of object detection:

Development History:

YOLOv1-v3: laid the foundation for single-stage detection
YOLOv4-v5: greatly improved performance and ease of use
YOLOv6-v8: more advanced architecture designs

Core advantages:

Real-time detection capability
High-precision performance
Easy to deploy
Rich model variants

💡 Important reminder: YOLO is a widely used choice for object detection in industry, so working knowledge of the YOLO series is a valuable skill for computer vision engineers.
