YOLO family practice: a complete guide from YOLOv1 to YOLOv8

Introduction

YOLO (You Only Look Once) is one of the most influential families of object detection algorithms, known for balancing speed and accuracy. From YOLOv1 in 2015 to YOLOv8 in 2023, the family has kept evolving and is now among the most widely used real-time object detection solutions in industry. This article covers the development history, core principles, and practical use of the YOLO family.


1. YOLO family development history

1.1 The birth and development of YOLO

The introduction of YOLO marked a turning point in object detection: the shift from two-stage detectors, which first propose regions and then classify them, to one-stage detectors that predict boxes and classes in a single pass.

| Version | Year | Core improvements |
| --- | --- | --- |
| YOLOv1 | 2015 | Framed detection as a single regression problem solved by one network (single-stage detection) |
| YOLOv2 | 2016 | Batch normalization, anchor boxes, and multi-scale training |
| YOLOv3 | 2018 | Multi-scale prediction, a stronger feature extraction network, improved small-object detection |
| YOLOv4 | 2020 | CSPDarknet53 backbone, PANet feature fusion, Mosaic data augmentation |
| YOLOv5 | 2020 | PyTorch implementation, easier-to-use interface, rich set of pretrained models |
| YOLOv6 | 2022 | RepVGG-style structure, more efficient architecture |
| YOLOv7 | 2022 | Gradient path design, model scaling strategy |
| YOLOv8 | 2023 | Anchor-free design, more advanced backbone, instance segmentation support |

To make the positioning of each version easier to see at a glance, the following code summarizes their defining characteristics:

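def yolov_evolution():
    """Print the defining characteristic of each YOLO version."""
    evolution = {
        "YOLOv1": "Pioneer of single-stage detection",
        "YOLOv2": "Introduced anchors and batch normalization",
        "YOLOv3": "Multi-scale prediction",
        "YOLOv4": "Strong speed/accuracy balance",
        "YOLOv5": "PyTorch ease of use",
        "YOLOv6": "Efficient architecture design",
        "YOLOv7": "Training optimization innovations",
        "YOLOv8": "Anchor-free design",
    }

    print("Characteristics of each YOLO version:")
    for version, feature in evolution.items():
        print(f"• {version}: {feature}")

yolov_evolution()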

1.2 YOLO’s core philosophy

YOLO's success stems from its unique design concept, which is mainly reflected in the following aspects:

  1. Unified framework: classification and localization are handled by a single neural network, which allows end-to-end training and inference
  2. Global view: the network sees the entire image at once, avoiding the region proposal stage of the R-CNN series
  3. Speed advantage: an efficient network architecture enables real-time detection

We can understand these core concepts through the following code:

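def yolo_core_concepts():
    """Print a one-line explanation of each core YOLO concept."""
    concepts = {
        "Unified Detection": "One network predicts class and location together",
        "Grid-based Prediction": "The image is divided into a grid for prediction",
        "Real-time Performance": "Fast enough for real-time applications",
        "End-to-End Training": "No complex multi-stage training required",
    }

    print("YOLO core concepts:")
    for concept, desc in concepts.items():
        print(f"• {concept}: {desc}")

yolo_core_concepts()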
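
To make grid-based prediction concrete: in YOLOv1 the image is divided into an S x S grid (S = 7 in the original model), and the cell that contains an object's center is responsible for predicting it. The sketch below computes that cell from a normalized center. The function name `responsible_cell` is a hypothetical helper for illustration, not part of any YOLO library.

```python
def responsible_cell(cx: float, cy: float, grid_size: int = 7) -> tuple[int, int]:
    """Return (row, col) of the grid cell that contains a normalized box center."""
    if not (0.0 <= cx <= 1.0 and 0.0 <= cy <= 1.0):
        raise ValueError("center coordinates must be normalized to [0, 1]")
    # Clamp so a center exactly on the right or bottom edge maps to the last cell.
    col = min(int(cx * grid_size), grid_size - 1)
    row = min(int(cy * grid_size), grid_size - 1)
    return row, col


print(responsible_cell(0.5, 0.5))  # object centered in the image
print(responsible_cell(0.2, 0.3))  # object near the top-left corner
```

Later versions predict on several grids of different resolution at once, but the idea of tying each prediction to a grid location carries over.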

2. In-depth analysis of YOLOv5

2.1 YOLOv5 architecture features

YOLOv5 is a PyTorch implementation developed by Ultralytics that combines ease of use with strong performance. Its architecture has three main parts:

  • Backbone: CSPDarknet53
  • Neck: PANet (Path Aggregation Network)
  • Head: detection head that outputs boxes, confidence scores, and classes

YOLOv5 provides multiple model variants to adapt to different application scenarios:

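| Model | Parameters | GFLOPs | CPU speed | mAP |
| --- | --- | --- | --- | --- |
| YOLOv5n | 1.9M | 4.5 | 6.3 ms | 28.0% |
| YOLOv5s | 7.2M | 16.5 | 2.0 ms | 37.4% |
| YOLOv5m | 21.2M | 49.0 | 3.0 ms | 45.4% |
| YOLOv5l | 46.5M | 109.1 | 4.0 ms | 49.0% |
| YOLOv5x | 86.7M | 205.7 | 6.1 ms | 50.7% |

Timings depend strongly on hardware and batch size, neither of which is specified here, so treat the speed column as indicative only.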
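
One way to use the variant table is to pick the most accurate model that fits a parameter budget. The sketch below does that with the parameter counts and mAP values copied from the table above. `pick_variant` is a hypothetical helper, and a real choice would also weigh latency on the target hardware.

```python
# (name, parameters in millions, mAP in percent), copied from the table above.
YOLOV5_VARIANTS = [
    ("YOLOv5n", 1.9, 28.0),
    ("YOLOv5s", 7.2, 37.4),
    ("YOLOv5m", 21.2, 45.4),
    ("YOLOv5l", 46.5, 49.0),
    ("YOLOv5x", 86.7, 50.7),
]


def pick_variant(max_params_m: float):
    """Return the most accurate variant within the parameter budget, or None."""
    candidates = [v for v in YOLOV5_VARIANTS if v[1] <= max_params_m]
    if not candidates:
        return None
    # Highest mAP among the variants that fit.
    return max(candidates, key=lambda v: v[2])[0]


print(pick_variant(25))   # budget of 25M parameters
print(pick_variant(1.0))  # budget too small for any variant
```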

2.2 YOLOv5 installation and configuration

Installing YOLOv5 takes three steps:

# 1. Clone the repository
git clone https://github.com/ultralytics/yolov5
cd yolov5

# 2. Install the dependencies
pip install -r requirements.txt

# 3. Verify the installation
python detect.py --weights yolov5s.pt --source 0  # webcam
# or run on the bundled sample images
python detect.py --weights yolov5s.pt --source data/images

2.3 YOLOv5 inference

YOLOv5 can run inference in several ways. Two common ones follow.

Method 1: Use the yolov5 pip package (pip install yolov5)

import yolov5

# Load the model (the weights are downloaded automatically if missing)
model = yolov5.load('yolov5s.pt')

# Run inference on a single image
results = model('image.jpg')

# Display the results
results.show()

# Save the annotated image
results.save(save_dir='runs/detect/exp')

Method 2: Use torch hub

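import torch

# Load the pretrained model from torch hub
model = torch.hub.load('ultralytics/yolov5', 'yolov5s', pretrained=True)

# Run inference on an image (file path, URL, PIL image, or NumPy array)
results = model('image.jpg')
results.print()

# The hub model takes images, not video files or camera indices.
# For video or webcam input, read frames with OpenCV, convert BGR to RGB,
# and pass each frame to the model, or use detect.py with
# --source video.mp4 or --source 0.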

3. YOLOv8 in-depth analysis

3.1 YOLOv8 new features

YOLOv8, released by Ultralytics in 2023, brings several changes relative to YOLOv5:

| Feature | YOLOv5 | YOLOv8 |
| --- | --- | --- |
| Architecture | CSPDarknet53 + PANet | Improved backbone + better neck |
| Anchors | Anchor boxes | Anchor-free |
| Tasks | Mainly object detection | Detection + segmentation + pose estimation |
| API | Relatively complex | More concise and unified |
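
The anchor row is the largest conceptual change, so a small sketch may help. It contrasts how a raw prediction becomes a box in an anchor-based head (following the YOLOv5 decoding formula as I understand it) and in an anchor-free head that predicts distances from a grid point to the four box sides. It is a simplified scalar illustration under stated assumptions: the real heads work on tensors, and YOLOv8 predicts each distance as a distribution over bins rather than a single number. Both function names are hypothetical.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def decode_anchor_based(raw, cell, anchor, stride):
    """Anchor-based: offsets relative to a grid cell, sizes relative to an anchor box.

    Returns (center_x, center_y, width, height) in pixels.
    """
    tx, ty, tw, th = raw
    gx, gy = cell
    anchor_w, anchor_h = anchor
    cx = (2.0 * sigmoid(tx) - 0.5 + gx) * stride
    cy = (2.0 * sigmoid(ty) - 0.5 + gy) * stride
    w = (2.0 * sigmoid(tw)) ** 2 * anchor_w
    h = (2.0 * sigmoid(th)) ** 2 * anchor_h
    return cx, cy, w, h


def decode_anchor_free(distances, cell, stride):
    """Anchor-free: distances (in grid units) from the cell center to each box side.

    Returns (x1, y1, x2, y2) in pixels. No anchor box is needed.
    """
    left, top, right, bottom = distances
    px, py = cell[0] + 0.5, cell[1] + 0.5  # center of the grid cell
    return ((px - left) * stride, (py - top) * stride,
            (px + right) * stride, (py + bottom) * stride)


print(decode_anchor_based((0, 0, 0, 0), cell=(3, 2), anchor=(10, 13), stride=8))
print(decode_anchor_free((1, 1, 2, 2), cell=(3, 2), stride=8))
```

With all-zero raw outputs the anchor-based box sits at the cell center and has exactly the anchor's size, which shows why anchor shapes have to be chosen to match the dataset. The anchor-free variant removes that choice.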

3.2 YOLOv8 installation and use

Installing YOLOv8 is simpler and takes a single command:

pip install ultralytics

Basic usage example:

from ultralytics import YOLO

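# Load the model
model = YOLO('yolov8n.pt')  # nano variant

# Run inference
results = model('image.jpg')

# Inspect the results
for r in results:
    print(r.boxes)  # bounding boxes
    print(r.masks)  # segmentation masks (None for a detection model)
    print(r.keypoints)  # keypoints (None for a detection model)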

4. Data preparation and format

4.1 YOLO data format

YOLO training expects a specific dataset layout and label format, and custom training depends on getting both right.

Directory structure:

dataset/
├── images/
│   ├── train/
│   ├── val/
│   └── test/
└── labels/
    ├── train/
    ├── val/
    └── test/

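Annotation file format: each image has a matching .txt file with one object per line, in the format class_id center_x center_y width height. All coordinates are normalized to [0, 1] relative to the image width and height. In the example below, the first line is a class 0 object centered in the image that covers 30% of its width and 40% of its height, and the second is a class 1 object near the top-left corner. Label files contain only these five numbers per line, without comments.

0 0.5 0.5 0.3 0.4
1 0.2 0.3 0.1 0.1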
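
Many annotation tools store boxes as pixel corners (x1, y1, x2, y2), so converting to and from the label format described above is a common first step. The following is a minimal sketch of that arithmetic. The function names are hypothetical helpers, and the six-decimal formatting is an arbitrary choice.

```python
def xyxy_to_yolo(box, img_w, img_h):
    """Convert pixel corners (x1, y1, x2, y2) to normalized (cx, cy, w, h)."""
    x1, y1, x2, y2 = box
    cx = (x1 + x2) / 2 / img_w
    cy = (y1 + y2) / 2 / img_h
    w = (x2 - x1) / img_w
    h = (y2 - y1) / img_h
    return cx, cy, w, h


def yolo_to_xyxy(label, img_w, img_h):
    """Convert normalized (cx, cy, w, h) back to pixel corners."""
    cx, cy, w, h = label
    return ((cx - w / 2) * img_w, (cy - h / 2) * img_h,
            (cx + w / 2) * img_w, (cy + h / 2) * img_h)


def format_label(class_id, box, img_w, img_h):
    """Build one line of a YOLO label file from a pixel-corner box."""
    cx, cy, w, h = xyxy_to_yolo(box, img_w, img_h)
    return f"{class_id} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"


# A 192x256 pixel box centered in a 640x640 image.
print(format_label(0, (224, 192, 416, 448), 640, 640))
```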

Data configuration file (data.yaml):

path: ../datasets/coco8  # dataset root directory
train: images/train  # training images, relative to path
val: images/val  # validation images, relative to path
test:  # test images (optional)

# Classes
nc: 80  # number of classes
names: ['person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', 'train', 'truck', 'boat', 'traffic light', ...]  # class names

4.2 Data preprocessing

Data preprocessing best practices:

  1. Standardize the image size (e.g. 640x640)
  2. Data augmentation (Mosaic, MixUp, etc.)
  3. Annotation verification (check bounding box validity)
  4. Class balancing (handle class imbalance)
  5. Data splitting (training/validation/test)
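
Annotation verification from the list above can be automated with a few checks per label line. The sketch below is one possible set of checks, assuming the five-field label format from section 4.1. `validate_label_line` is a hypothetical helper and the tolerance value is an arbitrary choice.

```python
def validate_label_line(line: str, num_classes: int) -> list[str]:
    """Return the problems found in one YOLO label line (empty list if valid)."""
    parts = line.split()
    if len(parts) != 5:
        return [f"expected 5 fields, got {len(parts)}"]
    try:
        class_id = int(parts[0])
        cx, cy, w, h = (float(p) for p in parts[1:])
    except ValueError:
        return ["fields must be numeric"]

    problems = []
    if not 0 <= class_id < num_classes:
        problems.append(f"class id {class_id} outside [0, {num_classes - 1}]")
    if not all(0.0 <= v <= 1.0 for v in (cx, cy, w, h)):
        problems.append("coordinates must be normalized to [0, 1]")
    if w <= 0 or h <= 0:
        problems.append("width and height must be positive")
    eps = 1e-6  # tolerance for floating-point rounding
    if (cx - w / 2 < -eps or cx + w / 2 > 1 + eps
            or cy - h / 2 < -eps or cy + h / 2 > 1 + eps):
        problems.append("box extends beyond the image")
    return problems


print(validate_label_line("0 0.5 0.5 0.3 0.4", num_classes=80))
print(validate_label_line("0 0.95 0.5 0.3 0.4", num_classes=80))
```

Running a check like this over every label file before training tends to surface conversion mistakes early, when they are cheap to fix.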

5. Model training

5.1 YOLOv5 training

Command line training:

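# Basic training run
python train.py --img 640 --batch 16 --epochs 100 --data coco128.yaml --weights yolov5s.pt --device 0

Python API training:

from yolov5 import train

# Launch training with the same options as the train.py command line.
# This assumes the yolov5 pip package; inside a cloned repository,
# use `import train` instead.
train.run(
    weights='yolov5s.pt',  # pretrained weights
    imgsz=640,             # image size
    batch_size=16,         # batch size
    epochs=100,            # number of epochs
    data='data.yaml',      # dataset configuration
    device='0',            # training device
    workers=8,             # number of data loader workers
    project='runs/train',  # output directory
    name='exp',            # experiment name
)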

5.2 YOLOv8 training

from ultralytics import YOLO

# Load a pretrained model
model = YOLO('yolov8n.pt')

# Train the model
results = model.train(
    data='data.yaml',    # dataset configuration file
    epochs=100,         # number of epochs
    imgsz=640,          # input image size
    batch=16,           # batch size
    device='0',         # training device
    project='runs/train', # output directory
    name='my_experiment'  # experiment name
)

5.3 Training optimization techniques

  1. Use pre-trained weights to accelerate convergence
  2. Choose a suitable learning rate schedule
  3. Enable data augmentation to improve generalization
  4. Use mixed-precision training to reduce GPU memory usage
  5. Adjust the batch size to balance training speed and memory use
  6. Monitor the training process to avoid overfitting
  7. Save checkpoints regularly for easy recovery
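
As an illustration of the learning rate scheduling tip, the sketch below implements linear warmup followed by cosine decay, a schedule commonly used for detector training. It is a generic sketch rather than the exact schedule of any YOLO release. The function name and the default values (`lr0`, `lrf`, `warmup_epochs`) are illustrative assumptions.

```python
import math


def cosine_lr(epoch, total_epochs, lr0=0.01, lrf=0.01, warmup_epochs=3):
    """Learning rate for a given epoch: linear warmup, then cosine decay.

    lr0 is the peak learning rate; the final rate is lr0 * lrf.
    """
    if epoch < warmup_epochs:
        # Ramp up linearly so early updates do not destabilize pretrained weights.
        return lr0 * (epoch + 1) / warmup_epochs
    progress = (epoch - warmup_epochs) / max(1, total_epochs - warmup_epochs)
    cosine = (1 + math.cos(math.pi * progress)) / 2  # goes from 1 down to 0
    return lr0 * (lrf + (1 - lrf) * cosine)


for epoch in (0, 3, 50, 100):
    print(epoch, f"{cosine_lr(epoch, total_epochs=100):.6f}")
```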

6. Model inference and deployment

6.1 Processing of inference results

# Processing YOLOv8 results
from ultralytics import YOLO

model = YOLO('yolov8n.pt')
results = model('image.jpg')

for r in results:
    # Collect the outputs for this image
    boxes = r.boxes  # Boxes object for bbox outputs
    masks = r.masks  # Masks object for segmentation masks
    probs = r.probs  # Class probabilities for classification outputs

    # Process the bounding boxes
    if boxes is not None:
        xyxy = boxes.xyxy.cpu().numpy()  # box corners in pixels
        conf = boxes.conf.cpu().numpy()  # confidence scores
        cls = boxes.cls.cpu().numpy()    # class indices

        for i in range(len(xyxy)):
            x1, y1, x2, y2 = xyxy[i]
            confidence = conf[i]
            class_id = int(cls[i])

            print(f'Detected class {class_id}, confidence {confidence:.2f}, box ({x1}, {y1}, {x2}, {y2})')
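
Downstream code usually wants plain records rather than arrays. The sketch below turns box, confidence, and class sequences like the ones extracted above into a sorted list of dictionaries, dropping low-confidence detections. `to_records` and the 0.25 threshold are illustrative assumptions. The `names` mapping from class index to label is supplied by the caller (in Ultralytics, as far as I know, `model.names` provides such a mapping).

```python
def to_records(xyxy, conf, cls, names, min_conf=0.25):
    """Convert parallel detection sequences into a list of dicts, best first."""
    records = []
    for box, score, class_id in zip(xyxy, conf, cls):
        if score < min_conf:
            continue  # skip weak detections
        class_id = int(class_id)
        records.append({
            "label": names.get(class_id, str(class_id)),
            "confidence": round(float(score), 3),
            "box": [round(float(v), 1) for v in box],
        })
    return sorted(records, key=lambda r: r["confidence"], reverse=True)


# Plain lists stand in for the NumPy arrays from the loop above.
demo = to_records(
    xyxy=[[10, 20, 110, 220], [5, 5, 50, 50]],
    conf=[0.91, 0.10],
    cls=[0.0, 2.0],
    names={0: "person", 2: "car"},
)
print(demo)
```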

6.2 Model deployment options

| Deployment method | Description |
| --- | --- |
| ONNX | Convert to ONNX format for cross-platform deployment |
| TensorRT | NVIDIA TensorRT optimization for GPU acceleration |
| OpenVINO | Intel OpenVINO toolkit for CPU optimization |
| Core ML | Apple Core ML framework for iOS/macOS deployment |
| TFLite | TensorFlow Lite for mobile deployment |
| Edge TPU | Google Edge TPU for edge device acceleration |
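
With the Ultralytics package, each of these targets is selected through the `format` argument of `model.export`. The sketch below maps the table's names to the format strings as I understand them from the Ultralytics export documentation, so check them against your installed version. `export_format` is a hypothetical helper, and the actual export call is left as a comment because it needs the package and model weights.

```python
# Deployment target -> Ultralytics export format string (verify for your version).
EXPORT_FORMATS = {
    "ONNX": "onnx",
    "TensorRT": "engine",
    "OpenVINO": "openvino",
    "Core ML": "coreml",
    "TFLite": "tflite",
    "Edge TPU": "edgetpu",
}


def export_format(target: str) -> str:
    """Look up the export format string for a deployment target."""
    try:
        return EXPORT_FORMATS[target]
    except KeyError:
        known = ", ".join(EXPORT_FORMATS)
        raise ValueError(f"unknown target {target!r}; expected one of: {known}") from None


print(export_format("ONNX"))

# Usage with Ultralytics (not run here):
# from ultralytics import YOLO
# YOLO("yolov8n.pt").export(format=export_format("ONNX"))
```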

6.3 Performance optimization

  1. Choose the appropriate model size (nano/small/medium/large/xlarge)
  2. Use model quantization to reduce model size and inference time
  3. Enable inference optimization libraries such as TensorRT or OpenVINO
  4. Adjust input image size to balance accuracy and speed
  5. Use batch processing to improve throughput
  6. Optimize the data loading pipeline to reduce I/O bottlenecks
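
Batch processing, from the list above, mostly comes down to grouping inputs before calling the model. The sketch below is a small, library-independent batching helper. `make_batches` is a hypothetical name, and the commented usage assumes a model that accepts a list of image paths in one call, which I believe the Ultralytics `YOLO` object does.

```python
from itertools import islice


def make_batches(items, batch_size):
    """Yield consecutive lists of up to batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


paths = [f"frame_{i:03d}.jpg" for i in range(5)]
for batch in make_batches(paths, batch_size=2):
    print(batch)
    # results = model(batch)  # one forward pass per batch instead of per image
```

Larger batches improve throughput up to the point where GPU memory runs out, so the batch size is worth tuning per device.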

7. Practical application cases

7.1 Custom dataset training

Custom dataset training steps:

  1. Prepare image data and annotations
  2. Convert the annotations to YOLO format
  3. Create the data configuration file
  4. Verify that the data format is correct
  5. Choose an appropriate pretrained model
  6. Configure the training parameters
  7. Start the training process
  8. Monitor the training metrics
  9. Evaluate model performance
  10. Tune and retrain
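
Before step 3, the images need to be divided into the train/val/test folders that the data configuration file points to. The sketch below shows one reproducible way to split a list of file names. `split_dataset`, the fractions, and the seed are illustrative assumptions, and copying the files (with their label files) into the folders is left out.

```python
import random


def split_dataset(items, val_fraction=0.2, test_fraction=0.1, seed=0):
    """Split items into train/val/test lists, reproducibly for a given seed."""
    items = sorted(items)                # fixed starting order
    random.Random(seed).shuffle(items)   # seeded shuffle, independent of global state
    n_test = int(len(items) * test_fraction)
    n_val = int(len(items) * val_fraction)
    return {
        "test": items[:n_test],
        "val": items[n_test:n_test + n_val],
        "train": items[n_test + n_val:],
    }


files = [f"img_{i:03d}.jpg" for i in range(10)]
splits = split_dataset(files)
print({name: len(part) for name, part in splits.items()})
```

Splitting by file name keeps each image and its label together, as long as the matching .txt file is moved alongside the image.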

7.2 Real-time detection application

import cv2
from ultralytics import YOLO

# Load the model
model = YOLO('yolov8n.pt')

# Open the default camera
cap = cv2.VideoCapture(0)

while True:
    ret, frame = cap.read()
    if not ret:
        break

    # Run inference on the current frame
    results = model(frame)

    # Draw the detections on the frame
    annotated_frame = results[0].plot()

    # Show the annotated frame
    cv2.imshow('YOLOv8 Detection', annotated_frame)

    # Press 'q' to quit
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()

YOLO is one of the most popular object detection frameworks today. YOLOv8 is a good starting point because it has a more modern architecture and a simpler API. The key skills are understanding the data format, the training workflow, and the inference methods.

8. Summary

The YOLO family represents important progress in the field of object detection:

Development History:

  1. YOLOv1-v3: laid the foundation for single-stage detection
  2. YOLOv4-v5: greatly improved performance and ease of use
  3. YOLOv6-v8: introduced more advanced architecture designs

Core advantages:

  • Real-time detection capability
  • High-precision performance
  • Easy to deploy
  • Rich model variants

💡 Important reminder: YOLO has become a standard choice for object detection in industry, and working fluency with YOLO-series models is a core skill for computer vision engineers.
