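Hands-On Project: Industrial Defect Detection

Introduction

Industrial defect detection is one of the core technologies of smart manufacturing and Industry 4.0. It uses computer vision and deep learning to identify product defects automatically, which reduces human error, lowers cost, and helps keep defective parts out of the supply chain. As automation spreads through manufacturing, AI-based defect detection has become a practical necessity on modern high-speed production lines. This article focuses on anomaly-detection scenarios, where normal samples are plentiful and defect samples are scarce, and walks through building a complete system, from traditional methods to deep learning.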



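1. Industrial Defect Detection Overview

1.1 Core Value

| Dimension | What it looks like in practice |
| --- | --- |
| Quality control | Replaces repetitive manual inspection and removes misjudgments caused by visual fatigue; consistent criteria, stable precision, full inspection coverage |
| Cost efficiency | Reduces long-term inspection labor; catches defects early, which cuts downstream rework and recall losses |
| Safety and brand | Keeps defective products off the market, protects production and user safety, and sustains long-term brand trust |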

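1.2 Typical Defects and Detection Challenges

Common defect categories

  • Surface defects: scratches, dents, stains, cracks, color deviation
  • Structural/dimensional defects: out-of-tolerance dimensions, deformation, missing material, bubbles, delamination
  • Assembly defects: misalignment, missing parts, loose parts, bad solder joints

Main detection challenges

  1. Unstable lighting and viewpoint: factory lighting is uneven and part pose varies widely
  2. Defects that are hard to separate from the background: tiny defects on products with complex textures
  3. Extreme class imbalance: normal samples can make up more than 99.9% of the data
  4. Real-time and high-accuracy requirements: high-speed lines can require per-frame inference in under 10 ms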
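
Challenge 3 has a practical consequence that is easy to check with arithmetic: at this level of imbalance, plain accuracy says almost nothing. The sketch below uses a hypothetical batch of 10,000 parts with 10 defects (made-up numbers, not measurements) and a "detector" that never flags anything.

```python
def confusion_counts(labels, preds):
    """Count TP/FP/TN/FN where 1 = defective and 0 = normal."""
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    return tp, fp, tn, fn


def accuracy_and_recall(labels, preds):
    """Accuracy over all parts, recall over defective parts only."""
    tp, fp, tn, fn = confusion_counts(labels, preds)
    accuracy = (tp + tn) / len(labels)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, recall


# Hypothetical batch: 10,000 parts, 10 of them defective (0.1%).
labels = [1] * 10 + [0] * 9990
always_ok = [0] * len(labels)  # a "detector" that never flags anything

accuracy, recall = accuracy_and_recall(labels, always_ok)
print(f"accuracy={accuracy:.3f}, recall={recall:.1f}")  # accuracy=0.999, recall=0.0
```

A model that misses every defect still scores 99.9% accuracy here, so recall on defects and the false-alarm rate on normal parts are the more informative numbers to track.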

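2. Core Detection Techniques (Mainly Anomaly Detection)

2.1 Anomaly Detection with Traditional Machine Learning

Suited to small datasets and products with regular textures; it can be deployed without a GPU.

Core idea

Extract statistical, texture, and gradient features from each image, then fit an unsupervised anomaly detector (such as Isolation Forest or One-Class SVM) on normal samples only. Samples that deviate from the learned distribution are flagged as anomalies.

Core implementation without PyTorch (scikit-image + scikit-learn)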

import numpy as np
import cv2
from sklearn.ensemble import IsolationForest
from sklearn.svm import OneClassSVM
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from skimage.feature import local_binary_pattern

LBP_POINTS, LBP_RADIUS = 24, 3  # 'uniform' LBP yields P + 2 = 26 distinct codes

class TraditionalAnomalyDetector:
    def __init__(self, method='isolation_forest', contamination=0.05, pca_dim=16):
        # The feature vector has 26 + 4 + 3 = 33 dimensions, so pca_dim must not
        # exceed 33 (or the number of training images); a larger value makes PCA fail.
        self.method = method
        self.contamination = contamination
        self.scaler = StandardScaler()
        self.pca = PCA(n_components=pca_dim)
        if method == 'isolation_forest':
            self.model = IsolationForest(contamination=contamination, random_state=42)
        elif method == 'one_class_svm':
            self.model = OneClassSVM(nu=contamination, gamma='scale')
        else:
            raise ValueError(f"Unsupported method: {method}")

    def extract_features(self, images):
        """Concatenate LBP texture, gradient, and gray-level statistics."""
        features = []
        for img in images:
            # Assumes RGB input; use cv2.COLOR_BGR2GRAY for images loaded with OpenCV.
            gray = cv2.cvtColor(img, cv2.COLOR_RGB2GRAY) if img.ndim == 3 else img
            # 1. LBP texture histogram (fixed range so the bins match across images)
            lbp = local_binary_pattern(gray, LBP_POINTS, LBP_RADIUS, method='uniform')
            lbp_hist, _ = np.histogram(lbp.ravel(), bins=LBP_POINTS + 2,
                                       range=(0, LBP_POINTS + 2), density=True)
            # 2. Gradient-magnitude statistics
            grad_x = cv2.Sobel(gray, cv2.CV_64F, 1, 0, ksize=3)
            grad_y = cv2.Sobel(gray, cv2.CV_64F, 0, 1, ksize=3)
            mag = np.sqrt(grad_x**2 + grad_y**2)
            grad_stats = [np.mean(mag), np.std(mag), np.percentile(mag, 25), np.percentile(mag, 75)]
            # 3. Gray-level statistics
            gray_stats = [np.mean(gray), np.std(gray), np.median(gray)]
            features.append(np.concatenate([lbp_hist, grad_stats, gray_stats]))
        return np.array(features)

    def fit(self, normal_images):
        """Fit on normal samples only."""
        feats = self.extract_features(normal_images)
        feats = self.scaler.fit_transform(feats)
        feats = self.pca.fit_transform(feats)
        self.model.fit(feats)

    def predict(self, images):
        """Return (labels, anomaly scores)."""
        feats = self.extract_features(images)
        feats = self.scaler.transform(feats)
        feats = self.pca.transform(feats)
        # Labels: 1 = normal, -1 = anomaly. score_samples() is lower for anomalies,
        # so it is negated here: a higher returned score means more anomalous.
        return self.model.predict(feats), -self.model.score_samples(feats)
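
To see the fit-on-normal-only workflow end to end without real images, the following sketch runs Isolation Forest on synthetic feature vectors. The data is an assumption made purely for illustration: random Gaussian vectors stand in for the LBP/gradient/gray-level features, and the "defects" are simply shifted far from the normal cluster.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Hypothetical stand-ins for image feature vectors (not real image features).
normal_train = rng.normal(loc=0.0, scale=1.0, size=(500, 8))
normal_test = rng.normal(loc=0.0, scale=1.0, size=(100, 8))
defect_test = rng.normal(loc=6.0, scale=1.0, size=(20, 8))  # shifted far from normal

# Fit the scaler and the detector on normal samples only.
scaler = StandardScaler().fit(normal_train)
model = IsolationForest(contamination=0.05, random_state=42)
model.fit(scaler.transform(normal_train))


def anomaly_score(x):
    """Higher means more anomalous (score_samples is lower for outliers)."""
    return -model.score_samples(scaler.transform(x))


normal_scores = anomaly_score(normal_test)
defect_scores = anomaly_score(defect_test)
defect_flags = model.predict(scaler.transform(defect_test))  # -1 = anomaly

print("mean normal score:", normal_scores.mean())
print("mean defect score:", defect_scores.mean())
print("defects flagged:", int((defect_flags == -1).sum()), "of", len(defect_flags))
```

Real defects are rarely this well separated, so on production data the score distributions of normal and defective parts usually overlap and the contamination setting becomes a genuine trade-off.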

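2.2 Deep-Learning Anomaly Detection (Convolutional Autoencoder)

Suited to scenarios with plenty of data and complex product textures; it is one of the mainstream approaches in industry today.

Core idea

Train a convolutional autoencoder (CAE) on normal samples so that it learns to reconstruct normal images closely. Defect samples do not fit the normal distribution, so their reconstruction error tends to be noticeably higher; an image is flagged as defective when its error exceeds a threshold.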
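
The scoring step can be illustrated in NumPy alone. In this sketch the trained CAE is replaced by a stand-in function, `fake_cae`, which is an idealization assumed here for illustration: it returns the normal appearance plus a little noise no matter what it is fed. The image and the "scratch" are synthetic.

```python
import numpy as np


def reconstruction_error(image, reconstruction):
    """Return (image-level MSE, per-pixel squared-error map averaged over channels)."""
    diff = image.astype(np.float64) - reconstruction.astype(np.float64)
    error_map = (diff ** 2).mean(axis=-1)  # shape (H, W)
    return float(error_map.mean()), error_map


rng = np.random.default_rng(0)
normal = rng.uniform(0.4, 0.6, size=(64, 64, 3))  # hypothetical normal part in [0, 1]
defective = normal.copy()
defective[20:28, 30:38, :] = 1.0  # synthetic bright "scratch"


def fake_cae(_image):
    """Stand-in for a trained CAE: always reproduces the normal appearance."""
    return normal + rng.normal(0.0, 0.01, size=normal.shape)


score_ok, _ = reconstruction_error(normal, fake_cae(normal))
score_bad, error_map = reconstruction_error(defective, fake_cae(defective))
peak_row, peak_col = np.unravel_index(error_map.argmax(), error_map.shape)

print(f"normal score:    {score_ok:.6f}")
print(f"defective score: {score_bad:.6f}")
print("largest error at pixel:", (int(peak_row), int(peak_col)))
```

The per-pixel error map is also a cheap way to localize a defect, since the error concentrates where the input departs from what the model can reconstruct.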

Full PyTorch implementation

import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset
import torchvision.transforms as T

class ConvAutoencoder(nn.Module):
    def __init__(self, in_channels=3):
        super().__init__()
        # Encoder: downsampling + feature extraction
        self.encoder = nn.Sequential(
            nn.Conv2d(in_channels, 32, 4, 2, 1), nn.ReLU(inplace=True),  # 224→112
            nn.Conv2d(32, 64, 4, 2, 1), nn.BatchNorm2d(64), nn.ReLU(inplace=True),  # 112→56
            nn.Conv2d(64, 128, 4, 2, 1), nn.BatchNorm2d(128), nn.ReLU(inplace=True),  # 56→28
            nn.Conv2d(128, 256, 4, 2, 1), nn.BatchNorm2d(256), nn.ReLU(inplace=True),  # 28→14
            nn.Conv2d(256, 512, 4, 2, 1), nn.ReLU(inplace=True)  # 14→7
        )
        # Decoder: upsampling + reconstruction
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(512, 256, 4, 2, 1), nn.BatchNorm2d(256), nn.ReLU(inplace=True),  # 7→14
            nn.ConvTranspose2d(256, 128, 4, 2, 1), nn.BatchNorm2d(128), nn.ReLU(inplace=True),  # 14→28
            nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.BatchNorm2d(64), nn.ReLU(inplace=True),  # 28→56
            nn.ConvTranspose2d(64, 32, 4, 2, 1), nn.ReLU(inplace=True),  # 56→112
            nn.ConvTranspose2d(32, in_channels, 4, 2, 1), nn.Sigmoid()  # 112→224, output in [0,1]
        )

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z)

class DeepAnomalyDetector:
    def __init__(self, img_size=(3,224,224), lr=1e-4, device=None):
        self.device = device or torch.device('cuda' if torch.cuda.is_available() else 'cpu')
        self.model = ConvAutoencoder(img_size[0]).to(self.device)
        self.criterion = nn.MSELoss()
        self.optimizer = optim.Adam(self.model.parameters(), lr=lr)
        # ToPILImage expects H x W x C uint8 arrays; ToTensor scales them to [0, 1].
        self.transform = T.Compose([
            T.ToPILImage(), T.Resize(img_size[1:]), T.ToTensor()
        ])
        self.threshold = None

    def fit(self, normal_imgs, epochs=50, batch_size=32):
        """Preprocess the data and train the CAE on normal images only."""
        # Convert to PyTorch tensors
        processed = [self.transform(img) for img in normal_imgs]
        loader = DataLoader(TensorDataset(torch.stack(processed)), batch_size=batch_size, shuffle=True)

        self.model.train()
        for epoch in range(epochs):
            total_loss = 0.0
            for (data,) in loader:
                data = data.to(self.device)
                # Forward + backward pass
                recon = self.model(data)
                loss = self.criterion(recon, data)
                self.optimizer.zero_grad()
                loss.backward()
                self.optimizer.step()
                total_loss += loss.item() * len(data)
            if (epoch+1) % 10 == 0:
                print(f"Epoch [{epoch+1}/{epochs}], Avg Loss: {total_loss/len(normal_imgs):.6f}")

    def set_threshold(self, normal_imgs, percentile=95):
        """Set the threshold to a percentile of reconstruction errors on normal images.

        Prefer held-out normal images here; errors on the training set are optimistic.
        """
        errors = []
        self.model.eval()
        with torch.no_grad():
            for img in normal_imgs:
                data = self.transform(img).unsqueeze(0).to(self.device)
                recon = self.model(data)
                errors.append(self.criterion(recon, data).item())
        # Store a plain float so later comparisons yield built-in bools, not numpy.bool_.
        self.threshold = float(np.percentile(errors, percentile))
        print(f"Set threshold to {self.threshold:.6f}")

    def predict(self, imgs):
        """Return one dict per image: is_defective, recon_error, confidence."""
        if self.threshold is None:
            raise ValueError("Call set_threshold first!")
        results = []
        self.model.eval()
        with torch.no_grad():
            for img in imgs:
                data = self.transform(img).unsqueeze(0).to(self.device)
                recon = self.model(data)
                err = self.criterion(recon, data).item()
                results.append({
                    "is_defective": bool(err > self.threshold),
                    "recon_error": err,
                    # Error-to-threshold ratio capped at 2.0: a severity heuristic,
                    # not a calibrated probability.
                    "confidence": min(err / self.threshold, 2.0)
                })
        return results
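
The size comments in the autoencoder (224→112→…→7 and back) follow from the standard output-size formulas for `nn.Conv2d` and `nn.ConvTranspose2d` with kernel 4, stride 2, padding 1. The small helper below, written for this article and not part of PyTorch, traces one spatial dimension through five such layers each way. It also shows why the input side length should be a multiple of 32 for this architecture: otherwise the reconstruction comes out a different size than the input and the MSE loss cannot be computed directly.

```python
def conv_out(size, kernel=4, stride=2, padding=1):
    """Output size of nn.Conv2d along one spatial dimension (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1


def deconv_out(size, kernel=4, stride=2, padding=1):
    """Output size of nn.ConvTranspose2d (dilation 1, output_padding 0)."""
    return (size - 1) * stride - 2 * padding + kernel


def trace_sizes(input_size, num_layers=5):
    """Follow one spatial dimension through the encoder, then the decoder."""
    encoder = [input_size]
    for _ in range(num_layers):
        encoder.append(conv_out(encoder[-1]))
    decoder = [encoder[-1]]
    for _ in range(num_layers):
        decoder.append(deconv_out(decoder[-1]))
    return encoder, decoder


for size in (224, 128, 96, 100):
    enc, dec = trace_sizes(size)
    status = "ok" if dec[-1] == size else "size mismatch"
    print(size, enc, dec, status)
```

An input of 100 pixels, for example, is encoded down to 3 and decoded back to 96, which is reported as a mismatch.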

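3. Industrial Deployment and Optimization

3.1 Quick Optimization Tips

  1. Model quantization: convert FP32 weights to INT8 with PyTorch's quantization tooling (torch.quantization, also exposed as torch.ao.quantization). Weight storage shrinks by roughly 4x; a 2-3x inference speedup is a commonly quoted figure, but the real gain depends on the hardware, the backend, and which layers are quantized, so measure it on the target device
  2. Smaller input size: as long as defects remain detectable, shrink the input from 224×224 to 128×128 or 96×96
  3. OpenCV DNN: export the model to ONNX and run it with OpenCV's DNN module, so the runtime no longer depends on PyTorch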
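
The first two tips can be sanity-checked with back-of-the-envelope arithmetic. In the sketch below the parameter count is a hypothetical round number, not a measurement of the model above, and the pixel ratio is only a rough proxy for convolution cost; actual latency should still be measured on the target hardware.

```python
def weight_megabytes(num_params, bytes_per_param):
    """Approximate weight storage in MB (ignores file-format overhead)."""
    return num_params * bytes_per_param / 1e6


def pixel_ratio(old_side, new_side):
    """How many times fewer pixels a square input has after resizing."""
    return (old_side ** 2) / (new_side ** 2)


NUM_PARAMS = 5_000_000  # hypothetical parameter count, for illustration only

fp32_mb = weight_megabytes(NUM_PARAMS, 4)  # FP32 = 4 bytes per weight
int8_mb = weight_megabytes(NUM_PARAMS, 1)  # INT8 = 1 byte per weight
print(f"FP32 {fp32_mb:.1f} MB -> INT8 {int8_mb:.1f} MB ({fp32_mb / int8_mb:.0f}x smaller)")

for side in (128, 96):
    print(f"224 -> {side}: {pixel_ratio(224, side):.2f}x fewer pixels per frame")
```

For a fully convolutional model, the amount of convolution work scales roughly with the number of input pixels, which is why shrinking the input is often the cheapest speedup to try first.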

3.2 A Simplified Deployment Framework

import cv2
import sqlite3
from contextlib import closing

class IndustrialDefectSystem:
    def __init__(self, detector, db_path="defect_results.db"):
        self.detector = detector
        self.db_path = db_path
        self._init_db()

    def _init_db(self):
        """Create the SQLite results table if it does not exist."""
        # closing() closes the connection; the inner `conn` context commits the transaction.
        with closing(sqlite3.connect(self.db_path)) as conn, conn:
            conn.execute('''
                CREATE TABLE IF NOT EXISTS results (
                    id INTEGER PRIMARY KEY AUTOINCREMENT,
                    timestamp TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
                    is_defective BOOLEAN,
                    recon_error REAL,
                    confidence REAL
                )
            ''')

    def detect_single(self, img):
        """Run detection on one image and store the result."""
        res = self.detector.predict([img])[0]
        # Cast to built-in types: sqlite3 cannot bind NumPy scalars such as numpy.bool_.
        row = (int(bool(res["is_defective"])), float(res["recon_error"]), float(res["confidence"]))
        with closing(sqlite3.connect(self.db_path)) as conn, conn:
            conn.execute('''
                INSERT INTO results (is_defective, recon_error, confidence)
                VALUES (?, ?, ?)
            ''', row)
        return res

    def detect_video(self, source=0, show=True):
        """Run detection on a live video stream (0 is the default camera)."""
        cap = cv2.VideoCapture(source)
        if not cap.isOpened():
            raise ValueError("Cannot open video source")
        try:
            while True:
                ret, frame = cap.read()
                if not ret:
                    break
                # OpenCV frames are BGR. If the detector was trained on RGB images,
                # convert first with cv2.cvtColor(frame, cv2.COLOR_BGR2RGB).
                res = self.detect_single(frame)
                # Overlay the result on the frame
                color = (0,0,255) if res["is_defective"] else (0,255,0)
                text = f"DEFECT: {res['confidence']:.2f}" if res["is_defective"] else "OK"
                cv2.putText(frame, text, (30,30), cv2.FONT_HERSHEY_SIMPLEX, 1, color, 2)
                if show:
                    cv2.imshow("Defect Detection", frame)
                    if cv2.waitKey(1) & 0xFF == ord('q'):
                        break
        finally:
            # Release the camera even if detection raises.
            cap.release()
            if show:
                cv2.destroyAllWindows()
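
Once results accumulate in the `results` table, a simple query gives the running defect rate. The sketch below uses an in-memory database with the same schema as above and a few made-up rows; `defect_summary` is a hypothetical helper written for this example, not part of the framework.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS results (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    timestamp TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    is_defective BOOLEAN,
    recon_error REAL,
    confidence REAL
)
"""


def defect_summary(conn):
    """Return (total inspected, defect count, defect rate) from the results table."""
    total, defects = conn.execute(
        "SELECT COUNT(*), COALESCE(SUM(is_defective), 0) FROM results"
    ).fetchone()
    rate = defects / total if total else 0.0
    return total, defects, rate


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)

# Hypothetical inspection results: (is_defective, recon_error, confidence)
rows = [(0, 0.0011, 0.55), (0, 0.0013, 0.65), (1, 0.0031, 1.55), (0, 0.0010, 0.50)]
conn.executemany(
    "INSERT INTO results (is_defective, recon_error, confidence) VALUES (?, ?, ?)", rows
)
conn.commit()

total, defects, rate = defect_summary(conn)
print(f"inspected={total}, defects={defects}, rate={rate:.2%}")
conn.close()
```

A sudden jump in this rate often points to a lighting or camera change rather than a real quality problem, so it is worth monitoring alongside the raw reconstruction errors.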

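4. Study Advice and Summary

Study advice

  1. Start with the traditional methods: they let you check quickly whether a product is suited to visual inspection, and they make the core logic of anomaly detection concrete
  2. Try a CAE first: among deep-learning anomaly detectors it is one of the simplest to train, debug, and deploy
  3. Invest in data quality: collect diverse normal samples (different lighting, viewpoints, batches), and keep a small set of defect samples for validating the threshold
  4. Test robustness: before real deployment, always run extreme-condition tests (strong light, low light, slight occlusion)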
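
Point 3 mentions keeping a few defect samples to validate the threshold. One way to do that, sketched below, is to sweep several percentile thresholds computed from normal-sample errors and compare recall on the defects against the false-alarm rate on normal parts. The error values are synthetic and assumed only for illustration; `evaluate_threshold` is a hypothetical helper.

```python
import numpy as np


def evaluate_threshold(normal_errors, defect_errors, threshold):
    """Precision, recall, and false-alarm rate when flagging error > threshold."""
    tp = int((defect_errors > threshold).sum())
    fn = len(defect_errors) - tp
    fp = int((normal_errors > threshold).sum())
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    false_alarm_rate = fp / len(normal_errors)
    return precision, recall, false_alarm_rate


rng = np.random.default_rng(0)
# Hypothetical reconstruction errors from a validation set.
normal_errors = rng.normal(0.0010, 0.0002, size=1000)
defect_errors = rng.normal(0.0030, 0.0008, size=15)

sweep = {}
for percentile in (90, 95, 99, 99.9):
    threshold = float(np.percentile(normal_errors, percentile))
    sweep[percentile] = evaluate_threshold(normal_errors, defect_errors, threshold)
    precision, recall, far = sweep[percentile]
    print(f"p{percentile}: threshold={threshold:.5f} "
          f"precision={precision:.2f} recall={recall:.2f} false_alarms={far:.3f}")
```

By construction, a percentile threshold on normal errors fixes the false-alarm rate on that set (about 5% at the 95th percentile), so the choice comes down to how much recall on defects can be traded for fewer false rejects on the line.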

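Summary

The heart of industrial defect detection is "solving the most practical production problems with the fewest defect samples." By the author's rough estimate, the traditional methods and the CAE approach covered here handle more than 80% of industrial anomaly-detection scenarios. From there, you can move on to more advanced methods such as VAE, GAN, and PatchCore as your needs require.

In industrial deployment, **stability > accuracy**. Do not chase SOTA models blindly; prefer approaches with a simple structure and predictable inference time.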