Practical project: industrial defect detection

Introduction

Have you ever wondered how products such as phone screens, car parts, and pills are inspected one by one before they leave the factory? By eye? Anyone who looks at tens of thousands of parts a day will end up with blurred vision, and some defects are thinner than a hair.

Industrial defect detection uses machines instead of human eyes to "find faults" 24 hours a day. What is behind it is not magic but computer vision and deep learning. This article is a "fault-finding guide" for developers, focused on one particularly common problem: normal samples are plentiful, but defective samples are very scarce. In the industry, this scenario is called anomaly detection.

We start with traditional methods, then move on to deep learning solutions such as convolutional autoencoders, and provide runnable PyTorch code and deployment ideas along the way. Whether you are just getting started or planning to run a model on the production line, I hope this article helps.



1. What is industrial defect detection?

Simply put, a camera captures an image of the product, and an algorithm automatically decides whether it passes. Compared with manual inspection, a machine does not get tired, applies a uniform standard, and leaves a complete data record.

1.1 Why does the factory need it?

  • More stable quality: Human inspectors vary with fatigue, mood, and experience; machines do not. Once the algorithm is fixed, every product is judged by the same standard.
  • Save money and time: A one-time development cost up front saves a lot of inspection labor later. More importantly, defective products are intercepted early, which avoids the larger losses caused by rework or recalls.
  • Protect safety and brand: One defective screw can destroy a piece of equipment, and one batch of products with poor appearance can damage a brand built over many years.

1.2 What do common defects look like?

On the production line, there are many kinds of defects, which can be roughly classified into the following categories:

  • Surface defects: Scratches, dents, stains, cracks, uneven color. For example, fine scratches on phone glass.
  • Structural/dimensional defects: Out-of-tolerance dimensions, deformation, missing material, internal bubbles, material delamination.
  • Assembly defects: Parts installed in the wrong position, missing screws, cold solder joints.

The most troublesome issues during detection

  1. Light and angle keep changing: The workshop is not as stable as the laboratory. If the brightness shifts during the day or the product is placed at a slightly different angle, the image can change a lot.
  2. The defect is tiny and blends into the background: For example, a small crack on a floorboard with wood grain is hard to spot with the naked eye and even harder for a machine.
  3. Too few "defective products": On a stable production line, 99.9% of the products may be normal, and only a handful of defective samples can be collected in a month. This makes it hard to train a traditional classification model that "learns from a large number of defect samples".
  4. Both speed and accuracy are required: Several products pass along a high-speed line every second, which leaves only a few milliseconds to process each image, and the false alarm rate must stay low.

These difficulties mean that the ordinary "cat vs. dog classification" approach will not work here; we need anomaly detection instead.
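To see why the false alarm rate matters so much when defects are this rare, here is a small back-of-the-envelope calculation. The numbers (0.1% defect rate, 90% recall, and the two false alarm rates) are illustrative assumptions, not measurements from a real line.

```python
def alarm_precision(defect_rate, recall, false_alarm_rate):
    """Fraction of raised alarms that are real defects (Bayes' rule)."""
    true_alarms = defect_rate * recall
    false_alarms = (1.0 - defect_rate) * false_alarm_rate
    return true_alarms / (true_alarms + false_alarms)

# Assumed scenario: 0.1% of products are defective and the detector catches 90% of them.
for far in (0.05, 0.001):
    p = alarm_precision(defect_rate=0.001, recall=0.9, false_alarm_rate=far)
    print(f"false alarm rate {far:.1%}: {p:.1%} of alarms are real defects")
```

Under these assumptions, a 5% false alarm rate means fewer than 2 in 100 alarms point at a real defect, while a 0.1% false alarm rate brings that close to one in two.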


2. Core technology: How to use anomaly detection?

The core logic is not complicated: let the model learn only "what a normal product looks like", and then classify anything that does not look right as an anomaly. It is like having only ever seen whole apples: when a worm-eaten one suddenly shows up, we notice immediately that something is off.
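As a toy illustration of this idea (not the method used later in the article), the sketch below "trains" on normal samples only by remembering their mean and spread, then scores new samples by how far they fall outside that range. The two features and all values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 2-D feature vectors (e.g. mean brightness, edge strength) of normal parts.
normal = rng.normal(loc=[0.5, 0.2], scale=[0.02, 0.01], size=(500, 2))

# "Training" only looks at normal samples: remember their mean and spread.
mu = normal.mean(axis=0)
sigma = normal.std(axis=0)

def anomaly_score(x):
    """Largest per-feature distance from the normal mean, in standard deviations."""
    return np.max(np.abs((np.asarray(x) - mu) / sigma), axis=-1)

typical_part = [0.51, 0.205]    # close to what the model has seen
scratched_part = [0.50, 0.35]   # edge strength far outside the normal range

print(anomaly_score(typical_part))     # small score
print(anomaly_score(scratched_part))   # large score
```

No defective sample is needed to build the model; defects only show up at test time as large scores.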

Depending on the amount of data on hand and the complexity of the product images, there are usually two approaches.

2.1 Traditional method: when data is limited and textures are simple

If the texture of your product is very regular (such as solid-color metal sheets or cloth with simple patterns) and you have only a few hundred to a few thousand normal samples, traditional machine learning is the lower-effort choice, and it does not even require a GPU.

How to do it?

The whole process is divided into three steps:

  1. Feature extraction: Convert each image into a vector of numbers that describes its "normal appearance", such as texture uniformity, color distribution, and edge gradients.
  2. Standardization and dimensionality reduction: Scale the features to a common range, then compress the dimensions to remove redundant information.
  3. Train the anomaly detection model: Use an algorithm such as Isolation Forest or One-Class SVM to draw a "normal region" in the feature space of normal samples. A new sample that falls outside this region is treated as an anomaly.

Here is a ready-to-use Python implementation using OpenCV, scikit-image, and scikit-learn:

import numpy as np
import cv2
from sklearn.ensemble import IsolationForest
from sklearn.svm import OneClassSVM
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from skimage.feature import local_binary_pattern

class TraditionalAnomalyDetector:
    def __init__(self, method='isolation_forest', contamination=0.05, pca_dim=16):
        self.method = method
        self.contamination = contamination
        self.scaler = StandardScaler()
        # The feature vector below has 33 values (26 LBP + 4 gradient + 3 gray),
        # so pca_dim must not exceed 33 or the number of training images.
        self.pca = PCA(n_components=pca_dim)
        # Isolation Forest by default; any other value falls back to One-Class SVM.
        if method == 'isolation_forest':
            self.model = IsolationForest(contamination=contamination, random_state=42)
        else:
            self.model = OneClassSVM(kernel='rbf', gamma='scale', nu=contamination)
    
    def extract_features(self, images):
        """Extract LBP texture, gradient, and gray-level statistics from each image."""
        features = []
        for img in images:
            # Convert to single-channel grayscale (color input is assumed to be RGB)
            gray = cv2.cvtColor(img, cv2.COLOR_RGB2GRAY) if len(img.shape)==3 else img
            
            # 1. LBP texture features: describe local texture patterns.
            #    With P=24 and method='uniform' the codes are 0..25, hence 26 fixed bins.
            lbp = local_binary_pattern(gray, 24, 3, method='uniform')
            lbp_hist, _ = np.histogram(lbp.ravel(), bins=26, range=(0, 26), density=True)
            
            # 2. Gradient statistics: capture edge and contour changes
            grad_x = cv2.Sobel(gray, cv2.CV_64F, 1, 0, ksize=3)
            grad_y = cv2.Sobel(gray, cv2.CV_64F, 0, 1, ksize=3)
            mag = np.sqrt(grad_x**2 + grad_y**2)
            grad_stats = [np.mean(mag), np.std(mag), np.percentile(mag, 25), np.percentile(mag, 75)]
            
            # 3. Gray-level statistics: distribution of bright and dark pixels
            gray_stats = [np.mean(gray), np.std(gray), np.median(gray)]
            
            features.append(np.concatenate([lbp_hist, grad_stats, gray_stats]))
        return np.array(features)
    
    def fit(self, normal_images):
        """Train on normal samples only."""
        feats = self.extract_features(normal_images)
        feats = self.scaler.fit_transform(feats)    # standardize
        feats = self.pca.fit_transform(feats)       # reduce dimensionality
        self.model.fit(feats)                       # learn the boundary of "normal"
    
    def predict(self, images):
        """Return (labels, anomaly scores); label -1 means anomalous, 1 means normal."""
        feats = self.extract_features(images)
        feats = self.scaler.transform(feats)
        feats = self.pca.transform(feats)
        # score_samples is lower for anomalies, so negate it: higher score = more anomalous
        return self.model.predict(feats), -self.model.score_samples(feats)

💡 **When to use it?** When you only have a CPU, the dataset is within a few thousand images, and the product texture is not complex, this traditional solution is extremely cost-effective. You can even ship it with an ordinary packaging tool, with no deep learning framework among the dependencies.
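For readers who want to try the One-Class SVM variant of the same three-step idea, here is a minimal sketch built with a scikit-learn pipeline. It runs on synthetic feature vectors rather than real images; the 33-value vectors, the shifted "defect" samples, and the hyperparameters are assumptions chosen only for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(42)

# Stand-in for extracted feature vectors (33 values per image).
normal_feats = rng.normal(0.0, 1.0, size=(300, 33))
defect_feats = rng.normal(6.0, 1.0, size=(5, 33))   # shifted on purpose

# nu is an upper bound on the fraction of training samples treated as outliers.
pipeline = make_pipeline(
    StandardScaler(),
    PCA(n_components=10),
    OneClassSVM(kernel="rbf", gamma="scale", nu=0.05),
)
pipeline.fit(normal_feats)   # normal samples only

defect_labels = pipeline.predict(defect_feats)                 # -1 marks an anomaly
accepted = float((pipeline.predict(normal_feats) == 1).mean())
print(defect_labels, f"normal samples accepted: {accepted:.1%}")
```

Wrapping the steps in one pipeline keeps scaling, PCA, and the detector in a single object, which is convenient to save and load at deployment time.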

2.2 Deep learning method: when data is plentiful and textures are complex

If the product surface itself has complex patterns (such as fabric or printed packaging), hand-crafted features struggle to cover every "normal" variation. This is where the convolutional autoencoder (CAE) comes in.

Why is it useful?

The autoencoder is like a "memory master" and consists of two parts:

  • Encoder: Compresses the input image step by step into a condensed feature representation (that is, it keeps only the key information).
  • Decoder: "Restores" the image from this condensed representation.

If you train it only on normal products, the decoder learns only "how to reconstruct what normal products look like". When a defective sample is fed in, the decoder still tries to restore it to a normal appearance, so the reconstructed image differs noticeably from the original in the defective region. We only need to measure this difference (the reconstruction error) to decide whether there is a defect.
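The reconstruction error can be illustrated without any network. In the NumPy sketch below, the "reconstruction" is a hand-made stand-in for what a well-trained autoencoder might output (the normal surface without the scratch); the image, the scratch position, and the values are all made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical grayscale image in [0, 1]: a flat surface with a bright scratch.
original = np.full((64, 64), 0.5) + rng.normal(0, 0.01, size=(64, 64))
original[30:33, 10:50] = 0.9                        # the "defect"

# Stand-in for the autoencoder output: it reproduces the normal surface only.
reconstruction = np.full((64, 64), 0.5)

error_map = (original - reconstruction) ** 2        # per-pixel squared error
image_score = float(error_map.mean())               # same quantity as an MSE loss
row, col = np.unravel_index(error_map.argmax(), error_map.shape)

print(f"image-level score: {image_score:.5f}")
print(f"largest error at row {row}, column {col}")
```

The mean of the error map gives one score per image, while the map itself hints at where the defect is located.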

Implemented from scratch using PyTorch

import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset
import torchvision.transforms as T

class ConvAutoencoder(nn.Module):
    def __init__(self, in_channels=3):
        super().__init__()
        # Encoder: repeated downsampling to extract high-level features
        self.encoder = nn.Sequential(
            nn.Conv2d(in_channels, 32, 4, 2, 1), nn.ReLU(inplace=True),     # 224→112
            nn.Conv2d(32, 64, 4, 2, 1), nn.BatchNorm2d(64), nn.ReLU(inplace=True),  # 112→56
            nn.Conv2d(64, 128, 4, 2, 1), nn.BatchNorm2d(128), nn.ReLU(inplace=True),# 56→28
            nn.Conv2d(128, 256, 4, 2, 1), nn.BatchNorm2d(256), nn.ReLU(inplace=True),# 28→14
            nn.Conv2d(256, 512, 4, 2, 1), nn.ReLU(inplace=True)             # 14→7
        )
        # Decoder: repeated upsampling to restore the image
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(512, 256, 4, 2, 1), nn.BatchNorm2d(256), nn.ReLU(inplace=True),  # 7→14
            nn.ConvTranspose2d(256, 128, 4, 2, 1), nn.BatchNorm2d(128), nn.ReLU(inplace=True),  # 14→28
            nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.BatchNorm2d(64), nn.ReLU(inplace=True),    # 28→56
            nn.ConvTranspose2d(64, 32, 4, 2, 1), nn.ReLU(inplace=True),                         # 56→112
            nn.ConvTranspose2d(32, in_channels, 4, 2, 1), nn.Sigmoid()                          # 112→224, output pixel values in [0,1]
        )
    
    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z)

class DeepAnomalyDetector:
    def __init__(self, img_size=(3,224,224), lr=1e-4, device=None):
        self.device = device or torch.device('cuda' if torch.cuda.is_available() else 'cpu')
        self.model = ConvAutoencoder(img_size[0]).to(self.device)
        self.criterion = nn.MSELoss()
        self.optimizer = optim.Adam(self.model.parameters(), lr=lr)
        self.transform = T.Compose([
            T.ToPILImage(), T.Resize(img_size[1:]), T.ToTensor()
        ])
        self.threshold = None
    
    def fit(self, normal_imgs, epochs=50, batch_size=32):
        """Train the autoencoder on normal samples."""
        processed = [self.transform(img) for img in normal_imgs]
        loader = DataLoader(TensorDataset(torch.stack(processed)), batch_size=batch_size, shuffle=True)
        
        self.model.train()
        for epoch in range(epochs):
            total_loss = 0.0
            for (data,) in loader:
                data = data.to(self.device)
                recon = self.model(data)
                loss = self.criterion(recon, data)         # make the reconstruction match the input as closely as possible
                self.optimizer.zero_grad()
                loss.backward()
                self.optimizer.step()
                total_loss += loss.item() * len(data)
            if (epoch+1) % 10 == 0:
                print(f"Epoch [{epoch+1}/{epochs}], Avg Loss: {total_loss/len(normal_imgs):.6f}")
    
    def set_threshold(self, normal_imgs, percentile=95):
        """Set the anomaly threshold from the reconstruction errors of normal samples."""
        errors = []
        self.model.eval()
        with torch.no_grad():
            for img in normal_imgs:
                data = self.transform(img).unsqueeze(0).to(self.device)
                recon = self.model(data)
                errors.append(self.criterion(recon, data).item())
        # Store a plain Python float so later comparisons yield plain bools
        self.threshold = float(np.percentile(errors, percentile))
        print(f"Set threshold to {self.threshold:.6f}")
    
    def predict(self, imgs):
        """Return one result per image: defect flag, reconstruction error, confidence."""
        if self.threshold is None:
            raise ValueError("Call set_threshold() before predict().")
        results = []
        self.model.eval()
        with torch.no_grad():
            for img in imgs:
                data = self.transform(img).unsqueeze(0).to(self.device)
                recon = self.model(data)
                err = self.criterion(recon, data).item()
                results.append({
                    # bool() keeps the flag a plain Python bool (NumPy bools cannot be stored by sqlite3)
                    "is_defective": bool(err > self.threshold),
                    "recon_error": err,
                    "confidence": min(err / self.threshold, 2.0)   # higher means more "certain" it is a defect
                })
        return results

📌 Training tips: Train only on normal images, and hold out a small share of normal samples to set the threshold (for example at the 95th percentile) so that the false alarm rate stays under control.
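The effect of the percentile choice can be sketched with plain NumPy. The reconstruction errors below are drawn from a made-up distribution purely for illustration; in practice they would come from running the trained model on held-out normal images. The helper name `calibrate_threshold` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical reconstruction errors of held-out normal images.
holdout_errors = rng.gamma(shape=4.0, scale=0.0005, size=2000)

def calibrate_threshold(errors, percentile=95):
    """Pick the error value that `percentile` % of normal samples stay below."""
    return float(np.percentile(errors, percentile))

false_alarm_rates = {}
for pct in (95, 99, 99.9):
    thr = calibrate_threshold(holdout_errors, pct)
    # Share of normal parts that would be flagged at this threshold.
    false_alarm_rates[pct] = float((holdout_errors > thr).mean())
    print(f"percentile {pct}: threshold={thr:.5f}, "
          f"false alarms on normal parts={false_alarm_rates[pct]:.2%}")
```

A higher percentile lowers the false alarm rate on normal parts but also raises the bar a real defect has to clear, so the final value is best checked against the few defective samples you have.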


3. From experiment to production line: deployment and optimization

Making the code runnable is only the first step. To actually reach the production line, speed, stability, and maintainability must also be considered.

3.1 Practical tips to speed up deployment

  1. Quantize the model: PyTorch's torch.quantization can compress the model from 32-bit floating point to 8-bit integers, shrinking it by about 4x and often speeding up inference by 2 to 3 times, which suits edge devices well.
  2. Reduce the input size: If 224×224 carries more detail than you need to distinguish defects, try 128×128 or even 96×96; the speedup is clearly noticeable.
  3. Drop the PyTorch runtime: Use torch.onnx.export to export an ONNX model and load it for inference with OpenCV DNN or ONNX Runtime. With OpenCV DNN, the deployment package needs only OpenCV, which keeps it clean and tidy.
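As a rough guide for the second tip, the sketch below estimates how convolution work changes with input size. It rests on the simplifying assumption that compute scales with the number of input pixels, and it uses the fact that the encoder shown earlier halves the resolution five times (a factor of 32), so sizes that are multiples of 32 reconstruct to the same size. The helper name is hypothetical.

```python
def relative_conv_cost(size, reference=224):
    """Convolution work scales with the number of pixels, i.e. with size squared."""
    return (size / reference) ** 2

for size in (224, 128, 96):
    cost = relative_conv_cost(size)
    bottleneck = size // 32   # five stride-2 layers: 2**5 = 32
    print(f"{size}x{size}: {cost:.0%} of the 224x224 compute, "
          f"bottleneck feature map {bottleneck}x{bottleneck}")
```

Actual speedups depend on the hardware and runtime, so treat these ratios as an upper-level estimate and measure on the target device.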

3.2 A simple industrial deployment skeleton

The following class shows how to use a trained detector in a real setting. It covers single-frame detection, logging results to a database, and real-time video stream processing.

import cv2
import sqlite3
from datetime import datetime

class IndustrialDefectSystem:
    def __init__(self, detector, db_path="defect_results.db"):
        self.detector = detector
        self.db_path = db_path
        self._init_db()
    
    def _init_db(self):
        """Create the database table that stores detection results."""
        with sqlite3.connect(self.db_path) as conn:
            conn.execute('''
                CREATE TABLE IF NOT EXISTS results (
                    id INTEGER PRIMARY KEY AUTOINCREMENT,
                    timestamp TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
                    is_defective BOOLEAN,
                    recon_error REAL,
                    confidence REAL
                )
            ''')
    
    def detect_single(self, img):
        """Run detection on one image and persist the result."""
        res = self.detector.predict([img])[0]
        with sqlite3.connect(self.db_path) as conn:
            # Cast to built-in types so sqlite3 can bind them even if the detector returns NumPy scalars
            conn.execute('''
                INSERT INTO results (is_defective, recon_error, confidence)
                VALUES (?, ?, ?)
            ''', (int(res["is_defective"]), float(res["recon_error"]), float(res["confidence"])))
        return res
    
    def detect_video(self, source=0, show=True):
        """Real-time detection on a video stream (source=0 is the default camera)."""
        cap = cv2.VideoCapture(source)
        if not cap.isOpened():
            raise ValueError("Cannot open video source")
        
        try:
            while True:
                ret, frame = cap.read()
                if not ret:
                    break
                # OpenCV frames are BGR; make sure this matches the channel order used in training
                res = self.detect_single(frame)
                # Overlay: red text for defects, green text for OK
                color = (0,0,255) if res["is_defective"] else (0,255,0)
                text = f"DEFECT: {res['confidence']:.2f}" if res["is_defective"] else "OK"
                cv2.putText(frame, text, (30,30), cv2.FONT_HERSHEY_SIMPLEX, 1, color, 2)
                if show:
                    cv2.imshow("Defect Detection", frame)
                    if cv2.waitKey(1) & 0xFF == ord('q'):
                        break
        finally:
            # Release the camera and close windows even if detection raises
            cap.release()
            cv2.destroyAllWindows()
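Once results are being logged, the same table can answer simple production questions. The sketch below is one possible daily summary query; it uses an in-memory database with the same columns as the results table above, and the inserted rows are made-up demo data.

```python
import sqlite3

# In-memory database with the same columns as the results table (demo data only).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE results (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        timestamp TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
        is_defective BOOLEAN,
        recon_error REAL,
        confidence REAL
    )
""")
demo_rows = [(0, 0.0011, 0.55), (0, 0.0013, 0.65), (1, 0.0031, 1.55), (0, 0.0009, 0.45)]
conn.executemany(
    "INSERT INTO results (is_defective, recon_error, confidence) VALUES (?, ?, ?)",
    demo_rows,
)

# Per-day totals: how many parts were inspected and what share was flagged.
summary = conn.execute("""
    SELECT date(timestamp) AS day,
           COUNT(*) AS inspected,
           SUM(is_defective) AS flagged,
           AVG(is_defective) AS defect_rate
    FROM results
    GROUP BY day
""").fetchall()

for day, inspected, flagged, defect_rate in summary:
    print(day, inspected, flagged, f"{defect_rate:.1%}")
conn.close()
```

A sudden jump in the daily defect rate can signal either a real process problem or a drift in lighting or camera setup, so it is worth reviewing a few flagged images before acting on it.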

4. Learning route and summary

Study suggestions

  • Go through the traditional solution first: Do not rush into deep learning. Run the Isolation Forest code and see whether simple features can already separate your product images. It helps you quickly grasp the essence of anomaly detection.
  • CAE is the usual entry ticket in industry: When traditional methods fall short, the convolutional autoencoder is usually the most stable and easiest starting point among deep learning solutions.
  • Data quality matters more than a fancy model: Collect normal samples under as many lighting conditions, batches, and angles as possible. The few defective samples you have are only used to verify the threshold and do not need to take part in training.
  • Treat robustness as the first metric: Before deployment, test repeatedly with extreme images such as strong light, low light, and partial occlusion. The production environment will not go easy on you.
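One inexpensive way to start such robustness testing is to perturb images you already have. The sketch below generates strong-light, low-light, and partially occluded variants with NumPy; the function name, the brightness factors, and the patch size are arbitrary assumptions, and the input is a random placeholder rather than a real product image.

```python
import numpy as np

def stress_variants(img, rng):
    """Return perturbed copies of a uint8 image: strong light, low light, occlusion."""
    f = img.astype(np.float32)
    bright = np.clip(f * 1.6 + 30, 0, 255).astype(np.uint8)
    dark = np.clip(f * 0.4, 0, 255).astype(np.uint8)
    occluded = img.copy()
    h, w = img.shape[:2]
    top = int(rng.integers(0, h - h // 4))
    left = int(rng.integers(0, w - w // 4))
    occluded[top:top + h // 4, left:left + w // 4] = 0   # black patch, 1/16 of the area
    return {"strong_light": bright, "low_light": dark, "occlusion": occluded}

rng = np.random.default_rng(0)
sample = rng.integers(60, 200, size=(224, 224, 3), dtype=np.uint8)   # placeholder image
variants = stress_variants(sample, rng)
for name, v in variants.items():
    print(name, v.shape, v.dtype, round(float(v.mean()), 1))
```

Feeding these variants of known-good images to the detector shows how many false alarms each condition would trigger; synthetic perturbations do not replace real images captured under those conditions.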

Summary

The core of industrial defect detection comes down to one sentence: use the fewest defect samples to solve the most practical production problems. The traditional methods and convolutional autoencoders introduced in this article already cover most common scenarios, and their implementation and deployment costs are manageable. Once you have mastered these basics, you can get started quickly, and there is still plenty of time to move on to more advanced methods such as VAE, GAN, and PatchCore.

In industrial deployment, **stability is far more important than "laboratory accuracy"**. Do not be tempted by the latest SOTA models; give priority to solutions with fast inference, a simple structure, and easy troubleshooting.