title: Detailed explanation of Siamese Network: similarity learning and face recognition | Daoman PythonAI
description: In-depth analysis of the Siamese Network model and its application in tasks such as similarity learning, face recognition, signature verification, etc., including detailed architecture analysis, PyTorch implementation, and practical application scenarios.
keywords: [Twin Network, Siamese Network, Similarity Learning, One-Shot Learning, Deep Learning, Computer Vision, PyTorch]
Detailed explanation of Siamese Network: similarity learning and face recognition
Introduction
Traditional deep learning classifiers need a large amount of labeled data over a fixed set of categories in order to converge. In practice, we often face situations that break this assumption:
- Does the face recognition model have to be retrained every time the company hires a new employee?
- What if a museum has only one authentic sample of an antique calligraphy work or painting to appraise against?
- How can an e-commerce search match "niche shoes with a similar style"?
A Siamese Network steps outside the logic of "direct classification" and instead learns the distance (or similarity) between samples, which suits this type of problem well. This article covers, in order, the core principles, a minimalist PyTorch implementation, the key supporting components, and practical techniques.
1. Quick overview of core principles
1.1 The nature of twins: two “identical” sub-networks
A Siamese network (also called a twin network) consists of two sub-networks with exactly the same structure and fully shared parameters, just like twins.
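1.2 Process breakdown
The two inputs X1 and X2 pass through the same sub-network f, producing features f(X1) and f(X2); the distance between these two feature vectors then measures how similar the inputs are.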
1.3 Why use shared weights?
- Parameters halved: compared with two independent sub-networks, which makes training more efficient
- Feature space consistency: ensures f(X1) and f(X2) are comparable in the same coordinate system
- Strong generalization: avoids the two sub-networks learning different feature logic
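The "parameters halved" point can be checked directly. The sketch below is only an illustration: `make_encoder` and `count_params` are hypothetical helpers, and the tiny fully connected encoder is an assumption. Sharing weights simply means calling the same module on both inputs, so its parameters are counted once.

```python
import torch
import torch.nn as nn


def make_encoder() -> nn.Module:
    # Hypothetical toy encoder for 28x28 grayscale images (illustration only).
    return nn.Sequential(
        nn.Flatten(), nn.Linear(28 * 28, 64), nn.ReLU(), nn.Linear(64, 16)
    )


def count_params(*modules: nn.Module) -> int:
    # Count each distinct parameter tensor once, even if a module is passed twice.
    unique = {id(p): p for m in modules for p in m.parameters()}
    return sum(p.numel() for p in unique.values())


shared = make_encoder()                               # one encoder reused for both branches
branch_a, branch_b = make_encoder(), make_encoder()   # two independent encoders

x1 = torch.randn(4, 1, 28, 28)
x2 = torch.randn(4, 1, 28, 28)

# Shared weights: the same module (same parameters) processes both inputs.
f1, f2 = shared(x1), shared(x2)

shared_count = count_params(shared, shared)
independent_count = count_params(branch_a, branch_b)
print(shared_count, independent_count)
```

Because both features come from the same function, identical inputs always map to identical features, which is what makes the distance between f(X1) and f(X2) meaningful.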
2. PyTorch minimalist implementation
We first write a basic version that runs directly on grayscale MNIST, focusing on the logic rather than on a complex architecture.
2.1 Basic twin network
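A minimal sketch of such a network, assuming 1x28x28 MNIST inputs. The class name `SiameseNetwork`, the layer sizes, and the 64-dimensional embedding are illustrative assumptions rather than a reference implementation; random tensors stand in for real MNIST batches.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SiameseNetwork(nn.Module):
    """Two branches that share one encoder (illustrative layer sizes)."""

    def __init__(self, embedding_dim: int = 64):
        super().__init__()
        # Single encoder: reusing it for both inputs is what shares the weights.
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),   # 1x28x28 -> 16x28x28
            nn.ReLU(),
            nn.MaxPool2d(2),                              # -> 16x14x14
            nn.Conv2d(16, 32, kernel_size=3, padding=1),  # -> 32x14x14
            nn.ReLU(),
            nn.MaxPool2d(2),                              # -> 32x7x7
            nn.Flatten(),
            nn.Linear(32 * 7 * 7, embedding_dim),
        )

    def forward_once(self, x: torch.Tensor) -> torch.Tensor:
        # Map one batch of images to embedding vectors.
        return self.encoder(x)

    def forward(self, x1: torch.Tensor, x2: torch.Tensor):
        # Both inputs go through the same encoder.
        return self.forward_once(x1), self.forward_once(x2)


model = SiameseNetwork()
x1 = torch.randn(8, 1, 28, 28)  # stand-in for a batch of MNIST images
x2 = torch.randn(8, 1, 28, 28)
f1, f2 = model(x1, x2)
distance = F.pairwise_distance(f1, f2)  # Euclidean distance for each pair
print(f1.shape, distance.shape)
```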
2.2 Key supporting components
Contrastive Loss
Contrastive loss is the core loss of the twin network: it pulls similar samples closer together and pushes dissimilar samples apart, with no penalty once a dissimilar pair is farther apart than the margin.
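Written out, one common form of this loss (up to a constant factor) is shown below. It assumes the label convention used in this article, y = 0 for a similar pair and y = 1 for a dissimilar pair, with d the Euclidean distance between the two feature vectors and m the margin:

```latex
L(y, d) = (1 - y)\, d^{2} + y \, \max(0,\; m - d)^{2},
\qquad d = \lVert f(X_1) - f(X_2) \rVert_2
```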
The core idea is simple:
- When two samples are similar (label=0), their feature distance is penalized directly, forcing it toward 0.
- When two samples are dissimilar (label=1), only distances smaller than margin are penalized. In other words, once the distance between dissimilar samples is large enough (greater than margin), no further penalty is imposed, and the model can be "lazy" and ignore them.
This part of the logic can be clearly expressed in code:
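A minimal sketch, assuming the label convention above (0 = similar, 1 = dissimilar) and Euclidean distance. The class name `ContrastiveLoss` and the default margin of 1.0 are illustrative choices; the margin usually needs tuning per task.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ContrastiveLoss(nn.Module):
    """Contrastive loss with label 0 = similar pair, 1 = dissimilar pair."""

    def __init__(self, margin: float = 1.0):
        super().__init__()
        self.margin = margin  # assumed default, not a recommended value

    def forward(self, f1: torch.Tensor, f2: torch.Tensor, label: torch.Tensor):
        d = F.pairwise_distance(f1, f2)
        # Similar pairs: penalize any distance at all.
        similar_term = (1 - label) * d.pow(2)
        # Dissimilar pairs: penalize only when closer than the margin.
        dissimilar_term = label * torch.clamp(self.margin - d, min=0).pow(2)
        return (similar_term + dissimilar_term).mean()


loss_fn = ContrastiveLoss(margin=1.0)
f1 = torch.tensor([[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]])
f2 = torch.tensor([[0.0, 0.0], [3.0, 4.0], [0.3, 0.4]])
label = torch.tensor([0.0, 1.0, 1.0])  # similar, dissimilar (far), dissimilar (close)
loss = loss_fn(f1, f2, label)
print(loss.item())
```

In this toy batch only the third pair contributes noticeably: the first pair is identical, and the second is dissimilar but already farther apart than the margin.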
Fast similarity inference
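A minimal inference sketch, assuming an already trained embedding model. The toy `encoder`, the helper name `is_same`, and the threshold value 0.5 are placeholders for illustration; a real threshold should come from validation data.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Placeholder encoder standing in for a trained twin-network branch.
encoder = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 64))


@torch.no_grad()
def is_same(model: nn.Module, img1: torch.Tensor, img2: torch.Tensor,
            threshold: float = 0.5):
    """Return (distance, decision) for two single images of shape 1x28x28."""
    model.eval()  # switch layers such as dropout/batch norm to inference mode
    f1 = model(img1.unsqueeze(0))  # add a batch dimension
    f2 = model(img2.unsqueeze(0))
    distance = F.pairwise_distance(f1, f2).item()
    return distance, distance < threshold


img = torch.rand(1, 28, 28)
d_same, same = is_same(encoder, img, img)                  # identical inputs
d_other, _ = is_same(encoder, img, torch.rand(1, 28, 28))  # unrelated input
print(d_same, same, d_other)
```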
3. Practical core: sample pair construction and data augmentation
3.1 Sample pair construction (key to training)
The input to the twin network is a sample pair, not a single sample. Positive and negative pairs should be kept balanced (usually 1:1).
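One way to build balanced pairs is sketched below. The helper `make_pairs` is hypothetical, random tensors stand in for MNIST images, and the pair label follows this article's convention (0 = same class, 1 = different class). It alternates positive and negative pairs so the ratio stays 1:1.

```python
import random
import torch


def make_pairs(images: torch.Tensor, labels: torch.Tensor, num_pairs: int, seed: int = 0):
    """Build balanced pairs. Pair label: 0 = same class, 1 = different class."""
    rng = random.Random(seed)
    # Group sample indices by class.
    by_class = {}
    for idx, y in enumerate(labels.tolist()):
        by_class.setdefault(y, []).append(idx)
    # Positive pairs need a class with at least two samples.
    pos_classes = [c for c, idxs in by_class.items() if len(idxs) >= 2]
    classes = list(by_class)

    left, right, pair_labels = [], [], []
    for k in range(num_pairs):
        if k % 2 == 0:  # positive pair: two different samples of one class
            c = rng.choice(pos_classes)
            i, j = rng.sample(by_class[c], 2)
            pair_labels.append(0.0)
        else:           # negative pair: samples from two different classes
            c1, c2 = rng.sample(classes, 2)
            i, j = rng.choice(by_class[c1]), rng.choice(by_class[c2])
            pair_labels.append(1.0)
        left.append(i)
        right.append(j)
    return images[left], images[right], torch.tensor(pair_labels)


# Random tensors standing in for MNIST images and digit labels.
images = torch.randn(100, 1, 28, 28)
labels = torch.arange(100) % 10
x1, x2, y = make_pairs(images, labels, num_pairs=32)
print(x1.shape, x2.shape, y.mean().item())
```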
3.2 Notes on data augmentation
- You can apply different small-amplitude augmentations (such as random brightness) to the two images of a positive sample pair
- Don't apply strong augmentations that break feature consistency (such as rotating handwritten digits by more than 90 degrees)
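A small sketch of the first point, written in plain PyTorch so it stays self-contained. The helper `random_brightness` and the 10% range are assumptions for illustration, and pixel values are assumed to lie in [0, 1]; in practice, ready-made transforms such as those in `torchvision.transforms` would normally be used instead.

```python
import torch


def random_brightness(img: torch.Tensor, max_delta: float = 0.1) -> torch.Tensor:
    """Scale brightness by a random factor in [1 - max_delta, 1 + max_delta]."""
    factor = 1.0 + (torch.rand(1).item() * 2 - 1) * max_delta
    return (img * factor).clamp(0.0, 1.0)  # assumes pixel values in [0, 1]


torch.manual_seed(0)
img_a = torch.rand(1, 28, 28)  # stand-ins for two images of the same digit
img_b = torch.rand(1, 28, 28)

# Each image of the positive pair gets its own independent, mild augmentation.
aug_a = random_brightness(img_a)
aug_b = random_brightness(img_b)
print(aug_a.shape, (aug_a - img_a).abs().max().item())
```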
4. Common practical problems and optimization
4.1 How to choose the threshold?
Don't pick a threshold off the top of your head! Use the ROC curve on the validation set to find the optimal one.
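The sketch below shows one possible selection rule: scan candidate thresholds along the ROC curve and keep the one that maximizes TPR - FPR (Youden's J statistic). This criterion, the helper `best_threshold`, and the toy distances are assumptions for illustration; libraries such as scikit-learn provide `roc_curve` for computing the curve itself.

```python
import torch


def best_threshold(distances: torch.Tensor, labels: torch.Tensor):
    """Pick the distance threshold maximizing TPR - FPR (Youden's J).

    labels: 0 = similar pair, 1 = dissimilar pair.
    A pair is predicted "similar" when distance <= threshold.
    """
    similar = labels == 0
    best_t, best_j = None, -1.0
    for t in torch.unique(distances):  # each observed distance is a candidate
        pred_similar = distances <= t
        tpr = (pred_similar & similar).sum().item() / similar.sum().item()
        fpr = (pred_similar & ~similar).sum().item() / (~similar).sum().item()
        if tpr - fpr > best_j:
            best_t, best_j = t.item(), tpr - fpr
    return best_t, best_j


# Toy validation distances: similar pairs are mostly closer than dissimilar ones.
distances = torch.tensor([0.1, 0.2, 0.3, 0.9, 0.6, 1.1, 1.3, 1.5])
labels = torch.tensor([0, 0, 0, 0, 1, 1, 1, 1])
threshold, j = best_threshold(distances, labels)
print(threshold, j)
```

Other criteria are equally valid, for example fixing a maximum acceptable false accept rate for a face attendance system and taking the threshold that meets it.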
4.2 What should I do if inference is too slow?
- Precomputed database features: save the features of registered faces/signatures/products to a database or cache instead of re-extracting them on every query
- Model quantization: use torch.quantization to compress the model from FP32 to INT8, which can speed up inference by up to roughly 3-4 times, depending on the model and hardware
- ONNX/TensorRT export: optimize with a dedicated inference engine when deploying to a production environment
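The first point, precomputing features, can be sketched as follows. The toy `encoder`, the `registered` dictionary with its names, and the helper `identify` are all hypothetical; a real system would load a trained model and persist the gallery to a database or cache.

```python
import torch
import torch.nn as nn

# Placeholder encoder standing in for a trained twin-network branch.
encoder = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 64)).eval()

# Hypothetical registered samples (e.g. one image per enrolled identity).
torch.manual_seed(0)
registered = {name: torch.rand(1, 28, 28) for name in ["alice", "bob", "carol"]}

# One-time step: extract and cache features for every registered sample.
with torch.no_grad():
    names = list(registered)
    gallery = encoder(torch.stack([registered[n] for n in names]))  # shape (3, 64)


def identify(query: torch.Tensor):
    """Embed only the query, then compare against the cached gallery."""
    with torch.no_grad():
        q = encoder(query.unsqueeze(0))         # shape (1, 64)
    dists = torch.cdist(q, gallery).squeeze(0)  # distance to every cached feature
    best = torch.argmin(dists).item()
    return names[best], dists[best].item()


name, dist = identify(registered["bob"])
print(name, dist)
```

At query time only one forward pass is needed, regardless of how many samples are registered; the comparison itself is a cheap distance computation.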
5. Typical application scenarios
The core of the twin network is "small sample + similarity judgment". Typical scenarios include:
- Face recognition/attendance: New employees only need to take 1-3 photos, no retraining is required
- Signature/Fingerprint Verification: There are very few authentic samples, and only similarity is compared during verification.
- E-commerce same/similar style search: Use images uploaded by users to match products with similar styles in the library
- Defect Detection: There are only a small number of normal samples. During detection, the distance between the new sample and the normal sample is compared.
6. Summary
The twin network is a simple but powerful similarity learning architecture that addresses the pain points of traditional classification in "small sample, new category" scenarios.
Review of core points:
- Two identical subnetworks: shared parameters and consistent feature spaces
- Contrastive loss: shrink the distance between similar pairs and widen the distance between dissimilar pairs
- Sample pair training: balancing positive and negative pairs is the most important point of all
If you need higher accuracy, you can go on to study Triplet Loss, FaceNet, or Transformer-based similarity models.
Related reading
- Computer Vision Fundamentals
- Analysis of Classic CNN Architectures
- Learning a Similarity Metric Discriminatively, with Application to Face Verification (an early paper on training twin networks with a contrastive-style loss)

