Recognizing text verification codes (CAPTCHAs) with CRNN + CTC: PyTorch training plus ONNX-accelerated deployment

📌 Responsible automation notice: This article and its supporting tools are intended only for technical learning and exchange, or for compliant automated testing in non-sensitive scenarios. Please comply with the Cybersecurity Law and with the robots.txt rules and terms of use of the target website.

📂 Project address: VerificationCodeRecognition


1. Core background review

Text verification codes (CAPTCHAs) usually contain touching characters, tilted characters, and no fixed boundary between characters. The traditional "segment first, then classify" approach (such as YOLO-based cropping followed by single-character recognition) tends to fail in this setting: once the segmentation is off, every later recognition step is wrong as well.

In contrast, CRNN + CTC Loss is a classic segmentation-free, end-to-end approach that is well suited to visual sequence recognition. Its three core components have a clear division of labor:

  • 🧠 CNN (Convolutional Neural Network): Downsamples the input image and extracts local visual features such as strokes and textures, producing a compact, high-level feature map that is read column by column as a sequence.
  • 🔄 Bidirectional LSTM (Recurrent Neural Network): Models context along the feature sequence in both directions, so each position is predicted with knowledge of its neighbors, which helps avoid implausible character combinations.
  • 📏 CTC Loss: Removes the need to annotate the position of each character. It sums over all valid alignments between the predicted sequence and the label sequence to compute the loss, which solves the problem of training on variable-length sequences.
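To make the alignment idea concrete, here is a minimal sketch of CTC greedy decoding: take the most likely class per frame, merge consecutive repeats, then drop the blank symbol. It assumes the blank sits at index 0 and that index i maps to the (i - 1)-th character of the alphabet; the project's own decoder may use a different layout.

```python
from itertools import groupby
from typing import Sequence

BLANK = 0  # assumption: index 0 is the CTC blank symbol


def ctc_greedy_decode(frame_ids: Sequence[int], alphabet: str, blank: int = BLANK) -> str:
    """Collapse repeated per-frame predictions, then remove blanks."""
    # Step 1: merge runs of identical consecutive indices.
    collapsed = [idx for idx, _ in groupby(frame_ids)]
    # Step 2: drop blanks; index i (i >= 1) maps to alphabet[i - 1].
    return "".join(alphabet[idx - 1] for idx in collapsed if idx != blank)


if __name__ == "__main__":
    alphabet = "abcdefghijklmnopqrstuvwxyz"
    # Per-frame argmax for the frames "a a - q - q q h" ("-" is the blank).
    frames = [1, 1, 0, 17, 0, 17, 17, 8]
    print(ctc_greedy_decode(frames, alphabet))  # prints "aqqh"
```

Note that the blank between the two runs of "q" is what lets a doubled letter survive the merge step; without it, "qq" would collapse into a single "q".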

2. Quickly set up the development environment

Basic requirements

Avoid older versions; this project relies on newer features:

  • Python: 3.8+ (PyTorch 2.0 requires at least Python 3.8)
  • Deep Learning Framework: PyTorch 2.0+ (the CUDA build is recommended; training is much faster on a GPU)
  • Inference acceleration: ONNX Runtime (CPU and GPU builds are both available)

One-click installation of dependencies

# Clone the project locally
git clone https://github.com/MgArcher/VerificationCodeRecognition.git
cd VerificationCodeRecognition

# Install all dependencies (if you need the GPU build of PyTorch, install it manually first)
pip install -r requirements.txt

3. Three steps to prepare the data set

Verification code recognition is a standard supervised learning task. Data quality directly determines the upper limit of the recognition rate.

3.1 Get data

The repository author provides a preset dataset of 4-character lowercase English verification codes (for example, aqqh_157845.jpg) that can be used directly for practice: 👉 Lanzou Cloud preset dataset

If you want to target your own business scenario (for example, a mix of digits, uppercase letters, and Chinese characters), you need to generate or collect compliant, anonymized data yourself.

3.2 Data storage and naming convention

The image storage path and naming rules must match what the loader expects; otherwise the reading logic will fail.

  1. Put all images in the data folder in the project root.
  2. Name each file as <true label>_<unique ID>.jpg or .png (the unique ID can be a timestamp, serial number, etc., and prevents files with the same label from overwriting each other).

Example of correct directory structure:

data/
├─ aqqh_157845.jpg
├─ bmwx_20240520.jpg
└─ zyzh_9999.png
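The naming convention above can be parsed with a few lines of standard-library Python. This is a sketch, not the project's actual loader: the function names `label_from_filename` and `list_samples` are hypothetical, and it assumes labels never contain an underscore themselves.

```python
from pathlib import Path
from typing import List, Tuple

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def label_from_filename(path: str) -> str:
    """Return the part of the file stem before the first underscore."""
    stem = Path(path).stem
    label, sep, _unique_id = stem.partition("_")
    if not sep or not label:
        raise ValueError(f"expected '<label>_<id>' in file name, got {path!r}")
    return label


def list_samples(root: str) -> List[Tuple[str, str]]:
    """Collect (image path, label) pairs from a flat directory."""
    samples = []
    for p in sorted(Path(root).iterdir()):
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES:
            samples.append((str(p), label_from_filename(p.name)))
    return samples


if __name__ == "__main__":
    print(label_from_filename("data/aqqh_157845.jpg"))  # prints "aqqh"
```

Raising on a malformed name, instead of silently skipping the file, makes naming mistakes visible before training starts.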

3.3 Customized reading logic (optional)

If your image format, path, or naming scheme is completely different, you can modify the MyDataset class in tool/dataloader.py to adapt the data loading process to your own data.


4. Model training guide

The training process automatically logs the training-set loss (total_loss) and accuracy (acc), as well as the validation-set loss and accuracy (if validation is turned on).
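For verification codes, accuracy is usually counted per image: a prediction is correct only if the whole string matches. The sketch below shows that exact-match metric; whether the project's acc is computed exactly this way is an assumption, and `sequence_accuracy` is a hypothetical helper name.

```python
from typing import Sequence


def sequence_accuracy(predictions: Sequence[str], labels: Sequence[str]) -> float:
    """Fraction of samples whose predicted string equals the label exactly."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        return 0.0
    correct = sum(pred == true for pred, true in zip(predictions, labels))
    return correct / len(labels)


if __name__ == "__main__":
    preds = ["aqqh", "bmwx", "zyzb", "abcd"]
    truth = ["aqqh", "bmwx", "zyzh", "abcd"]
    print(sequence_accuracy(preds, truth))  # prints 0.75
```

A single wrong character makes the whole sample count as wrong, so this number is stricter than per-character accuracy.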

4.1 Modify core configuration

Open train.py and adjust the following parameters in the Opt class at the top of the file as needed:

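class Opt():
    trainRoot = r"data"         # training set path; keep the r prefix for Windows paths
    cuda = True                # enable if you have an NVIDIA GPU; training drops from tens of minutes to a few minutes
    pretrained = ''            # path to a .pth model when resuming training or fine-tuning
    alphabet_path = 'tool/charactes_keys.txt' # character dictionary; be sure to update it when the scenario changes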

4.2 Start training

Just run it directly:

python train.py

4.3 When to stop training

Don’t blindly pursue infinite iterations. You can stop when the following indicators stabilize:

  • Training-set acc is close to 100% (the model has fully fitted the training data)
  • Validation-set val_acc is stable above 95% (for the lowercase-letter preset dataset) and no longer fluctuates significantly; stopping at this point avoids overfitting
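The two conditions above can be turned into a simple check over the most recent epochs. This is only a sketch: the helper name `should_stop`, the 99% training threshold, the 5-epoch window, and the 1-point stability band are all illustrative assumptions, not values taken from the project.

```python
from typing import Sequence


def should_stop(train_acc: Sequence[float], val_acc: Sequence[float],
                train_target: float = 0.99, val_target: float = 0.95,
                window: int = 5, max_spread: float = 0.01) -> bool:
    """Return True when the last `window` epochs meet both stopping conditions."""
    if len(train_acc) < window or len(val_acc) < window:
        return False  # not enough history yet
    recent_train = train_acc[-window:]
    recent_val = val_acc[-window:]
    train_ok = min(recent_train) >= train_target          # training acc near 100%
    val_ok = min(recent_val) >= val_target                # validation acc above target
    stable = max(recent_val) - min(recent_val) <= max_spread  # no large swings
    return train_ok and val_ok and stable


if __name__ == "__main__":
    train_hist = [0.90, 0.97, 0.995, 0.996, 0.997, 0.998, 0.998]
    val_hist = [0.85, 0.93, 0.960, 0.962, 0.961, 0.963, 0.960]
    print(should_stop(train_hist, val_hist))  # prints True
```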

The trained model weights are saved in the expr folder in the project root.


5. Inference and production-level deployment

A. Native PyTorch inference (suitable for development and debugging)

Suitable for quick verification of results during training without additional format conversion:

python var_torch.py

Timing reference: about 30-35 ms per image in a CPU-only environment, and about 3-5 ms per image on an NVIDIA GPU (such as an RTX 2060).

B. ONNX accelerated inference (preferred for production environments)

Native PyTorch inference is relatively slow on CPU. After exporting the model to ONNX format, ONNX Runtime can speed it up by roughly 3-4 times.

1. Export ONNX model

python export.py

After a successful export, the file crnn.onnx is generated in the expr directory.

2. Inference using ONNX

python var_onnx.py

Timing reference: about 7-10 ms per image in a CPU-only environment, which approaches the inference speed of low- to mid-range GPUs.
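To reproduce timing figures like these on your own machine, a small benchmarking helper is enough. The sketch below times any zero-argument callable; `mean_latency_ms` is a hypothetical helper, and the warm-up and run counts are arbitrary defaults. In practice the callable would wrap one PyTorch forward pass or one ONNX Runtime call on a fixed input.

```python
import statistics
import time
from typing import Callable


def mean_latency_ms(fn: Callable[[], object], warmup: int = 5, runs: int = 50) -> float:
    """Average wall-clock time of fn() in milliseconds, after warm-up calls."""
    for _ in range(warmup):
        fn()  # warm-up runs are not timed (caches, lazy initialization)
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        timings.append((time.perf_counter() - start) * 1000.0)
    return statistics.mean(timings)


if __name__ == "__main__":
    # Stand-in workload; replace with a call into your own inference code.
    print(f"{mean_latency_ms(lambda: sum(range(10_000))):.3f} ms")
```

When timing on a GPU, the callable also needs to wait for the device to finish (for example with torch.cuda.synchronize()) before returning, otherwise only the kernel launch is measured.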


6. Advanced optimization tips

6.1 Character dictionary adaptation

If your verification code contains digits, uppercase letters, special symbols, or even Chinese characters, be sure to update tool/charactes_keys.txt first. List every possible character exactly once, in a fixed order.
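A quick way to avoid missing characters is to derive the dictionary from the labels themselves and to check existing labels against it. The following is a sketch with hypothetical helper names; it deliberately stops short of writing the file, because the exact layout expected in tool/charactes_keys.txt is not described here.

```python
from typing import Iterable, List


def build_alphabet(labels: Iterable[str]) -> List[str]:
    """Every distinct character in the labels, sorted so the order is stable."""
    return sorted(set("".join(labels)))


def missing_characters(labels: Iterable[str], alphabet: Iterable[str]) -> List[str]:
    """Characters that occur in the labels but are absent from the alphabet."""
    known = set(alphabet)
    return sorted({ch for label in labels for ch in label if ch not in known})


if __name__ == "__main__":
    labels = ["aqqh", "bmwx"]
    print(build_alphabet(labels))  # prints ['a', 'b', 'h', 'm', 'q', 'w', 'x']
    print(missing_characters(["aq9Z"], build_alphabet(labels)))  # prints ['9', 'Z']
```

Keeping the order fixed matters because the model's output indices are tied to positions in the dictionary; reordering it after training invalidates the saved weights.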

6.2 Data augmentation to break through plateaus

When the recognition rate plateaus at around 90% and is hard to improve, try adding the following augmentations to the __getitem__ method of the MyDataset class in tool/dataloader.py:

  • Random rotation
  • Add Gaussian noise or salt and pepper noise
  • Randomly adjust contrast and brightness
  • Slight affine transformations (translation, scaling, shearing)
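As a rough illustration of the noise and brightness/contrast items, here is a dependency-free sketch that works on a grayscale image stored as nested lists of 0..255 values. All function names and parameter ranges are assumptions chosen for illustration; a real pipeline would more likely apply the equivalent operations with an image library on arrays or tensors, and rotation and affine warps are left out here because they need proper resampling.

```python
import random
from typing import List

Image = List[List[int]]  # grayscale rows, values in 0..255


def _clip(value: float) -> int:
    """Round to the nearest integer and clamp into the valid pixel range."""
    return max(0, min(255, int(round(value))))


def adjust_brightness_contrast(img: Image, contrast: float, brightness: float) -> Image:
    """Scale around mid-gray (contrast), then shift (brightness)."""
    return [[_clip((px - 128) * contrast + 128 + brightness) for px in row] for row in img]


def add_gaussian_noise(img: Image, sigma: float, rng: random.Random) -> Image:
    """Add zero-mean Gaussian noise to every pixel."""
    return [[_clip(px + rng.gauss(0.0, sigma)) for px in row] for row in img]


def add_salt_pepper(img: Image, amount: float, rng: random.Random) -> Image:
    """Set roughly a fraction `amount` of the pixels to pure black or white."""
    return [[rng.choice((0, 255)) if rng.random() < amount else px for px in row]
            for row in img]


def augment(img: Image, rng: random.Random) -> Image:
    """Apply the three augmentations with mild, randomly drawn strengths."""
    img = adjust_brightness_contrast(img, rng.uniform(0.8, 1.2), rng.uniform(-20, 20))
    img = add_gaussian_noise(img, sigma=8.0, rng=rng)
    return add_salt_pepper(img, amount=0.02, rng=rng)


if __name__ == "__main__":
    sample = [[100] * 8 for _ in range(4)]
    out = augment(sample, random.Random(0))
    print(len(out), len(out[0]))  # prints "4 8"
```

Augmentations should stay mild enough that a human can still read the code; the label attached to the image does not change.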

6.3 Fine-tuning a pretrained model saves time and effort

If training from scratch converges too slowly, start from a general-purpose pretrained text recognition model (for example, a CRNN that recognizes only English), set its path in the pretrained field of Opt, and fine-tune it on your own business data. Convergence is usually much faster.


Done! Follow this process and you will have your own text verification code recognition tool. If you run into problems along the way, you are welcome to file an issue in the project's GitHub repository~