Tutorial on slider CAPTCHA gap recognition based on deep learning
1. Introduction
Anyone who writes crawlers or automated tests has probably struggled with slider CAPTCHAs such as Geetest. Traditional approaches that locate the gap with edge detection and contour matching either depend on fixed lighting and border styles or have to be reworked whenever the target site changes slightly. **Why not use deep learning?** An object detection model learns the visual features of the gap by itself, so it generalizes much better than traditional methods, even when the background is complex or the gap has an unusual shape.
This article uses the classic YOLOv3 to walk through the complete gap recognition pipeline step by step: data collection and annotation, model training, and deployment, ending with a gap-locating API that can be called directly.
2. Preparation
2.1 Clone the code repository
This tutorial uses a community-maintained repository that adapts YOLOv3 to this task, which saves building everything from scratch. Clone the DeepLearningSlideCaptcha2 repository referenced in the summary before continuing.
2.2 Install dependencies
The repository's `requirements.txt` already lists the core dependencies. To avoid network and version conflicts, complete the following two steps before installing them:
- Python environment: use conda to create a virtual environment with Python 3.8-3.11. PyTorch 2.x also works with this repository, but the PyTorch-related code in the training and detection scripts needs some adjustment; beginners are advised to use Python 3.8 with PyTorch 1.8.x-1.13.x.
- Switch the PyPI mirror: either change the package index permanently or add `-i https://pypi.tuna.tsinghua.edu.cn/simple` to a single install command.
Then install the dependencies with `pip install -r requirements.txt`.
3. Basics of object detection: Why choose YOLOv3?
Locating the gap in a slider CAPTCHA is essentially a single-object detection task: find the one bounding box that encloses the gap in an image.
Mainstream object detection algorithms fall into two categories:
- Two-stage: the R-CNN series, for example, first casts a wide net to extract thousands of candidate boxes, then classifies each one and refines its position. This is accurate but slow, which makes it a poor fit for the high request rates of automation scenarios.
- One-stage: YOLO and SSD, for example, divide the image into a grid. Each cell predicts several bounding boxes and confidence scores in a single pass, and the best results are filtered out at the end. This is very fast, and although accuracy is slightly lower, it is more than sufficient for gap recognition.
This tutorial chooses YOLOv3 because it is the all-rounder of the YOLO series: noticeably more accurate than v1 and v2 while still fast enough for real-time use, with mature deployment tooling and plenty of learning material.
4. Data preparation: the “fuel” of deep learning
4.1 Automatically collect CAPTCHA images
Training a gap detector first requires a large and varied set of CAPTCHA images. The repository provides a script, `collect.py`, that collects them from a public demo site (https://captcha1.scrape.center/).
💡 Tip: for your own target site, replace the CSS selector with the one for that site's background image element. The more images you collect, and the more varied the gaps and backgrounds (lighting, colors, gap shapes), the better the model generalizes.
4.2 Manually label the gap position
YOLOv3 needs annotated images for training, that is, images where the model is told which rectangular region is the gap. The lightweight open source tool labelImg is recommended.
Follow the same labeling steps for every image; inconsistent labels easily cause problems during training:
- After opening the tool, switch the save format to PascalVOC (saved as XML). The annotations are converted to YOLO format later, which keeps the steps clearer.
- Set the default label: in "Edit → Predefined Classes", add a single line, `target` (all gaps use this label).
- Open the `data/captcha/images` directory you collected, and set "Change Save Dir" to a new `data/captcha/annotations` directory.
- For each image, use the "Create RectBox" tool to draw a box that tightly encloses the gap, choose the default label `target`, save, and press "D" to move to the next image.
4.3 Convert to the format required by YOLOv3
A PascalVOC XML file stores each box as absolute pixel coordinates of the upper-left corner `(xmin, ymin)` and the lower-right corner `(xmax, ymax)`, whereas YOLOv3 expects:
- Class ID (there is only one class here, so the ID is 0)
- Center point `(x_center, y_center)`, normalized by the image width and height
- Box size `(box_width, box_height)`, normalized by the image width and height
The repository's `convert.py` script performs the whole conversion for you.
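The repository's `convert.py` is not reproduced here. As a rough sketch of the arithmetic behind the conversion, the following standalone example turns one PascalVOC annotation into YOLO label lines. The function names and the sample XML are hypothetical and only illustrate the formulas; the real script may be organized differently.

```python
import xml.etree.ElementTree as ET


def voc_box_to_yolo(xmin, ymin, xmax, ymax, img_w, img_h, class_id=0):
    """Convert absolute corner coordinates to a normalized YOLO label tuple."""
    x_center = (xmin + xmax) / 2 / img_w
    y_center = (ymin + ymax) / 2 / img_h
    box_width = (xmax - xmin) / img_w
    box_height = (ymax - ymin) / img_h
    return class_id, x_center, y_center, box_width, box_height


def voc_xml_to_yolo_lines(xml_text):
    """Parse one PascalVOC annotation and return one YOLO label line per object."""
    root = ET.fromstring(xml_text)
    size = root.find("size")
    img_w = int(size.find("width").text)
    img_h = int(size.find("height").text)
    lines = []
    for obj in root.iter("object"):
        box = obj.find("bndbox")
        coords = [float(box.find(tag).text) for tag in ("xmin", "ymin", "xmax", "ymax")]
        label = voc_box_to_yolo(*coords, img_w, img_h)
        lines.append("%d %.6f %.6f %.6f %.6f" % label)
    return lines


# Hypothetical annotation for a 520x320 image with a single gap labeled "target".
SAMPLE_XML = """<annotation>
  <size><width>520</width><height>320</height><depth>3</depth></size>
  <object>
    <name>target</name>
    <bndbox><xmin>260</xmin><ymin>80</ymin><xmax>338</xmax><ymax>160</ymax></bndbox>
  </object>
</annotation>"""

if __name__ == "__main__":
    for line in voc_xml_to_yolo_lines(SAMPLE_XML):
        print(line)  # 0 0.575000 0.375000 0.150000 0.250000
```

Each output line follows the order listed above: class ID, normalized center, then normalized width and height.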
5. Model training
5.1 Download pre-trained weights
Training YOLOv3 from scratch takes a very large labeled dataset and a long time on GPUs. Fortunately, we can start from weights pre-trained on the COCO dataset, where the model has already learned general features of common objects, and then fine-tune on our own CAPTCHA data. Good results typically appear within a few hours, or even tens of minutes.
The repository provides a download script: on Linux/macOS run `bash scripts/download_pretrained.sh`; Windows users can download the file directly from the official YOLOv3 weights download page and place it in the `weights` directory.
5.2 Start fine-tuning training
Before training, confirm that the number of classes (`nc`), image path, label path, and class name in `data/captcha.yaml` are all correct (by default there is one class and the paths are already configured).
Then run the fine-tuning script, `scripts/train.sh`.
💡 Training parameters (adjustable in `scripts/train.sh`):
- `--img-size`: input image size (default 640; reduce to 416 if GPU memory is insufficient).
- `--batch-size`: number of images per training step (default 8; the more GPU memory you have, the larger this can be).
- `--epochs`: number of training epochs (default 100; training can be stopped early once the validation loss stops decreasing).
- `--data`: path to the `captcha.yaml` file confirmed above.
- `--weights`: path to the pre-trained weights.
5.3 Use TensorBoard to view training results
During training, the script automatically saves metrics such as loss and accuracy to the `logs` directory, which TensorBoard can visualize (point its `--logdir` option at that directory).
Open http://localhost:6006 in a browser and focus on two curves:
- val/box_loss: bounding box loss on the validation set (lower is better; roughly 0.05-0.1 is usually good enough).
- val/mAP_0.5: mean average precision on the validation set at an IoU threshold of 0.5 (higher is better; 0.9 or above is solid for single-object detection).
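To make the IoU threshold in the mAP metric concrete, here is a minimal sketch of intersection over union for two boxes in `(xmin, ymin, xmax, ymax)` form. The function and the example boxes are illustrative only and are not taken from the repository; a prediction counts as correct at this threshold when its IoU with the labeled box is at least 0.5.

```python
def iou(box_a, box_b):
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap; zero when the boxes do not intersect.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    labeled = (100, 50, 160, 110)    # hypothetical 60x60 ground-truth gap
    predicted = (110, 50, 170, 110)  # same size, shifted 10 px to the right
    score = iou(labeled, predicted)
    print(round(score, 3), score >= 0.5)
```

In this example a 10-pixel horizontal shift on a 60-pixel gap still passes the 0.5 threshold, which is why a high mAP_0.5 alone does not guarantee pixel-exact sliding distances.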
6. Model testing
6.1 Prepare test data
Put CAPTCHA images that were not used for training (for example, the last 20% of the collected images) into the `data/captcha/test` directory.
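One possible way to pick that held-out 20% is sketched below. It assumes the filenames sort in collection order, which may not hold for your data, and the helper name is hypothetical. It only computes the two lists; moving the files is left to you.

```python
def split_holdout(filenames, test_ratio=0.2):
    """Return (train, test) lists, holding out the last test_ratio of the sorted names."""
    ordered = sorted(filenames)
    n_test = int(len(ordered) * test_ratio)
    split_at = len(ordered) - n_test
    return ordered[:split_at], ordered[split_at:]


if __name__ == "__main__":
    # Hypothetical filenames; replace with a listing of data/captcha/images.
    names = ["captcha_%03d.png" % i for i in range(10)]
    train, test = split_holdout(names)
    print(len(train), "for training;", "held out:", test)
```

Whichever split you use, make sure the held-out images and their labels are removed from the training set before fine-tuning, otherwise the test results will be overly optimistic.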
6.2 Run detection script
The best model saved during training is stored at `checkpoints/best.pt`. Run the repository's detection script against it to test.
The detection results (images with the predicted box drawn on them and txt files containing the gap coordinates) are written automatically to the `data/captcha/result` directory.
7. Model deployment: Make a usable API
A detection script alone is not enough. For real crawlers or automated tests we need an HTTP API that accepts an image and returns the x coordinate of the gap's upper-left corner (the sliding distance depends mainly on x). The lightweight FastAPI framework makes this quick to implement.
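The endpoint code depends on the repository's detection helpers and is not reproduced here. As a sketch of the post-processing step such an endpoint needs, the function below picks the most confident detection and converts it to the pixel x coordinate of the gap's left edge. The detection tuple layout `(confidence, x_center, y_center, box_width, box_height)` with normalized values is an assumption made for illustration, not the repository's actual output format.

```python
def gap_left_x(detections, image_width, conf_threshold=0.5):
    """Return the pixel x of the left edge of the most confident gap box, or None.

    detections: iterable of (confidence, x_center, y_center, box_width, box_height)
    tuples with coordinates normalized to the 0-1 range (assumed layout).
    """
    candidates = [d for d in detections if d[0] >= conf_threshold]
    if not candidates:
        return None  # nothing confident enough; the caller should report a failure
    _, x_center, _, box_width, _ = max(candidates, key=lambda d: d[0])
    # The left edge is the center minus half the width, scaled back to pixels.
    return round((x_center - box_width / 2) * image_width)


if __name__ == "__main__":
    # Hypothetical detections for a 520-pixel-wide CAPTCHA image.
    detections = [
        (0.42, 0.200, 0.400, 0.15, 0.25),
        (0.93, 0.575, 0.375, 0.15, 0.25),
    ]
    print(gap_left_x(detections, image_width=520))
```

An API handler would call something like this after running the model on the uploaded image and return the value as JSON, with an error response when it is `None`.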
Start the API with uvicorn, the ASGI server recommended in the FastAPI documentation.
After startup, open http://localhost:8000/docs to reach FastAPI's built-in interactive documentation, where you can upload an image and test the endpoint directly.
8. Optimization suggestions
If the model does not perform as expected on your target site, try the following:
- Data augmentation: enable or add augmentation in `data/captcha.yaml` (random flipping, cropping, brightness/contrast adjustment), or write your own script to generate more varied samples.
- Switch to a newer YOLO version such as YOLOv5 or YOLOv8. Their accuracy and speed improve considerably on YOLOv3, and their repositories are better organized and easier to deploy.
- Deployment acceleration: convert the model to ONNX or TensorRT format, which can speed up inference by anywhere from several times to tens of times, depending on the hardware.
- Active learning: find samples where the model's prediction confidence is low (for example, between 0.5 and 0.7), label them manually, and add them to the training set for retraining. The model keeps improving as it is used.
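The active learning step can be sketched as a simple filter over logged predictions. The input format below, a mapping from image name to the best detection confidence (or `None` when nothing was detected), is a hypothetical structure you would build from your own inference logs; the 0.5-0.7 band comes from the suggestion above.

```python
def select_for_relabeling(predictions, low=0.5, high=0.7):
    """Return sorted image names whose best confidence falls within [low, high]."""
    picked = []
    for name, conf in predictions.items():
        if conf is not None and low <= conf <= high:
            picked.append(name)
    return sorted(picked)


if __name__ == "__main__":
    # Hypothetical confidences collected while the API was in use.
    logged = {
        "a.png": 0.95,
        "b.png": 0.62,
        "c.png": 0.50,
        "d.png": 0.31,
        "e.png": None,
    }
    print(select_for_relabeling(logged))
```

Images with no detection at all are skipped by this filter; whether to review those as well is a separate decision that depends on how often they occur.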
9. Summary
This article used YOLOv3 to walk through the complete slider CAPTCHA gap recognition pipeline, from data collection and annotation through model training to deployment as a callable API. Compared with traditional image processing, the deep learning approach generalizes much better: when the target site changes, collecting and labeling a small amount of new data and fine-tuning is enough to adapt quickly.
The complete code is available in the original repository: DeepLearningSlideCaptcha2

