title: OpenCV Practical Guide: A complete tutorial from image processing to deep learning - Computer Vision Core Technology | Daoman PythonAI
description: A complete OpenCV practical tutorial, covering image processing, computer vision, deep learning DNN module, face detection, target tracking and other core technologies, including a wealth of Python code examples and practical projects.
keywords: [OpenCV, computer vision, image processing, Python, face recognition, target detection, deep learning, DNN, image filtering, edge detection]

OpenCV Practical Guide: From pixels to core applications of deep learning

Introduction

When it comes to computer vision, the first thing that many people think of is OpenCV (Open Source Computer Vision Library). This open source tool library has taken root in industry and academia for more than 20 years, providing a complete workflow from basic pixel processing to deep learning model inference. More importantly, it supports multiple languages such as Python, C++, and Java, allowing you to implement powerful visual functions with very little code.

This tutorial is not a quick survey; it concentrates on the skills you will use most often in practice. Starting from environment setup, it uses reproducible examples to walk through image operations, classic filters, face detection and YOLO object detection one by one, so that you can quickly build real projects.

1. OpenCV basic overview

1.1 What is OpenCV?

OpenCV is a lightweight cross-platform vision library designed for real-time applications. It encapsulates a large number of mature image processing and computer vision algorithms, so you don’t have to reinvent the wheel and only need to focus on application logic. Whether you're running Python prototypes on your laptop or deploying C++ modules on embedded devices, OpenCV provides a consistent interface.

1.2 Core features and applications

The functions of OpenCV can be roughly divided into three levels:

Classic CV algorithms: Canny edge detection, morphological operations, contour analysis and so on; best suited to scenes with stable lighting.
Deep Neural Network Module (dnn): Loads models in ONNX, TensorFlow, Caffe, Darknet and other formats and runs inference on CPU or GPU without installing PyTorch or TensorFlow.
Hardware Acceleration: OpenCL or CUDA backends can speed up compute-heavy workloads; the CUDA backend requires an OpenCV build compiled with CUDA support.

It is precisely because of this coverage from traditional to deep learning that OpenCV is widely used in many fields such as face recognition, autonomous driving perception, industrial defect detection, medical image analysis, and augmented reality.

2. Quick environment configuration

2.1 Installation selection

Pick the package that matches your scenario:

| Scenario | Command |
| --- | --- |
| Standard use (basic image processing; since OpenCV 4.4 this also includes SIFT, and panorama stitching is in the main modules) | `pip install opencv-python` |
| Extra contrib modules (e.g. `ximgproc`, additional trackers) | `pip install opencv-contrib-python` |
| Headless server or Docker environment | `pip install opencv-python-headless` |

Install only one of these packages in a given environment, because they all provide the same `cv2` module. The `headless` package removes GUI functionality (e.g. `cv2.imshow()`), is smaller in size, and is suitable for production environments.

2.2 Verify installation

After the installation is complete, open a Python terminal and enter the following code to confirm the version. If a version number appears, the environment is ready.

import cv2
print(f"✅ OpenCV version: {cv2.__version__}")

3. Basic image operations: pixels, ROI and channels

In OpenCV's Python bindings, an image is a NumPy multidimensional array with shape `[height, width, channels]`. This means that cropping and channel separation can be done directly with array slicing, which is very fast because basic slices are views rather than copies.

The following code demonstrates the complete process of "load-process-display-save" and shows how to operate the region of interest (ROI) and modify a color channel individually.

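import cv2
import sys

def core_image_ops(img_path):
    # 1. Load the image and check that it was read successfully
    img = cv2.imread(img_path)
    if img is None:
        sys.exit("❌ Could not read the image; please check the file path")
    print(f"📐 Image shape (height, width, channels): {img.shape}")

    # 2. Extract a region of interest (ROI)
    roi = img[50:200, 50:200]   # Note: array indexing order is [y, x] (rows first)

    # 3. Channel operations (OpenCV loads images in BGR order)
    img_red_zero = img.copy()
    img_red_zero[:, :, 2] = 0   # Zero the red channel (index 2); the image shifts toward cyan

    # 4. Display and save
    cv2.imshow("Original", img)
    cv2.imshow("Red channel zeroed", img_red_zero)
    cv2.waitKey(0)
    cv2.destroyAllWindows()
    cv2.imwrite("red_zero.jpg", img_red_zero)

    return img, roi

# Usage example (replace with the path to your own image)
# img, roi = core_image_ops("test.jpg")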
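
One detail worth checking for yourself is that an ROI obtained by slicing shares memory with the source image. The following is a minimal sketch on a synthetic array (NumPy only, no image file needed); the array size and colors are arbitrary choices for illustration.

```python
import numpy as np

# Synthetic 4x6 BGR image (height=4, width=6), all black.
img = np.zeros((4, 6, 3), dtype=np.uint8)

# Slicing is [rows (y), columns (x)] and returns a view, not a copy.
roi_view = img[1:3, 2:5]
roi_view[:] = (255, 0, 0)          # paint the ROI blue (BGR order); img changes too

# An explicit copy is independent of the source image.
roi_copy = img[1:3, 2:5].copy()
roi_copy[:] = 0                    # img is not affected by this

print(roi_view.shape)              # (2, 3, 3)
print(img[1, 2].tolist())          # [255, 0, 0]
print(np.shares_memory(img, roi_view), np.shares_memory(img, roi_copy))
```

If you want to edit an ROI without touching the original, call `.copy()` on the slice first.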

Key Point: The order of image channels read by OpenCV is BGR. This is due to historical reasons. Special attention needs to be paid to the conversion when collaborating with other libraries (such as matplotlib).

4. Core color space conversion

Color space conversion is the first step in many visual tasks because certain information is easier to separate or analyze in a specific space. The two most commonly used transformations are:

Grayscale: Compresses three channels into a single channel, greatly reducing the amount of data. It is the usual input for face detection and feature extraction.
HSV space: Decouples "color" from "brightness". H (hue) and S (saturation) are less sensitive to lighting changes than raw BGR values, which makes HSV well suited to color filtering.

def color_convert(img):
    # Convert to grayscale
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    # Convert to HSV
    hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)

    cv2.imshow("Original (BGR)", img)
    cv2.imshow("Grayscale", gray)
    cv2.waitKey(0)
    cv2.destroyAllWindows()

    return gray, hsv

5. Classic image processing (must learn)

However powerful deep learning becomes, many practical scenarios still rely on traditional preprocessing to improve robustness. The following three techniques cover most binarization, edge extraction and noise-cleaning needs.

5.1 Adaptive binarization (handling uneven lighting)

Ordinary global thresholding fails when facing shadow or highlight areas. Adaptive Threshold dynamically calculates the threshold based on the neighborhood of each pixel, which is very suitable for scenarios such as document scanning and license plate recognition.

def adaptive_threshold(gray):
    # Neighborhood size 11 (must be odd); the constant C=2 fine-tunes the threshold
    thresh = cv2.adaptiveThreshold(
        gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
        cv2.THRESH_BINARY, 11, 2
    )
    cv2.imshow("Adaptive threshold", thresh)
    cv2.waitKey(0)
    return thresh

5.2 Canny edge detection (denoising + accurate positioning)

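Canny is a classic edge detection algorithm. It computes image gradients, thins them with non-maximum suppression, and then applies a double-threshold (hysteresis) strategy to keep strong edges and the weak edges connected to them. Because `cv2.Canny` does not smooth the input itself, the image is usually denoised with a Gaussian blur first. The output is a single-channel black-and-white edge map, which is often used as a preprocessing step for contour detection and target segmentation.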

def canny_edge(gray):
    # Denoise with a Gaussian blur
    blurred = cv2.GaussianBlur(gray, (5,5), 0)
    # Low threshold 50, high threshold 150
    edges = cv2.Canny(blurred, 50, 150)
    cv2.imshow("Canny edges", edges)
    cv2.waitKey(0)
    return edges

5.3 Morphological opening and closing operations (cleaning binary images)

For binary images, the opening operation removes small isolated white dots, and the closing operation fills small black holes inside the foreground. With a suitably chosen structuring element (kernel), the foreground region can be cleaned up precisely.

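import numpy as np

def morph_ops(thresh):
    kernel = np.ones((3,3), np.uint8)   # 3x3 rectangular kernel

    # Opening: erosion followed by dilation; removes small noise outside the foreground
    opening = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, kernel)
    # Closing: dilation followed by erosion; fills small holes inside the foreground
    closing = cv2.morphologyEx(thresh, cv2.MORPH_CLOSE, kernel)

    return opening, closing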

6. Classic practice: face detection

For devices with limited computing power (such as a Raspberry Pi or access-control cameras), the Haar cascade classifier remains one of the lightest options. OpenCV ships with pretrained cascade models, so real-time detection takes only a few dozen lines of code.

def realtime_face_detect():
    # Load the Haar face detection model
    face_cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
    )
    cap = cv2.VideoCapture(0)   # Open the default camera
    print("⚠️ Press q to quit")

    while cap.isOpened():
        ret, frame = cap.read()
        if not ret:
            break

        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        # Detection parameters: shrink the image by about 10% per pyramid level,
        # require at least 5 neighboring candidates, minimum face size 30x30
        faces = face_cascade.detectMultiScale(
            gray, scaleFactor=1.1, minNeighbors=5, minSize=(30, 30)
        )

        # Draw the bounding boxes
        for (x, y, w, h) in faces:
            cv2.rectangle(frame, (x, y), (x+w, y+h), (0, 255, 0), 2)

        cv2.imshow("Real-time face detection", frame)
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break

    cap.release()
    cv2.destroyAllWindows()

# Start face detection
# realtime_face_detect()

Although Haar cascades are less accurate than deep learning models in complex scenes, their low latency and ability to run without a GPU keep them in use in embedded applications.
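
The `scaleFactor` parameter has a direct effect on that latency, because it controls how many image scales the detector searches. The helper below is a rough back-of-the-envelope model, not OpenCV's internal logic: `approx_scale_count` is a hypothetical name, and it simply counts how many geometric steps fit between the minimum face size and the frame's smaller dimension.

```python
import math

def approx_scale_count(frame_min_dim, min_size, scale_factor):
    """Rough number of search scales between min_size and the frame size.

    Simplified model for intuition only (assumption, not OpenCV internals).
    """
    if scale_factor <= 1.0:
        raise ValueError("scale_factor must be greater than 1")
    if min_size > frame_min_dim:
        return 0
    steps = math.log(frame_min_dim / min_size) / math.log(scale_factor)
    return int(math.floor(steps)) + 1

# Compare a fine, a default-like and a coarse scale factor on a 480-pixel-high frame.
for sf in (1.05, 1.1, 1.3):
    print(sf, approx_scale_count(480, 30, sf))
```

A smaller `scaleFactor` means more scales to scan (slower, but less likely to miss a face between scales); a larger one is faster but coarser. Raising `minSize` has a similar speed-up effect.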

7. Modern practice: YOLO object detection with the DNN module

OpenCV's `dnn` module takes a different approach: it does not train models, it only runs inference efficiently. You can therefore load models in Darknet, Caffe, TensorFlow, ONNX and other formats directly, without setting up the original training framework. Here we take the lightweight YOLOv3-tiny (Darknet format) as an example and implement object detection on a single image.

7.1 Preparation

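Please download the following three files in advance and place them in the same directory as the script:

yolov3-tiny.weights: the pretrained model weights.
yolov3-tiny.cfg: the network configuration.
coco.names: the COCO class labels, one per line.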

7.2 Implementation code

The whole process is divided into: model loading → preprocessing (blob) → forward calculation → non-maximum suppression (NMS) → drawing results.

def yolo_detect(img_path, weights, cfg, coco):
    # 1. Load the YOLO model and the class names
    net = cv2.dnn.readNet(weights, cfg)
    with open(coco, 'r') as f:
        classes = [line.strip() for line in f if line.strip()]
    colors = np.random.uniform(0, 255, (len(classes), 3))

    # 2. Get the output layer names (this call works across OpenCV versions)
    output_layers = net.getUnconnectedOutLayersNames()

    # 3. Preprocess the image: resize, scale to [0, 1], swap BGR -> RGB
    img = cv2.imread(img_path)
    if img is None:
        raise FileNotFoundError(f"Could not read image: {img_path}")
    h, w = img.shape[:2]
    blob = cv2.dnn.blobFromImage(
        img, 1/255.0, (416, 416), swapRB=True, crop=False
    )
    net.setInput(blob)
    outputs = net.forward(output_layers)

    # 4. Parse the detections, then remove duplicates with NMS
    boxes, confs, class_ids = [], [], []
    for out in outputs:
        for det in out:
            scores = det[5:]
            cls_id = int(np.argmax(scores))
            conf = scores[cls_id]
            if conf > 0.5:
                # Convert relative (center x, center y, width, height) to pixels
                cx, cy, bw, bh = (det[:4] * [w, h, w, h]).astype('int')
                x = int(cx - bw / 2)
                y = int(cy - bh / 2)
                boxes.append([x, y, int(bw), int(bh)])
                confs.append(float(conf))
                class_ids.append(cls_id)

    # Score threshold 0.5, NMS (IoU) threshold 0.4
    idxs = cv2.dnn.NMSBoxes(boxes, confs, 0.5, 0.4)

    # 5. Draw the results
    if len(idxs) > 0:
        for i in np.array(idxs).flatten():
            x, y, w_box, h_box = boxes[i]
            label = f"{classes[class_ids[i]]} {confs[i]:.2f}"
            # Convert the NumPy color to a tuple of plain ints for drawing
            color = tuple(int(c) for c in colors[class_ids[i]])
            cv2.rectangle(img, (x, y), (x+w_box, y+h_box), color, 2)
            cv2.putText(img, label, (x, y-10),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.5, color, 2)

    cv2.imshow("YOLO detection", img)
    cv2.waitKey(0)
    cv2.destroyAllWindows()

# Usage example (place the model files in the same directory first)
# yolo_detect("test.jpg", "yolov3-tiny.weights", "yolov3-tiny.cfg", "coco.names")
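
The NMS step is the least obvious part of the pipeline, so here is a conceptual sketch of greedy non-maximum suppression in pure Python. It is meant to illustrate the idea behind `cv2.dnn.NMSBoxes`, not to reproduce OpenCV's exact implementation; `iou` and `greedy_nms` are hypothetical helper names, and boxes use the same `[x, y, w, h]` layout as the code above.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as [x, y, w, h]."""
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    iw = max(0, min(ax2, bx2) - max(a[0], b[0]))
    ih = max(0, min(ay2, by2) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def greedy_nms(boxes, scores, score_thr=0.5, iou_thr=0.4):
    """Keep the best-scoring box, drop boxes that overlap it too much, repeat."""
    order = sorted(
        (i for i, s in enumerate(scores) if s > score_thr),
        key=lambda i: scores[i],
        reverse=True,
    )
    keep = []
    for i in order:
        # A box survives only if it does not overlap any already-kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_thr for j in keep):
            keep.append(i)
    return keep

# Two near-duplicate boxes on one object, plus one separate object.
boxes = [[10, 10, 100, 100], [15, 12, 100, 100], [200, 200, 50, 50]]
scores = [0.9, 0.8, 0.7]
print(greedy_nms(boxes, scores))   # [0, 2]
```

The second box overlaps the first with an IoU of about 0.87, well above the 0.4 threshold, so only the higher-scoring duplicate is kept.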

**Why choose YOLOv3-tiny?** Because it trades some accuracy for speed, it can run at or near real time even on laptops without a GPU. If you need higher accuracy, swap in the `.cfg` and `.weights` files of a larger Darknet model such as YOLOv4 or YOLOv7; the inference flow stays the same as long as your OpenCV version supports that model's layers.

8. Learning summary and suggestions

8.1 Key takeaways

Through the above practical exercises, you should have mastered:

Pixel operations: NumPy slicing is the core tool for handling ROIs and channels efficiently.
Classic CV: Adaptive binarization, Canny and morphological operations cover most preprocessing needs.
Modern CV: The `dnn` module makes model deployment simple, without requiring a separate inference framework.

8.2 Learning Suggestions

Practice the basics first, then move on to DNN: Traditional methods are the cornerstone of understanding visual problems, so don't skip them.
Build more hands-on projects: Small projects such as answer-sheet recognition, fruit sorting and parking-space detection quickly improve proficiency.
Make good use of official resources: The official OpenCV documentation contains a large number of tutorials and API references. If you run into problems, check the documentation first.

OpenCV uses **BGR** color order by default, while libraries such as Matplotlib and PIL use **RGB**. When you display an OpenCV image with Matplotlib, be sure to convert it first with `cv2.cvtColor(img, cv2.COLOR_BGR2RGB)`, or the red and blue channels will be swapped!
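
The channel swap can be checked on a single pixel. This is a minimal NumPy-only sketch; reversing the last axis yields the same channel order as `cv2.cvtColor(img, cv2.COLOR_BGR2RGB)`.

```python
import numpy as np

# A single pure-red pixel as OpenCV stores it: B=0, G=0, R=255.
bgr = np.array([[[0, 0, 255]]], dtype=np.uint8)

# Reversing the channel axis swaps B and R (the result is a view of bgr).
rgb = bgr[:, :, ::-1]

print(bgr[0, 0].tolist())   # [0, 0, 255]
print(rgb[0, 0].tolist())   # [255, 0, 0]
```

Because the reversed slice is a view with a negative stride, wrap it in `np.ascontiguousarray(...)` if a downstream function requires contiguous memory.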
